PDF to Word conversion format and time issue

Hi,

I am facing some issues when converting from PDF to Word:

  1. Conversion takes too long for larger files. For example, a 6.6 MB file that I uploaded took approximately 5 minutes. This is the code I used:
    Document document = new Document(file.getInputStream());
    DocSaveOptions saveOption = new DocSaveOptions();
    document.save(convertedFileName, saveOption);
    document.close();
    This is the file I used: Sample_File_6_6MB.pdf (6.3 MB)

  2. The converted docx file is not formatted properly in LibreOffice. It displays properly in XPS.

@MathiasT

Thank you for contacting support.

We were able to reproduce the conversion taking several minutes, and a ticket with ID PDFJAVA-39003 has been logged in our issue management system for further investigation. Regarding the formatting, would you please share the generated Word document along with screenshots of the problems so that we can investigate further?

Hi,

The following are some screenshots from the LibreOffice viewer for the same file I uploaded in the original post:
Screenshot from 2019-11-16 12-23-04.png (307.0 KB)
Screenshot from 2019-11-16 12-24-17.png (329.5 KB)
Screenshot from 2019-11-16 12-23-42.png (284.9 KB)
Screenshot from 2019-11-16 12-23-31.png (271.7 KB)
Screenshot from 2019-11-16 12-23-24.png (306.9 KB)
Screenshot from 2019-11-16 12-23-12.png (291.2 KB)

The following are screenshots from WPS Writer, which has somewhat better quality but is still not up to the mark:
Screenshot from 2019-11-16 12-33-39.png (404.9 KB)
Screenshot from 2019-11-16 12-33-31.png (361.1 KB)
Screenshot from 2019-11-16 12-33-22.png (384.5 KB)

@MathiasT

Thank you for sharing the screenshots.

Another ticket with ID PDFJAVA-39005 has been logged to investigate the formatting differences, and we will let you know as soon as an update is available.

@MathiasT

We have investigated the ticket and got 100-110 seconds conversion time with Java and 90 seconds conversion time with .NET API. The document contains 158 pages with graphics and images on almost every page, and we think this is an acceptable time for such a document.
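To compare timings on your own machine, a small wall-clock timer around the save call may help. This is only a minimal sketch using the standard library: ConversionTimer and timeMillis are hypothetical names that are not part of Aspose.PDF, and the sleep in main is a placeholder where the actual conversion call would go.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Aspose.PDF): measures wall-clock time of a task.
class ConversionTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workload. Replace the lambda body with the real conversion,
        // e.g. document.save(convertedFileName, saveOption).
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Conversion took " + elapsed + " ms");
    }
}
```

A single run includes JVM warm-up and class loading, so the first measurement in a fresh process is likely to be slower than later ones.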

We also recommend decreasing the image resolution in the conversion options to speed up the conversion:

// default value is 300; decreasing it to 150 makes the conversion 10-20% faster
saveOption.setImageResolutionX(150);
saveOption.setImageResolutionY(150); 

In case you still experience any issue, please share your complete environment details with us, i.e. OS name and version, JDK version, application type, etc.
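Most of these details can be read from standard Java system properties. The following is a minimal sketch, not an official diagnostic tool: EnvironmentReport and describe are hypothetical names, and the application type (web, desktop, etc.) still has to be stated by hand.

```java
// Hypothetical helper: collects basic environment details for a support request.
class EnvironmentReport {

    /** Builds a six-line report from standard system properties and the runtime. */
    static String describe() {
        return String.join(System.lineSeparator(),
                "OS Name: " + System.getProperty("os.name"),
                "OS Version: " + System.getProperty("os.version"),
                "OS Architecture: " + System.getProperty("os.arch"),
                "JDK Version: " + System.getProperty("java.version"),
                "JVM Vendor: " + System.getProperty("java.vendor"),
                // Maximum heap the JVM will attempt to use, in megabytes.
                "Max Heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this in the same JVM that performs the conversion (for example from the application server, not a separate shell) gives the most relevant values.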

@MathiasT

The formatting problem occurs when converting to DOC format. We recommend using the following option to convert to DOCX format instead; the result then displays properly in LibreOffice under Linux.

saveOption.setFormat(DocSaveOptions.DocFormat.DocX);

The issue with .doc (not .docx) is not an Aspose.PDF bug but a LibreOffice issue in the Linux edition with the DOC format. The converted document can be opened in other viewers that support the DOC format without any formatting issues.

We have tested the following viewers:
LibreOffice for Windows, LibreOffice for MacOS, Apache OpenOffice, Microsoft Office, File Viewer Plus, etc.

Also, it looks fine in online document viewers: