Size of document grows exponentially

We have Aspose.Pdf for Java (version 11.7) and Aspose.Words for Java (version 16.6).

We use the following code to create DOCX files from a PDF:

// Document here is com.aspose.pdf.Document
Document doc = new Document("c:\\2016 Director and Officer Questionnaire.pdf");
DocSaveOptions saveOptions = new DocSaveOptions();
// Set output file format as DOCX
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);

doc.save("c:\\2016 Director and Officer Questionnaire.docx", saveOptions);

The resulting file is about 300 KB.

When we use the Aspose.Words API to compare and save the document, the file size explodes to over 5 MB.

Compare code:

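// Document here is com.aspose.words.Document
Document docA = new Document("c:\\2016 Director and Officer Questionnaire.docx");
// docB is the second document to compare against, loaded the same way as docA
docA.compare(docB, "user", new Date());
docA.save("c:\\diffeddoc.docx");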
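
One way to see where the extra megabytes go: a DOCX file is a ZIP archive, so the size of each part inside it can be listed with the standard library alone. The following is only a minimal sketch. The class name DocxParts and the method partSizes are made up for illustration, and it reports uncompressed sizes, so the totals will not match the size on disk exactly.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of any Aspose API.
class DocxParts {
    // Maps each part name inside the DOCX (a ZIP archive) to its
    // uncompressed size in bytes, ordered from largest to smallest.
    static Map<String, Long> partSizes(String docxPath) throws IOException {
        List<ZipEntry> entries = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docxPath)) {
            Enumeration<? extends ZipEntry> all = zip.entries();
            while (all.hasMoreElements()) {
                entries.add(all.nextElement());
            }
        }
        // Largest part first
        entries.sort((x, y) -> Long.compare(y.getSize(), x.getSize()));
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (ZipEntry entry : entries) {
            sizes.put(entry.getName(), entry.getSize());
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the post above; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "c:\\diffeddoc.docx";
        partSizes(path).forEach((name, bytes) -> System.out.println(bytes + "\t" + name));
    }
}
```

Running this on both the input and the compared output should show whether the growth sits in word/document.xml (for example, from revision markup) or in some other part.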


Hi Paul,

Thanks for your inquiry. To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input documents.
  • Please attach the output Word file that shows the undesired behavior.
  • Please create a simple Java application (source code without compilation errors) that helps us to reproduce your problem on our end and attach it here for testing.

As soon as you have these resources ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip them and click the 'Reply' button, which will take you to the reply page. At the bottom of that page you can add attachments to your post by clicking the 'Add/Update' button.

I'm uploading all of the documents.

The DOCX files (DocumentA and DocumentB) were created from the attached PDF files using Aspose.Pdf.

The resulting document (comparedoc.docx) is 428 times larger than either of the original files.

This is the code used to convert from PDF to DOCX:
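
For anyone who wants to reproduce the size figure, a ratio like this can be computed with the standard library alone. This is a minimal sketch: the class name SizeRatio and the method ratio are made up for illustration, and the paths are the ones used in this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of any Aspose API.
class SizeRatio {
    // Returns how many times larger `output` is than `input`, by byte count.
    static double ratio(Path input, Path output) throws IOException {
        long inputSize = Files.size(input);
        if (inputSize == 0) {
            throw new IllegalArgumentException("Input file is empty: " + input);
        }
        return (double) Files.size(output) / inputSize;
    }

    public static void main(String[] args) throws IOException {
        // Paths taken from the post; adjust as needed.
        Path docA = Paths.get("c:\\DocumentA.docx");
        Path docB = Paths.get("c:\\DocumentB.docx");
        Path compared = Paths.get("c:\\comparedoc.docx");
        System.out.printf("vs DocumentA: %.1fx%n", ratio(docA, compared));
        System.out.printf("vs DocumentB: %.1fx%n", ratio(docB, compared));
    }
}
```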

// Register an extra folder that Aspose.Pdf should search for fonts
String path = "c:\\pdffonts";
java.util.List<String> list = com.aspose.pdf.Document.getLocalFontPaths();

list.add(path);

com.aspose.pdf.Document.setLocalFontPaths(list);

System.out.println(list);

Locale.setDefault(new Locale("en", "US"));
// Document here is com.aspose.pdf.Document
Document doc = new Document("c:\\DocumentA.pdf");
Document doc2 = new Document("c:\\DocumentB.pdf");

// Instantiate DocSaveOptions instance
DocSaveOptions saveOptions = new DocSaveOptions();
// Set output file format as DOCX
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
// Use Flow recognition mode
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);

doc.save("c:\\DocumentA.docx", saveOptions);
doc2.save("c:\\DocumentB.docx", saveOptions);

doc.close();
doc2.close();



This is the code to create the compared document.
Please also note that even though the two source documents use the EXACT same fonts, the compare process changes the font and marks it as a revision, which makes the resulting document unusable.

Locale.setDefault(new Locale("en", "US"));
// Document here is com.aspose.words.Document
Document docA = new Document("c:\\DocumentA.docx");
Document docB = new Document("c:\\DocumentB.docx");

// After this call, docA holds the differences as tracked revisions
docA.compare(docB, "Administrator", new Date());

System.out.println("Documents compared");

if (docA.getRevisions().getCount() == 0) {
    System.out.println("Documents are equal");
} else {
    System.out.println("Documents are NOT equal");
    System.out.println("Revisions : " + docA.getRevisions().getCount());
}
docA.save("c:\\comparedoc.docx");
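
To double-check the claim that both converted files use the same fonts, the font names referenced in each DOCX can be pulled out without any Aspose API. The sketch below is a rough check only, under several assumptions: the class name DocxFonts and the method fontNames are made up for illustration, it reads only word/document.xml, it assumes that part is UTF-8, and it looks only at the w:ascii attribute of w:rFonts elements, so fonts set through styles or themes are not seen.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of any Aspose API.
class DocxFonts {
    // Captures the w:ascii attribute value of <w:rFonts ...> elements.
    private static final Pattern ASCII_FONT =
            Pattern.compile("<w:rFonts\\b[^>]*\\bw:ascii=\"([^\"]*)\"");

    // Returns the sorted set of font names referenced directly in word/document.xml.
    static Set<String> fontNames(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + docxPath);
            }
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (InputStream in = zip.getInputStream(entry)) {
                byte[] chunk = new byte[8192];
                int read;
                while ((read = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
            }
            // Assumes the part is UTF-8 encoded
            String xml = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            Set<String> names = new TreeSet<>();
            Matcher matcher = ASCII_FONT.matcher(xml);
            while (matcher.find()) {
                names.add(matcher.group(1));
            }
            return names;
        }
    }

    public static void main(String[] args) throws IOException {
        // Paths taken from the post above.
        Set<String> a = fontNames("c:\\DocumentA.docx");
        Set<String> b = fontNames("c:\\DocumentB.docx");
        System.out.println("DocumentA fonts: " + a);
        System.out.println("DocumentB fonts: " + b);
        System.out.println("Same font names: " + a.equals(b));
    }
}
```

If the two sets differ, the font revisions in the compared document may come from the PDF to DOCX conversion rather than from the compare step itself.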



Hi Paul,

Thanks for sharing the details. Please note that Aspose.Words mimics the behavior of MS Word. If you compare "DocumentA.docx" and "DocumentB.docx" using MS Word, you will get output of approximately the same size.

The issue seems to be related to the Aspose.Pdf APIs. I am moving this thread to the Aspose.Pdf forum. My colleagues from the Aspose.Pdf team will investigate this issue and reply to you soon.

Hi Paul,

Thanks for sharing the source documents and code. I have tested your scenario with the shared documents using Aspose.Pdf for Java 11.7.0 and was able to reproduce the reported file size growth. For further investigation, I have logged an issue in our issue tracking system as PDFJAVA-36058 and linked your request to it. We will keep you updated on the issue status via this thread.

We are sorry for the inconvenience caused.

Is there a time frame for this fix? Our licenses are coming up for renewal, and we are not going to pay for software that is not working and is unusable for providing documents to our customers.

Please let me know at least the time frame as soon as possible.

Hi Paul,


Thanks for your feedback. The issue was logged only recently and is still pending investigation. As soon as the investigation is complete, we will be in a good position to share an ETA or workaround with you. We have recorded your concern and will keep you updated on the progress of the resolution.

We are sorry for the inconvenience caused.

Best Regards,

I can appreciate not being able to give a time frame.

Unfortunately no one in management is going to approve the renewal of software that is not working, and we are at a dead stop until these issues are addressed/fixed.

Hi Paul,


Thanks for your patience.

As a rule, issues are resolved on a first come, first served basis, as we believe it is the fairest policy for all customers. Furthermore, your recent concerns have been recorded with the product team, and they will consider them during the resolution of this problem.

As soon as we have some further updates, we will let you know.

I convinced management to renew our license.

Is there an update on the time line for this being fixed?

Hi Paul,


Thanks for your inquiry. Our product team has completed the initial investigation and found some font-related issues. They will now plan the fix as per their schedule, so I am afraid we cannot share an ETA at the moment. However, we have recorded your concern and will notify you as soon as we make significant progress towards resolving the issue.

Thanks for your patience and cooperation.

Best Regards,

It has been OVER a month and there have been NO updates on this issue. Is this going to be resolved?

Hi Paul,


Thanks for your inquiry. Our product team has completed the initial investigation, fixed some related issues, and is now working on the fix for the original issue. We will inform you as soon as a further update is available.

Thanks for your patience and cooperation.

Best Regards,

Hi Paul,


Thanks for your patience. We have good news for you: the issue you reported (PDFJAVA-36058) has been resolved, and the fix will be included in the upcoming release of Aspose.Pdf for Java, i.e. 16.11.0. We will notify you as soon as it is published and available for download.

Best Regards,

The issues you have found earlier (filed as PDFJAVA-36058) have been fixed in Aspose.Pdf for Java 16.11.0.

