java.lang.OutOfMemoryError when converting a pdf to pdf/a

agiannousaki · March 4, 2021, 11:51am

Hello,
when converting a PDF document to PDF/A I get a java.lang.OutOfMemoryError.
The release notes for Aspose.PDF for Java 20.11 say that ‘java.lang.OutOfMemoryError: Java heap space when converting PDF to PDF/A.’ is fixed:
https://www.componentsource.com/product/aspose-pdf-java/releases/2608886
so I tested with aspose.pdf-21.2, but I still get the same error. I tried on both Windows 10 and SUSE Linux Enterprise Server 10 (x86_64).
I have attached the Java source code and the PDF document that causes this problem.
AsposeUpdatePDFA.zip (2.1 MB)

asad.ali · March 4, 2021, 7:25pm

@agiannousaki

We tested the scenario in our environment and were able to reproduce the issue: the program hung and the GUI became unresponsive. Would you please share the complete exception with its stack trace? We will log an issue in our issue tracking system and share the ID with you.

agiannousaki · March 5, 2021, 8:39am

Thanks for your quick reply! Here is the exception
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.aspose.pdf.Operator.lI(Unknown Source)
at com.aspose.pdf.OperatorCollection.lb(Unknown Source)
at com.aspose.pdf.OperatorCollection.ld(Unknown Source)
at com.aspose.pdf.OperatorCollection.size(Unknown Source)
at com.aspose.pdf.internal.l8k.l0u.lI(Unknown Source)
at com.aspose.pdf.internal.l8k.l0u.lj(Unknown Source)
at com.aspose.pdf.internal.l8k.l0if.lv(Unknown Source)
at com.aspose.pdf.internal.l8k.ly.lf(Unknown Source)
at com.aspose.pdf.internal.l8k.l0if.l0t(Unknown Source)
at com.aspose.pdf.internal.l8k.ly.lI(Unknown Source)
at com.aspose.pdf.ADocument.lI(Unknown Source)
at com.aspose.pdf.ADocument.convert(Unknown Source)
at com.aspose.pdf.Document.convert(Unknown Source)
at com.aspose.pdf.ADocument.convert(Unknown Source)
at com.aspose.pdf.Document.convert(Unknown Source)
at asposeUpdatePDFA.AsposeUpdatePDFA.update(AsposeUpdatePDFA.java:40)
at asposeUpdatePDFA.AsposeUpdatePDFA.main(AsposeUpdatePDFA.java:14)

asad.ali · March 5, 2021, 8:09pm

@agiannousaki

We have logged an issue as PDFJAVA-40248 in our issue tracking system with all the provided details. We will investigate the cause further and keep you posted on the status of the fix. Please bear with us in the meantime.

We are sorry for the inconvenience.

asad.ali · March 26, 2021, 12:12am

@agiannousaki

We have investigated the earlier logged issue and found that the document contains a lot of vector graphics, which is why at least 2 GB of heap memory is required.

Please use the following VM option: -Xmx2g
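
As a quick way to confirm that the option is actually reaching the JVM, the heap limit can be read at runtime with Runtime.getRuntime().maxMemory(). The sketch below is only an illustration: the class name HeapCheck, its helper methods, and the 10% slack are assumptions made for this example and are not part of Aspose.PDF. The slack is there because some garbage collectors report a value slightly below the -Xmx setting.

```java
public class HeapCheck {
    /** Heap size the reply above asks for: 2 GB, i.e. -Xmx2g. */
    static final long REQUIRED_BYTES = 2L * 1024 * 1024 * 1024;

    /** Converts a byte count to whole mebibytes for readable output. */
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    /**
     * Returns true when actual is at least required minus 10% slack.
     * The 10% margin is an arbitrary choice for this sketch.
     */
    static boolean isRoughlyAtLeast(long actual, long required) {
        return actual >= required - required / 10;
    }

    public static void main(String[] args) {
        // maxMemory() is the upper bound the JVM will try to use for the heap.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxHeap) + " MiB");
        if (!isRoughlyAtLeast(maxHeap, REQUIRED_BYTES)) {
            System.out.println("Heap looks smaller than 2 GB; start the JVM with -Xmx2g");
        }
    }
}
```

Running it as `java -Xmx2g HeapCheck` should report a value close to 2048 MiB. The same flag goes before the main class or -jar argument when launching the conversion program itself.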

Also, we found a mistake in your code: the following line should not use filenameAfter as the path to the log file. It is better to use nameOfLogFile:

boolean validPDFA = doc.validate(filenameAfter, PdfFormat.PDF_A_1B);

agiannousaki · March 26, 2021, 7:33am

Thanks for your reply!
I don’t think there is a mistake in my code. I want to validate the converted file, which is saved under the name filenameAfter. Why should I validate the generated log file?

asad.ali · March 26, 2021, 7:49pm

@agiannousaki

The sample code for validation is as follows:

Document doc = new Document(dataDir + "inputtovalidate.pdf");
boolean validate = doc.validate(dataDir + "validationlog.xml", PdfFormat.PDF_A_2A);

The document to be validated is already loaded in the Document object. The Document.validate() method takes the log file path as its first argument and the PDF format to validate against as its second.
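
Since the first argument is a log path, one way to avoid passing the PDF's own name by accident is to derive a separate log file name from it. The following is a minimal sketch using only the JDK; the class ValidationLogPath, the method logPathFor, and the "-validation.xml" suffix are hypothetical names chosen for this example and are not part of the Aspose.PDF API. The Aspose calls appear only in comments, as they are written in this thread.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ValidationLogPath {
    /**
     * Builds a log file path next to the given PDF by replacing its
     * extension with "-validation.xml" (hypothetical naming scheme).
     */
    static Path logPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + "-validation.xml");
    }

    public static void main(String[] args) {
        Path filenameAfter = Paths.get("converted.pdf");
        Path nameOfLogFile = logPathFor(filenameAfter);
        System.out.println("Validation log: " + nameOfLogFile);

        // With Aspose.PDF on the classpath, the calls from this thread would be:
        // Document doc = new Document(filenameAfter.toString());
        // boolean validPDFA = doc.validate(nameOfLogFile.toString(), PdfFormat.PDF_A_1B);
    }
}
```

Keeping the two paths in separate variables makes it clear that the document to validate is the one loaded into Document, while the string passed to validate() only says where the report is written.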