Creating huge memory issues while converting PDF to PDF/A

Hi,


We are facing file size issues while converting PDF to PDF/A. For example, I converted "Aspose_EndUserAgreement.pdf", which is 197 KB, from PDF to PDF/A, and the resulting file was 1.92 MB.

I am really surprised by the size of the PDF/A file. Can you please let me know why the conversion produces such a big file, and how to bring it back to the original size or close to it?

I used "com.aspose.pdf.Document.OptimizationOptions", but it reduced the file size by only 25%, i.e. a 2 MB file came down to 1.5 MB. That is still a big file given the original size of about 200 KB.
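
To put these numbers in perspective, here is a minimal sketch in plain Java (no Aspose calls) that works out the growth factor and the reduction percentage. The class name SizeReport and its method names are hypothetical helpers for illustration only; the byte counts in main are approximations of the sizes quoted in this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of any Aspose API.
final class SizeReport {

    // How many times larger the converted file is than the original.
    static double growthFactor(long originalBytes, long convertedBytes) {
        if (originalBytes <= 0) {
            throw new IllegalArgumentException("original size must be positive");
        }
        return (double) convertedBytes / originalBytes;
    }

    // Percentage by which a file shrank, e.g. after optimizeResources().
    static double reductionPercent(long beforeBytes, long afterBytes) {
        if (beforeBytes <= 0) {
            throw new IllegalArgumentException("size before must be positive");
        }
        return 100.0 * (beforeBytes - afterBytes) / beforeBytes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Compare two real files: <original.pdf> <converted.pdf>
            long original = Files.size(Paths.get(args[0]));
            long converted = Files.size(Paths.get(args[1]));
            System.out.printf("Growth factor: %.2fx%n", growthFactor(original, converted));
        } else {
            // Approximate sizes from this post: 197 KB original, 1.92 MB PDF/A output.
            long original = 197L * 1024;
            long converted = (long) (1.92 * 1024 * 1024);
            System.out.printf("Growth factor: %.2fx%n", growthFactor(original, converted));
            // Optimization took a nominal 2 MB file down to 1.5 MB.
            System.out.printf("Reduction: %.1f%%%n", reductionPercent(2_000_000L, 1_500_000L));
        }
    }
}
```

Run it with two file paths to compare an actual input and output pair, or with no arguments to use the figures from this post.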

Nayyer was able to reproduce the issue; the issue tracking ID is PDFNEWNET-34925.

Any quick help is appreciated.

// Open document
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("input.pdf");
// Convert to PDF/A compliant document
pdfDocument.validate("Validation_log.xml", com.aspose.pdf.PdfFormat.PDF_A_1B);
pdfDocument.convert("Conversion_log.xml", com.aspose.pdf.PdfFormat.PDF_A_1B, com.aspose.pdf.ConvertErrorAction.Delete);
// Save updated document
pdfDocument.save("output.pdf");

// Load source PDF file
com.aspose.pdf.Document doc = new com.aspose.pdf.Document("source.pdf");
// Optimize the file size by removing unused objects
com.aspose.pdf.Document.OptimizationOptions opt = new com.aspose.pdf.Document.OptimizationOptions();
opt.setRemoveUnusedObjects(true);
opt.setRemoveUnusedStreams(true);
opt.setLinkDuplcateStreams(true);
doc.optimizeResources(opt);
// Save the updated file
doc.save("optimized.pdf");

Thanks

Hi Mayur,


Thanks for contacting support.

It appears that you have already reported this issue in another forum thread. I managed to reproduce it, and it has already been logged in our issue tracking system. Please note that when a PDF file is converted to PDF/A format, some extra metadata is embedded in the document for long-term document preservation. However, the team will look further into this matter for Aspose_EndUserAgreement.pdf specifically, and we will keep you posted on our findings.

Regarding the issue logged earlier as PDFNEWNET-34925, please note that this problem has been resolved; we managed to identify the causes, which were specific to those resource files.

Hi,

Can you please let me know how PDFNEWNET-34925 was resolved? After converting Aspose_EndUserAgreement.pdf to PDF/A, what is the file size of the PDF/A output?

Also, can you please let me know which extra metadata tags are embedded in the document during conversion to PDF/A?

Thanks

SONJ:
Can you please let me know how PDFNEWNET-34925 was resolved? After converting Aspose_EndUserAgreement.pdf to PDF/A, what is the file size of the PDF/A output?
Hi Mayur,

The above-stated issue was logged against PDF documents shared by another customer. However, when Aspose_EndUserAgreement.pdf is converted to PDF/A format, the document size increases from 197 KB to 1.45 MB, and we have already logged this issue as PDFNEWJAVA-34925 for Aspose.Pdf for Java.

SONJ:
Also, can you please let me know which extra metadata tags are embedded in the document during conversion to PDF/A?
The development team will investigate the issue further and determine the reasons behind the size increase. Once the investigation is complete, we will be able to share details of any extra metadata tags embedded in the document.

Hi Mayur,

Thanks for your patience.

We have further investigated the earlier reported issue PDFNEWJAVA-34925 and, as per our observations, it's not a bug. PDF/A places stricter requirements on the information a file contains. In PDF/A, all required fonts must be embedded within the PDF (as opposed to font linking in the source document), and the embedded font data adds to the file size.
For more details please see the link below: http://www.pdfa.org/2013/02/pdfa-facts/
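
If you want a rough check of whether font embedding accounts for the difference, the sketch below counts the /FontFile, /FontFile2 and /FontFile3 keys that PDF font descriptors use to point at embedded font programs. It is only a heuristic written in plain Java, not an Aspose API: the class FontEmbeddingProbe and its methods are hypothetical names, and the raw byte scan will miss keys stored inside compressed object streams (possible in PDF 1.5 and later sources) and may count the same text appearing elsewhere in the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; a byte-level heuristic, not a PDF parser.
final class FontEmbeddingProbe {

    // Counts non-overlapping occurrences of needle in haystack.
    static int countOccurrences(String haystack, String needle) {
        int count = 0;
        int idx = 0;
        while ((idx = haystack.indexOf(needle, idx)) != -1) {
            count++;
            idx += needle.length();
        }
        return count;
    }

    // Counts /FontFile, /FontFile2 and /FontFile3 keys visible in the raw bytes.
    // All three keys share the prefix "/FontFile", so one search covers them.
    static int countFontFileKeys(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to one char, so no bytes are lost or merged.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return countOccurrences(text, "/FontFile");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java FontEmbeddingProbe <file.pdf> [more.pdf ...]");
            return;
        }
        // Print the visible embedded-font references and size for each file.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.printf("%s: %d bytes, %d visible /FontFile* keys%n",
                    name, bytes.length, countFontFileKeys(bytes));
        }
    }
}
```

Running it on both the source PDF and the converted PDF/A file gives a quick side-by-side: if the source shows few or no font file keys and the output shows several, the size increase is likely coming from the embedded fonts.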