Aspose.Pdf - Corrupting PDF Files

We have a PDF file that Aspose.Pdf corrupts simply by opening and saving it.

This file (221.pdf (41.3 KB)) is fine before using Aspose.Pdf. If I do the following:

new License().setLicense(Main.class.getClassLoader().getResourceAsStream("com/muhnamespace/Aspose.Total.Java.lic"));

Document pdf = new Document("221.pdf");
pdf.save("aspose.pdf");

It will create this file: aspose_corrupted.pdf (41.6 KB)

What’s interesting is if we do this:

// do not set our Aspose.Total license key...so "evaluation" mode
Document pdf = new Document("221.pdf");
pdf.save("aspose.pdf");

It will create this file: aspose_evaluation.pdf (68.0 KB) which is completely fine.

Notes

  • no Exception is thrown
  • nothing indicates a problem, except that the resulting file cannot be opened
  • we are using version 20.11, declared in Gradle as:

compile(
    group: 'com.aspose',
    name: 'aspose-pdf',
    version: '20.11',
    classifier: 'jdk16'
)

Please advise.

@leviwilson

A ticket with ID PDFJAVA-40902 has been created in our issue tracking system to investigate the issue further on our end. This thread has been linked to the ticket so that you will be notified once the issue is fixed.

Not sure whether this helps track the problem down, but opening the “corrupted” file with PDFBox reports this:

14:22:51.772 [main] ERROR org.apache.pdfbox.cos.COSObject - Can't dereference COSObject{2, 0}
java.io.IOException: Object must be defined and must not be compressed object: 35933:0
        at org.apache.pdfbox.pdfparser.COSParser.getObjectOffset(COSParser.java:672)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:623)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectStreamObject(COSParser.java:769)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:636)
        at org.apache.pdfbox.pdfparser.COSParser.dereferenceCOSObject(COSParser.java:585)
        at org.apache.pdfbox.cos.COSObject.getObject(COSObject.java:115)
        at org.apache.pdfbox.cos.COSDictionary.getDictionaryObject(COSDictionary.java:181)
        at org.apache.pdfbox.cos.COSDictionary.getCOSDictionary(COSDictionary.java:541)
        at org.apache.pdfbox.pdfparser.COSParser.checkPages(COSParser.java:1948)
        at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:140)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:180)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:154)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:344)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:317)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:277)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:230)
        at com.teamnorthwoods.serverless.Main.main(Main.java:18)
Exception in thread "main" java.io.IOException: Page tree root must be a dictionary
        at org.apache.pdfbox.pdfparser.COSParser.checkPages(COSParser.java:1950)
        at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:140)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:180)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:154)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:344)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:317)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:277)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:230)
        at com.teamnorthwoods.serverless.Main.main(Main.java:18)

That doesn’t explain why the problem does not occur in evaluation mode, yet does occur with our Aspose.Total license.

@leviwilson

Thank you for sharing the information. I have attached it to the ticket in our issue tracking system and will share feedback with you as soon as the issue is fixed.

Thanks! One last thing…my concern is that there was no indication that anything was problematic, so we only knew about the corruption downstream.

Are we missing something? Why did pdf.optimizeResources() or pdf.save(file) not throw?

@leviwilson

We have noted your concern and will let you know as soon as it is investigated.
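
Since save() raised no exception here, one way to notice a bad output before it travels downstream is a cheap structural check on the saved bytes. The following is only a minimal sketch using the Java standard library: the class name PdfSanityCheck, the method looksStructurallyComplete and the TAIL_WINDOW constant are hypothetical names chosen for illustration, and nothing in it is an Aspose.Pdf or PDFBox API. It only confirms that the file starts with a PDF header and ends with the startxref and %%EOF markers, so it catches truncated or empty output but would likely not catch the object-stream problem shown in the PDFBox trace above. Fully re-parsing the saved file with a second library, as was done with PDFBox earlier in this thread, is the stronger check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfSanityCheck {

    // Hypothetical tuning value: how many trailing bytes to scan for the end markers.
    private static final int TAIL_WINDOW = 1024;

    /**
     * Returns true if the bytes begin with "%PDF-" and the tail contains
     * "startxref" followed later by "%%EOF". This is a shallow check only;
     * it does not resolve the cross-reference table or any objects.
     */
    public static boolean looksStructurallyComplete(byte[] data) {
        if (data == null || data.length < 8) {
            return false;
        }
        String head = new String(data, 0, 5, StandardCharsets.ISO_8859_1);
        if (!head.equals("%PDF-")) {
            return false;
        }
        int from = Math.max(0, data.length - TAIL_WINDOW);
        String tail = new String(data, from, data.length - from, StandardCharsets.ISO_8859_1);
        int xref = tail.lastIndexOf("startxref");
        int eof = tail.lastIndexOf("%%EOF");
        return xref >= 0 && eof > xref;
    }

    /** Convenience overload that reads the whole file into memory first. */
    public static boolean looksStructurallyComplete(Path file) throws IOException {
        return looksStructurallyComplete(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the output name used in the snippets above.
        Path saved = Paths.get(args.length > 0 ? args[0] : "aspose.pdf");
        if (!looksStructurallyComplete(saved)) {
            throw new IllegalStateException("Saved file failed the basic PDF structure check: " + saved);
        }
        System.out.println("Basic structure check passed: " + saved);
    }
}
```

Running this right after pdf.save(...) turns at least the grossest failures into an exception at the point of writing rather than a surprise downstream.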