Aspose.Pdf - Corrupting PDF Files

leviwilson · September 29, 2021, 4:41pm

We have a PDF file that Aspose.Pdf corrupts simply by opening and saving it.

This file (221.pdf (41.3 KB)) is fine before using Aspose.Pdf. If I do the following:

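// Apply the Aspose.Total license from the classpath
new License().setLicense(Main.class.getClassLoader().getResourceAsStream("com/muhnamespace/Aspose.Total.Java.lic"));

// Open the source file and save it without any changes
Document pdf = new Document("221.pdf");
pdf.save("aspose.pdf");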

It will create this file: aspose_corrupted.pdf (41.6 KB)

What’s interesting is if we do this:

// Do not set our Aspose.Total license key, so the library runs in "evaluation" mode
Document pdf = new Document("221.pdf");
pdf.save("aspose.pdf");

It will create this file: aspose_evaluation.pdf (68.0 KB) which is completely fine.

Notes:

No exception is thrown.
Nothing indicates a problem, except that the saved file cannot be opened.
We are using version 20.11.

compile(
    group: 'com.aspose',
    name: 'aspose-pdf',
    version: '20.11',
    classifier: 'jdk16'
)

Please advise.

mudassir.fayyaz · September 29, 2021, 7:54pm

@leviwilson

A ticket with ID PDFJAVA-40902 has been created in our issue tracking system to investigate the issue further on our end. This thread has been linked to the ticket so that you will be notified once the issue is fixed.

leviwilson · September 29, 2021, 8:24pm

Not sure whether this helps track the problem down, but opening the “corrupted” file with PDFBox produces this:

14:22:51.772 [main] ERROR org.apache.pdfbox.cos.COSObject - Can't dereference COSObject{2, 0}
java.io.IOException: Object must be defined and must not be compressed object: 35933:0
        at org.apache.pdfbox.pdfparser.COSParser.getObjectOffset(COSParser.java:672)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:623)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectStreamObject(COSParser.java:769)
        at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:636)
        at org.apache.pdfbox.pdfparser.COSParser.dereferenceCOSObject(COSParser.java:585)
        at org.apache.pdfbox.cos.COSObject.getObject(COSObject.java:115)
        at org.apache.pdfbox.cos.COSDictionary.getDictionaryObject(COSDictionary.java:181)
        at org.apache.pdfbox.cos.COSDictionary.getCOSDictionary(COSDictionary.java:541)
        at org.apache.pdfbox.pdfparser.COSParser.checkPages(COSParser.java:1948)
        at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:140)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:180)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:154)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:344)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:317)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:277)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:230)
        at com.teamnorthwoods.serverless.Main.main(Main.java:18)
Exception in thread "main" java.io.IOException: Page tree root must be a dictionary
        at org.apache.pdfbox.pdfparser.COSParser.checkPages(COSParser.java:1950)
        at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:140)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:180)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:154)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:344)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:317)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:277)
        at org.apache.pdfbox.Loader.loadPDF(Loader.java:230)
        at com.teamnorthwoods.serverless.Main.main(Main.java:18)

That doesn’t explain why the file saved in evaluation mode is fine while the file saved with our Aspose.Total license is corrupted.

mudassir.fayyaz · September 30, 2021, 12:50pm

@leviwilson

Thank you for sharing the information. I have added it to the ticket in our issue tracking system and will share feedback with you as soon as the issue is fixed.

leviwilson · September 30, 2021, 8:09pm

Thanks! One last thing…my concern is that there was no indication that anything was problematic, so we only knew about the corruption downstream.

Are we missing something? Why did pdf.optimizeResources() or pdf.save(file) not throw?
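
Since neither call signals a problem, one stopgap is to check the output right after saving instead of relying on an exception. The following is a minimal sketch using only the JDK; the class name PdfSanityCheck and the method looksStructurallyValid are hypothetical and not part of Aspose.Pdf or PDFBox. It only checks the header, the final %%EOF marker, and that the startxref offset points at a cross-reference table or an object. It does not parse object streams, so it may well miss the kind of corruption reported in this thread; fully reloading the saved file with a parser (for example the PDFBox Loader.loadPDF call that appears in the stack trace above) is a stronger check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: a shallow structural check, not a full PDF validator.
final class PdfSanityCheck {

    private PdfSanityCheck() {}

    /** Reads the file and applies the in-memory check below. */
    static boolean looksStructurallyValid(Path file) throws IOException {
        return looksStructurallyValid(Files.readAllBytes(file));
    }

    static boolean looksStructurallyValid(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so string indexes equal byte offsets.
        String text = new String(data, StandardCharsets.ISO_8859_1);

        // 1. The file should start with the PDF header (stricter than some readers require).
        if (!text.startsWith("%PDF-")) {
            return false;
        }

        // 2. There should be an end-of-file marker.
        int eof = text.lastIndexOf("%%EOF");
        if (eof < 0) {
            return false;
        }

        // 3. The last "startxref" before that marker holds the cross-reference offset.
        int startXref = text.lastIndexOf("startxref", eof);
        if (startXref < 0) {
            return false;
        }
        String offsetText = text.substring(startXref + "startxref".length(), eof).trim();
        int offset;
        try {
            offset = Integer.parseInt(offsetText);
        } catch (NumberFormatException e) {
            return false;
        }
        if (offset < 0 || offset >= data.length) {
            return false;
        }

        // 4. The offset should land on a classic "xref" table or on an
        //    "N G obj" header (a cross-reference stream).
        String target = text.substring(offset, Math.min(text.length(), offset + 32));
        return target.startsWith("xref") || target.matches("(?s)\\d+\\s+\\d+\\s+obj.*");
    }
}
```

Calling PdfSanityCheck.looksStructurallyValid(Path.of("aspose.pdf")) right after pdf.save("aspose.pdf") and failing the job when it returns false would at least surface truncated or badly indexed output before it reaches downstream consumers.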

mudassir.fayyaz · October 1, 2021, 2:19am

@leviwilson

We have noted your concern and will let you know as soon as it is investigated.