Issue with merging invalid PDFs

At present we have a mission-critical supplier who, unfortunately, is generating invalid PDFs that we have to merge. The main problem is that their stream objects have no length field (the /Length entry in the stream dictionary).

Whether a given program can view or render these PDFs depends on how tolerant it is of this issue. For example, Acrobat can open them, and whatever engines Chrome and IE use to render PDFs are fine with them. Firefox (pdf.js?) cannot render them; they show as blank pages. The issue for us is that Aspose.PDF raises a fairly generic error (“list contains invalid objects”) when I try to add their pages to a document.

We’re working with the supplier to see if they can fix this, but at best it will take a while. In the meantime, I was wondering if there are any tricks or anything else I can do using Aspose.PDF to import these documents, short of repairing them in another third-party program beforehand.

I’ve attached an example.
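For anyone triaging similar files, a rough way to flag the missing length entries described above is to scan the raw bytes for stream dictionaries without a /Length key. The C# below is only a heuristic sketch, not a PDF parser: it assumes each stream object is written as "N G obj << ... >> stream", and the class and method names (PdfStreamCheck, CountStreamsMissingLength) are made up for illustration and are not part of Aspose.PDF.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class PdfStreamCheck
{
    // Matches "N G obj << ... >> stream" without crossing an "endobj",
    // capturing the object number (group 1) and the dictionary text (group 2).
    private static readonly Regex StreamObject = new Regex(
        @"(\d+)\s+\d+\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream[\r\n]",
        RegexOptions.Singleline);

    // Matches the /Length key itself, but not /Length1, /Length2, etc.
    private static readonly Regex LengthKey = new Regex(@"/Length(?![A-Za-z0-9])");

    // Returns how many stream objects have a dictionary with no /Length key.
    public static int CountStreamsMissingLength(byte[] pdfBytes)
    {
        // Latin-1 maps every byte to exactly one char, so binary data decodes safely.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);
        int missing = 0;
        foreach (Match m in StreamObject.Matches(text))
        {
            if (!LengthKey.IsMatch(m.Groups[2].Value))
            {
                Console.WriteLine("Object " + m.Groups[1].Value + ": stream dictionary has no /Length");
                missing++;
            }
        }
        return missing;
    }
}
```

Calling PdfStreamCheck.CountStreamsMissingLength(System.IO.File.ReadAllBytes(path)) on a supplier file would give a quick count of affected streams. Because it pattern-matches raw bytes, it can miss unusual layouts or be fooled by binary stream data that happens to look like an object header.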

Hi Ash,

Thanks for your inquiry. I have merged your shared PDF document with another sample PDF document using Aspose.Pdf for .NET 10.5.0, but I was unable to reproduce any exception or issue. Please download and try the latest version of Aspose.Pdf for .NET; it should resolve the issue. If my understanding differs from your requirement, please share some more details along with sample code, and we will look into it and guide you accordingly.

Document pdfDocument = new Document(myDir + "Etestdoc.pdf");
Document doc = new Document(myDir + "testdo.pdf");
pdfDocument.Pages.Add(doc.Pages);
pdfDocument.Save(myDir + "Mergedout.pdf");

Please feel free to contact us for any further assistance.

Best Regards,

Hi Tilal,

Thanks for looking into this. We were on 10.4, but I have updated to 10.5. We’re using VB.NET (.NET 3.5).

I did what you suggested here and it worked; however, the other way around (merging the invalid document into the valid one) raises the error.

To replicate this, replace pdfDocument.Pages.Add(doc.Pages); with doc.Pages.Add(pdfDocument.Pages);

However, the fact that one direction works got me thinking, and I discovered that if I load the file into a document, save it to a MemoryStream, and then load that stream into a new document, it “fixes” the file.

It’s not particularly efficient, but we’re only dealing with small numbers of small documents so the overheads are fairly small.

' Assumes Imports System.IO and Imports Aspose.Pdf at the top of the file
Dim license As Aspose.Pdf.License = New Aspose.Pdf.License
license.SetLicense("C:\Program Files\CompanyOffice\Aspose.Pdf.lic")

Dim bad As String = "C:\bad.pdf"
Dim good As String = "C:\good.pdf"

Dim BADdoc As New Document(bad)
Dim GOODdoc As New Document(good)

' This works:
'
'   BADdoc.Pages.Add(GOODdoc.Pages)
'

' This raises an error:
'
'   GOODdoc.Pages.Add(BADdoc.Pages)
'


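' THE FIX: round-trip the bad document through a MemoryStream

Dim FIXstream As New MemoryStream
BADdoc.Save(FIXstream)
' Defensive rewind; harmless if the Document constructor already seeks to the start
FIXstream.Position = 0
Dim FIXdoc As New Document(FIXstream)
GOODdoc.Pages.Add(FIXdoc.Pages)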


Hi Ash,


Thanks for your feedback and findings. We have noticed the reported issue and logged it as PDFNEWNET-38936 in our issue tracking system for further investigation and rectification. We will notify you as soon as we resolve the issue.

We are sorry for the inconvenience caused.

Best Regards,

The issues you have found earlier (filed as PDFNET-38936) have been fixed in Aspose.PDF for .NET 22.10.