After upgrading to Aspose.Words 19.5.0, PDF documents are corrupted in Acrobat Reader

We upgraded from Aspose.Words 17.3.0 to 19.5.0, and now the same code (unchanged since before the upgrade) generates corrupted PDFs. I am able to open the PDFs in Chrome and Firefox, but not in Adobe Acrobat Reader, which reports that the file is corrupted.

The only thing that changed here was the version. We are generating these PDFs from HTML. I have attached both good and bad PDF files.

PDF Files.zip (337.8 KB)

We have rolled back to 17.3.0; however, we would like to roll forward to the latest version.

Please advise.

A post was split to a new topic: PDF cannot be opened in Adobe Acrobat Reader

@shmeep

Please ZIP and attach your input HTML document here for testing. We will investigate the issue on our side and provide you with more information.

Instead of saving to the PDF format, I saved to the HTML format so you can see what HTML I am using to generate the document. It is a manual with a table of contents and several documents within.
combineddocuments.zip (11.0 KB)

@shmeep

We have converted the shared HTML to PDF using the latest version of Aspose.Words for .NET (19.5) and could not reproduce the issue. Please check the attached output PDF. 19.5.pdf (137.8 KB)

Please create a standalone console application (source code without compilation errors) that helps us reproduce your problem on our end, and attach it here for testing. Thanks for your cooperation.

I have tried to create a console app to reproduce this behavior; however, it is a very complex cloud and database application that can't easily be boxed up. Simply using Aspose to convert HTML does not seem to cause it; I think it might have something to do with the table of contents. Also, my manager won't let me include anything that might be IP.

So instead, I had Aspose 19.5.0 generate this .docx file. The .docx that 19.5.0 (and not 17.x) generates is corrupt, just like the PDF file. When you open it in MS Word, Word does recover it, but it is still corrupt. Can you look at the attached .docx file and see if you can figure out what Aspose is doing wrong? If not, is there anything else I can send you instead of the console application to reproduce it? Maybe debug info of some kind?

ProblemDocument.zip (274.5 KB)

@shmeep

Unfortunately, it is difficult to say what the problem is without reproducing it. Please share a simplified application that reproduces this issue on our end. Thanks for your cooperation.

OK! We have isolated the problem and found a way to build a console app that reproduces it. It looks like something changed in the Aspose save method between 17.x and 19.x. We save the document to a byte array via a MemoryStream so we can upload it to an Azure cloud blob, rather than saving directly to a file with a FileStream.

When you save the document to a file using a FileStream, everything works. When you save the document to a MemoryStream and then write the resulting byte array to disk or to a cloud blob, it breaks. The attached console app demonstrates this: AsposeManualProblem.zip (3.4 KB). This same code works fine in 17.x, just not in 19.x.

We must have the MemoryStream approach working, because it allows us to upload manuals to the cloud without having to write them to disk first.

Thanks for your help!

@shmeep

This issue is not related to Aspose.Words. If you open the PDF file in Notepad++, you can see extra NUL characters after the end of the PDF, as shown below. If you remove them, the PDF opens in Adobe Reader without any issue.

startxref
16103
%%EOF
NULNULNUL…
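To check an output file for this kind of padding without a hex editor, a small helper can count the zero bytes at the very end of the data. This is a minimal sketch; `PaddingCheck` and `CountTrailingNuls` are hypothetical names, not part of Aspose.Words. A well-formed PDF ends with `%%EOF` (optionally followed by a line ending), so any NUL bytes after it indicate padding.

```csharp
using System;

// Hypothetical diagnostic helper (not part of Aspose.Words).
public static class PaddingCheck
{
    // Counts the NUL (0x00) bytes at the end of the given data.
    public static int CountTrailingNuls(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        int count = 0;
        // Walk backwards from the last byte until a non-zero byte is found.
        for (int i = data.Length - 1; i >= 0 && data[i] == 0; i--)
        {
            count++;
        }
        return count;
    }
}
```

For example, passing `File.ReadAllBytes(...)` of a generated PDF to `CountTrailingNuls` should return 0 for a clean file and a positive number for a padded one.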

We suggest using the following code snippet to get the desired output.

using (var ms = new MemoryStream())
{
    _combinedDocument.Save(ms, asposeFormat);

    // WriteTo copies only the bytes that were actually written (ms.Length).
    using (var file = new FileStream("c:\\temp\\manualBROKEN.pdf", FileMode.Create, FileAccess.Write))
    {
        ms.WriteTo(file);
    }
}
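The snippet above produces a clean file because `MemoryStream.WriteTo` writes only the bytes that were written to the stream (`Length`), whereas `MemoryStream.GetBuffer` returns the whole internal buffer, including unused capacity that is zero-filled. The sketch below makes the difference visible without Aspose.Words; `MemoryStreamDemo` and `Compare` are hypothetical names, and the initial capacity is an arbitrary value chosen for illustration.

```csharp
using System;
using System.IO;

public static class MemoryStreamDemo
{
    // Writes `payload` into a MemoryStream created with `initialCapacity`,
    // then reports how many bytes each way of extracting the data yields.
    public static (long length, int bufferLength, int toArrayLength, long writeToLength)
        Compare(byte[] payload, int initialCapacity)
    {
        using (var ms = new MemoryStream(initialCapacity))
        {
            ms.Write(payload, 0, payload.Length);

            using (var copy = new MemoryStream())
            {
                // WriteTo copies only the written bytes into `copy`.
                ms.WriteTo(copy);

                return (ms.Length,              // bytes actually written
                        ms.GetBuffer().Length,  // whole internal buffer (capacity)
                        ms.ToArray().Length,    // right-sized copy
                        copy.Length);           // what WriteTo produced
            }
        }
    }
}
```

Calling `Compare(new byte[] { 1, 2, 3, 4, 5 }, 16)` reports a length of 5 but a buffer length of 16; the 11 extra bytes are zeros, which is the same kind of trailing NUL padding shown in the PDF excerpt.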

I believe something has changed, because the only thing we change in our code is the Aspose version, from 17.3.0 to 19.5.0. The NULs are present in both versions, but there are several more in the 19.5.0 output, and that seems to be what breaks it. If you roll back to 17.x and use the code snippet below, you will see. I have also attached the two files generated with that same code; only the Aspose version is different. manualWORKS_17_3_0.zip (31.1 KB)

The code snippet you sent is the working case. However, we need the following code snippet to work, because we do not write files to the file system; instead, we extract the byte array from the memory stream.

SaveFormat asposeFormat = SaveFormat.Pdf;
using (var ms = new MemoryStream())
{
    _combinedDocument.Save(ms, asposeFormat);
    ms.Flush();

    // We need a byte array to send to Azure Blob Storage.
    var bytes = ms.GetBuffer();

    // We persist this byte array to Azure cloud storage, not to a file. However, the problem also shows up if you write the byte array to a file.
    using (var fs = new FileStream("c:\\code\\manualWORKS.pdf", FileMode.Create))
    {
        fs.Write(bytes, 0, bytes.Length);
    }
}

All of this is true for the DOCX format as well, and the DOCX output is badly broken; it doesn't seem to be just the NULs in that case. As with the PDF, writing to a FileStream works fine; it only breaks when reading the byte array from the MemoryStream.

@shmeep

You can use the MemoryStream.ToArray method, as shown below, to fix this issue.

var array = ms.ToArray(); // instead of ms.GetBuffer()
fs.Write(array, 0, array.Length);
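`ToArray` fixes the problem because it returns a new array sized exactly to the stream's `Length`. For large documents, the extra copy can be avoided by writing only the written portion of the internal buffer. A minimal sketch, assuming .NET Framework 4.6+ or .NET Core (where `MemoryStream.TryGetBuffer` is available); `BufferExport` and `WriteWrittenBytes` are hypothetical names:

```csharp
using System;
using System.IO;

public static class BufferExport
{
    // Writes exactly the bytes stored in `ms` (ms.Length) to `destination`,
    // excluding any unused, zero-filled capacity in the internal buffer.
    public static void WriteWrittenBytes(MemoryStream ms, Stream destination)
    {
        if (ms.TryGetBuffer(out ArraySegment<byte> segment))
        {
            // No extra copy: the segment covers only the written bytes.
            destination.Write(segment.Array, segment.Offset, segment.Count);
        }
        else
        {
            // The buffer is not exposable (e.g. the stream wraps a caller-supplied
            // array), so fall back to ToArray, which allocates a right-sized copy.
            byte[] copy = ms.ToArray();
            destination.Write(copy, 0, copy.Length);
        }
    }
}
```

If the upload API accepts a `Stream` rather than a byte array, another option is to rewind with `ms.Position = 0` and pass the `MemoryStream` itself. As for why the symptom changed between versions: the amount of unused capacity depends on how the buffer grew while `Save` wrote to the stream, so different library versions can plausibly leave different amounts of padding even though the underlying bug is the `GetBuffer` call.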

Thank you, that works.

@shmeep

Thanks for your feedback. Please let us know if you have any more queries related to Aspose.Words.