Avoid Exception upon HTML (containing JPEG Image with Corrupted EXIF Tag) to PDF File Conversion using C# .NET API

vanapandin · August 6, 2020, 9:46am

Our system has generated around 1,000 PDFs a day from HTML for the past 7 years. It suddenly started failing for particular data. When we checked the data, we found no issues with it. When we removed the images from the HTML, the PDF was generated successfully; when we added the images back, the exception below was thrown.

This does not happen for all images; only certain images cause the issue.
We tested with the latest version and an older version, and both behave the same way.

using (MemoryStream ms = new MemoryStream())
{
    byte[] bytes = Encoding.UTF8.GetBytes(html);
    ms.Write(bytes, 0, bytes.Length);
    Document doc = new Document(ms);

    foreach (Section sec in doc)
    {
        sec.PageSetup.LeftMargin = -10;
        sec.PageSetup.RightMargin = 5;
        sec.PageSetup.TopMargin = 15;
        sec.PageSetup.PaperSize = PaperSize.A4;
    }

    doc.Save(outputPath);
}

Capture.JPG (39.6 KB)

4937836_637309714703118047_st22.jpeg (90.2 KB)
4937836_637309714708430593_fr.jpeg (121.0 KB)

Exception: FileCorruptedException
Error Message: “The document appears to be corrupted and cannot be loaded.”
Inner Exception: Stream length must be non-negative and less than 2^31 - 1 - origin.
Parameter name: value

awais.hafeez · August 6, 2020, 11:15am

@vanapandin,

Please ZIP and attach the following resources here for testing:

A simplified input HTML file that reproduces this problem
A standalone, simple Console application (source code without compilation errors) that helps us reproduce the problem on our end. Please do not include the Aspose.Words DLL files, to reduce the file size.

As soon as you have these resources ready, we will start investigating your scenario and provide more information.

vanapandin · August 12, 2020, 3:10am

Hi @awais.hafeez,

Sorry for the delay. We have prepared a console app that reproduces the same exception.
Kindly check the issue and let us know if any further information is required.

Aspose_POC.zip (5.0 MB)

awais.hafeez · August 12, 2020, 10:19am

@vanapandin,

The following simple C# code reproduces the same exception with Aspose.Words version 20.8 during HTML to PDF conversion:

Document doc = new Document(@"C:\Aspose_POC\Aspose_POC\welcome.html");
doc.Save(@"C:\Aspose_POC\Aspose_POC\20.8.pdf");

Exception details:

Aspose.Words.FileCorruptedException
  HResult=0x80131500
  Message=The document appears to be corrupted and cannot be loaded.
  Source=Aspose.Words
Inner Exception 1:
ArgumentOutOfRangeException: Stream length must be non-negative and less than 2^31 - 1 - origin.
Parameter name: value

We have logged this problem in our issue tracking system with ID WORDSNET-20914. We will look further into the details of this problem and keep you updated on the status of the fix. We apologize for the inconvenience.
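
Until a fix for WORDSNET-20914 is available, one possible interim workaround is to remove the EXIF metadata from the affected JPEG files before conversion. The sketch below is not part of the Aspose.Words API and has not been verified against this issue. It assumes the failure is triggered by data in the JPEG APP1 segment, where EXIF is stored, and the names `JpegExifStripper` and `StripApp1` are hypothetical. It copies every JPEG segment except APP1 and leaves the compressed image data untouched.

```csharp
using System;
using System.IO;

// Hypothetical helper, not an Aspose.Words type.
public static class JpegExifStripper
{
    // Returns a copy of the JPEG with every APP1 (0xFFE1) segment removed.
    // Input that does not look like a well-formed JPEG is returned unchanged.
    public static byte[] StripApp1(byte[] jpeg)
    {
        // A JPEG starts with the SOI marker FF D8.
        if (jpeg == null || jpeg.Length < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8)
            return jpeg;

        using (var output = new MemoryStream(jpeg.Length))
        {
            output.Write(jpeg, 0, 2); // copy SOI
            int pos = 2;

            // Walk the header segments: FF <marker> <2-byte big-endian length> <payload>.
            while (pos + 4 <= jpeg.Length)
            {
                if (jpeg[pos] != 0xFF)
                    return jpeg; // unexpected layout, give up and keep the original

                byte marker = jpeg[pos + 1];

                // SOS (start of scan) or EOI: everything after this is copied verbatim.
                if (marker == 0xDA || marker == 0xD9)
                    break;

                // The length field counts itself (2 bytes) plus the payload.
                int length = (jpeg[pos + 2] << 8) | jpeg[pos + 3];
                if (length < 2 || pos + 2 + length > jpeg.Length)
                    return jpeg; // truncated or inconsistent segment

                if (marker != 0xE1) // skip APP1, keep every other segment
                    output.Write(jpeg, pos, 2 + length);

                pos += 2 + length;
            }

            // Copy the scan data (and anything else that remains) unchanged.
            output.Write(jpeg, pos, jpeg.Length - pos);
            return output.ToArray();
        }
    }
}
```

A cleaned copy of an image could then be written with `File.WriteAllBytes(cleanPath, JpegExifStripper.StripApp1(File.ReadAllBytes(path)))`, where `path` and `cleanPath` are placeholder variables, before the HTML that references it is converted. Note that APP1 can also carry XMP metadata and the EXIF orientation flag, so removing it may change how a rotated photo is displayed.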

vanapandin · August 25, 2020, 7:28am

Hi @awais.hafeez, it looks like the analysis is complete and a fix is planned. Is there any update on when we will get the fix, or is there a workaround in the meantime?

awais.hafeez · August 25, 2020, 9:53am

@vanapandin,

Yes, we have good news for you: WORDSNET-20914 has now been resolved. The fix will be included in the next version of Aspose.Words, 20.9. We will inform you via this thread as soon as that version is released at the start of next month.

aspose.notifier · September 16, 2020, 7:03am

The issues you have found earlier (filed as WORDSNET-20914) have been fixed in this Aspose.Words for .NET 20.9 update and this Aspose.Words for Java 20.9 update.