Stack overflow when attempting to convert a specific PDF to PDF/A-1a

This file cannot be converted to PDF/A-1a: FD62223.pdf (3.6 MB)

Code to reproduce the issue:

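  using System;
  using System.IO;

  // Reproduction: Convert() never returns (or ends in a StackOverflowException) for this file
  using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("FD62223.pdf"))
  using (MemoryStream msLogStream = new MemoryStream())
  {
    // The conversion log goes to msLogStream; objects that cannot be converted are deleted
    pdfDocument.Convert(msLogStream, Aspose.Pdf.PdfFormat.PDF_A_1A,
      Aspose.Pdf.ConvertErrorAction.Delete);
    pdfDocument.Save("FD62223.A1a.pdf");
  }
  Console.WriteLine("Done.");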

The code gets stuck at the pdfDocument.Convert call. With an older version of Aspose.PDF, I got a stack overflow after a minute or two. The newest version, 22.8, also uses 25% CPU (i.e., 100% of one core) and never returns from the call. It may eventually throw a stack overflow or out-of-memory exception. In any case, the PDF is not converted.

[edit] With the latest Aspose.Pdf I also got a stack overflow exception, but only after a much longer time (about 10 minutes). The process ends and the console shows:
Process is terminated due to StackOverflowException.

@gertjaap

This issue has been logged in our issue tracking system as PDFNET-52422 for further analysis. We will look into it and let you know as soon as the ticket is resolved. Please be patient and spare us some time.

We are sorry for the inconvenience.

We have some extra information that could help you in resolving the problem.

1. We received a second PDF with the same issue from our customer, who in turn received it from the same external source. Both PDFs have the following in common:
   a. Both were created with the PDF producer EXPSystems LLC (www.exp-systems.com).
   b. Acrobat Reader shows this font error for both PDFs when they are opened:
      The font ‘Arial-BoldMT’ contains a bad /BBox.

2. When I use an unlicensed version of Aspose.Pdf to convert to PDF/UA-1 instead of PDF/A-1a, the stack overflow does not occur; instead, the software raises a System.IndexOutOfRangeException:

Full exception text in this file: System.IndexOutOfRangeException.docx (42.3 KB)
Internal method names vary by Aspose.PDF version; the trace above is from the current version, 22.8. In any case, something seems to go wrong in Aspose.Text.FontCollection.

(When using a licensed copy, we get a stack overflow for conversion to PDF/UA-1 as well.)

@gertjaap

Thanks for sharing your findings. They have been recorded under the ticket as well. We will surely include the information in our investigation and let you know as soon as some progress is made towards resolution of the ticket. Please spare us some time.

We are sorry for the inconvenience.

The issues you found earlier (filed as PDFNET-52422) have been fixed in Aspose.PDF for .NET 24.6.

Unfortunately, the issue is not fixed. I still get a System.StackOverflowException with the file and example code I shared in August 2022.
image.png (34.8 KB)

I tested with licensed Aspose.PDF v24.6.0.0.

@gertjaap

The document you provided has a badly broken structure (probably caused by a bug in the software that created it). The PDF objects that make up a document are identified by the combination of their object number and generation number. Some objects in this document have non-zero generation numbers, but every reference to them points to generation 0, i.e., to objects that, from the PDF's point of view, do not exist. These broken references are combined so unfortunately that, while trying to restore a reference to one non-existent object, our library stumbles on another one and falls into an endless loop of restoration attempts.
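To make the broken references concrete, here is a minimal C# sketch of how such a mismatch can be detected and rewritten. It is a hypothetical illustration, not Aspose's implementation or API: the class and method names are invented, and it assumes an uncompressed PDF body held in a string (it ignores object streams, cross-reference tables and binary stream data). It records which `N G obj` definitions exist and rewrites an `N 0 R` reference to the defined generation only when no generation-0 object with that number exists.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch only, NOT Aspose's implementation. Assumes an uncompressed
// PDF body as text; object streams, xref tables and binary streams are ignored.
public static class GenerationRepairSketch
{
    // "N G obj" starts an indirect object definition
    static readonly Regex ObjDef = new Regex(@"\b(\d+)\s+(\d+)\s+obj\b");
    // "N G R" is an indirect reference to object N, generation G
    static readonly Regex ObjRef = new Regex(@"\b(\d+)\s+(\d+)\s+R\b");

    public static string RestoreGenerations(string pdfBody)
    {
        // Collect every (object number, generation) pair that is actually defined
        var defined = new HashSet<(int Num, int Gen)>();
        var latestGen = new Dictionary<int, int>();
        foreach (Match m in ObjDef.Matches(pdfBody))
        {
            int num = int.Parse(m.Groups[1].Value);
            int gen = int.Parse(m.Groups[2].Value);
            defined.Add((num, gen));
            latestGen[num] = gen; // the last definition in the text wins
        }

        // Rewrite "N 0 R" only if "N 0 obj" does not exist but N is defined
        // with some other generation; every other reference is left untouched
        return ObjRef.Replace(pdfBody, m =>
        {
            int num = int.Parse(m.Groups[1].Value);
            int gen = int.Parse(m.Groups[2].Value);
            if (gen == 0 && !defined.Contains((num, 0)) &&
                latestGen.TryGetValue(num, out int realGen))
            {
                return $"{num} {realGen} R";
            }
            return m.Value;
        });
    }
}
```

For example, given `1 0 obj << /Kids [2 0 R] >> endobj 2 3 obj << /Parent 1 0 R >> endobj`, the sketch turns `2 0 R` into `2 3 R` and leaves `1 0 R` alone, because object 1 really is defined with generation 0.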

Specifically for such cases, starting with the Aspose.PDF 24.6 release, we have added an option to the Document.Repair() method that looks for these broken references and fixes them by replacing the 0 generation number with the correct one from the object definition in the document. Please try running the Repair() method on these documents before attempting to convert them, using the following repair settings:

  using System;
  using System.IO;
  using Aspose.Pdf;

  using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("FD62223.pdf"))
  using (MemoryStream msLogStream = new MemoryStream())
  {
    // Repair broken object references in the document before attempting to convert it
    pdfDocument.Repair(new Document.RepairOptions { RestoreIndirectObjectGenerations = true });

    pdfDocument.Convert(msLogStream, Aspose.Pdf.PdfFormat.PDF_A_1A,
      Aspose.Pdf.ConvertErrorAction.Delete);
    pdfDocument.Save("FD62223.A1a.pdf");
  }
  Console.WriteLine("Done.");

Hi Asad,

The Repair method works well, but unfortunately I have run into another problem, one that blocks us from using Aspose.PDF 24.6 at all:

  • We can no longer use the .NET Framework version, because you chose to deliver the component only for .NET Framework 4.8.1. Updating all the customer servers we host from .NET Framework 4.8 to 4.8.1 is a large and time-consuming effort. Also, some of our customers run their own Windows servers with an OS version that only supports the framework up to version 4.8 (only the latest Windows Server, version 2022, supports 4.8.1; see here).
  • The .NET Standard version would be an alternative, but it doesn't work well. There are several problems with .NET Core component dependencies. This one blocks my conversion:
Exception thrown: 'System.PlatformNotSupportedException' in System.Drawing.Common.dll
Exception thrown: 'System.PlatformNotSupportedException' in Aspose.PDF.dll
System.PlatformNotSupportedException: System.Drawing is not supported on this platform.
   at System.Drawing.Drawing2D.Matrix..ctor(Single m11, Single m12, Single m21, Single m22, Single dx, Single dy)
   at #=zT9w49rKlgFa83yIBOp8WB3b8w6T89nycZg==.#=zZqiuyA8=(Single #=zpaoXjXI=, Single #=zkd08BC4=, Single #=zD0KpTHI=, Single #=zDkkDQ6Q=, Single #=z$gp9nRU=, Single #=zJe_rBOI=, #=zzO2BNbhloGuHYwqR4Dq3u715VuUK92Omuw== #=zJiFNW08=)
   at #=zgMPS7SmlpF1VKLLlcly_sng_b662vSDkdLGKm7OGG4wx.#=zakmoYnwJ1cTA(#=zgMPS7SmlpF1VKLLlcly_spNBDvvvF0CV7w== #=zruQ1L_I=, #=zrpvj$vXCyWEIUiAU5sIvk6UsnZolxAY0gPrzCg8= #=zZtQM4mpAsd0C, #=zv0Y$LHT30Pvu$QZUfbvU7EIkWoPEY0jHhg== #=zSKSGGzE=, Single #=zxYsfsCz7c65P, Single #=zi7FCDpof4jrW, Boolean #=z20k6WTX1FuR1E1W95g==, Int32 #=zJv04WuNBeIxBcH0gDDb80EY=, Boolean #=zVpCGJgtuyKS6, Double& #=ze8Z9xHg=, Double& #=zhI9_MgY=, #=zmvCGE$WQe8nBgJeTXJMxVjw5TDyqFAzxHg==& #=zRKUK6OI=)
   at #=zgMPS7SmlpF1VKLLlcly_sng_b662vSDkdLGKm7OGG4wx..ctor(#=zEARDeqdSZ3gKal1PEO$WKlzFW8Jm #=zH9VDBv4=, #=zgMPS7SmlpF1VKLLlcly_spNBDvvvF0CV7w== #=zruQ1L_I=, #=zrpvj$vXCyWEIUiAU5sIvk6UsnZolxAY0gPrzCg8= #=zZtQM4mpAsd0C)
   at #=zYcgjK436AkZ2pEgF673wAlYVISHFBxZqlGpLdVA=.#=zsXFNNbh3GUvy(#=zEARDeqdSZ3gKal1PEO$WKlzFW8Jm #=zH9VDBv4=, #=zgMPS7SmlpF1VKLLlcly_spNBDvvvF0CV7w== #=zruQ1L_I=, #=zrpvj$vXCyWEIUiAU5sIvk6UsnZolxAY0gPrzCg8= #=zZtQM4mpAsd0C)
   at #=z8E6QT_un3eNqbGHiON4HJ9gRNxbsicIPEo7hoQlfmDdL.#=zXPJA8SM=(#=zEARDeqdSZ3gKal1PEO$WKlzFW8Jm #=zH9VDBv4=, #=zgMPS7SmlpF1VKLLlcly_spNBDvvvF0CV7w== #=zruQ1L_I=, #=zrpvj$vXCyWEIUiAU5sIvk6UsnZolxAY0gPrzCg8= #=zm9W6zPs=, #=zgMPS7SmlpF1VKLLlcly_sng_b662vSDkdLGKm7OGG4wx& #=z4IKK7Y4=)
   at #=zyMNbQq$vz7nVDh6zAoIqp4rAhAy5.#=zcRjZBOI=(#=zgMPS7SmlpF1VKLLlcly_sng_b662vSDkdLGKm7OGG4wx& #=z4IKK7Y4=)
   at Aspose.Pdf.Devices.ImageDevice.#=zcRjZBOI=(Page #=zruQ1L_I=)
   at #=zHPGoxkiKnnFyLQgLg0Y6kk1JTagJercNC0mLJHm6r$L7YdOjNOXmny8=.#=zqwRwOslCB6Wy(Int32 #=zkPDt_FA=)
   at #=zHPGoxkiKnnFyLQgLg0Y6kk1JTagJercNC0mLJHm6r$L7YdOjNOXmny8=.#=zps6TzKbBJEIH(Rectangle #=zvyK7r8Y=, Nullable`1 #=zu9JoR3c=, Nullable`1 #=zkGCN3e8=)
   at #=zHPGoxkiKnnFyLQgLg0Y6kk1JTagJercNC0mLJHm6r$L7YdOjNOXmny8=.#=zSlnMB$THVJnuxLs2PRalTe8=()
   at #=zHPGoxkiKnnFyLQgLg0Y6kk1JTagJercNC0mLJHm6r$L7YdOjNOXmny8=.#=zCndqlaQ=()
   at #=zc20jbNI2c1pSKKi26XxxeSmRlTtitUo1VBSj4rJRf3wExz4KYJthhKY=.#=zAaiCFhuYQo3O()
   at #=z6HQ73iZOcVn4IPseLrDIRR$vX_pBsgPiWx9hUs5KcGQhwh_y2VGxeUg=.#=z13ECSi0=()
   at #=zc20jbNI2c1pSKKi26XxxeSmRlTtitUo1VBSj4rJRf3wExz4KYJthhKY=.#=z9Ny8urI=()
   at #=z6HQ73iZOcVn4IPseLrDIRR$vX_pBsgPiWx9hUs5KcGQhwh_y2VGxeUg=.#=zXPJA8SM=(XmlTextWriter #=zWpHGEbI=, PdfFormat #=zGxuyFG0=, Document #=zH9VDBv4=, Boolean #=zKCS8GjGudgz7, ConvertErrorAction #=zYOlPgrQ=)
   at Aspose.Pdf.Document.#=zHHW26sE=(XmlTextWriter #=zWpHGEbI=, PdfFormat #=zGxuyFG0=, Boolean #=zKCS8GjGudgz7, ConvertErrorAction #=zYOlPgrQ=)
   at Aspose.Pdf.Document.Convert(Stream outputLogStream, PdfFormat format, ConvertErrorAction action)

I think the quickest fix would be to reconsider your decision to deliver only a .NET Framework 4.8.1 build and to create a build for .NET Framework 4.8 as well, or, even better, a .NET Framework 4.6.2 build, as is done for Aspose.Words.

For now, we will unfortunately be unable to use Aspose.PDF for .NET versions later than 24.3, as 24.3 is the last version for which a .NET Framework 4.0 build is available.

@gertjaap

The main package includes DLLs for .NET Framework 4.8.1 and higher. However, you can download other DLLs that are compatible with .NET Framework 4.0 from Aspose Downloads.

Please let us know whether this resolves your issue.

Hi Asad,

Yes, this resolved my issue (I also needed the .NET Framework 4.0 version of Aspose.Imaging 24.6, which I found here).

Thank you for your support.

@gertjaap

It's nice to know that. Please feel free to create a new topic in case you face any issues.