Issue extracting text from PDF with newer Aspose.PDF for .NET reference

We are migrating an older tool from .NET Framework to .NET (.NET 8 in this case). At the same time, we are updating to the latest version of the Aspose.PDF package. With the newer version, an exception is thrown when extracting text from some of the PDF pages. Our existing tool, built against an older version of the package, extracts text from the same pages without issue.

The exception is:

System.ArgumentNullException: Value cannot be null. (Parameter 'collection')
   at System.Collections.Generic.List`1.AddRange(IEnumerable`1 collection)
   at #=zm5lWF53b9CaGKQrgSaSBNjBjMhuumDV3AircRarR$Lx8pkCn177JYJk=.#=zkkqtmUY=(Page #=zoxu0zJc=)
   at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ.#=zTikALPKBYtKS(BaseOperatorCollection #=zp9aiJKQ=, Resources #=zBZJECh4=, Page #=zoxu0zJc=, Rectangle #=zEE8cWxPSz4AE)
   at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ.#=zTikALPKBYtKS(BaseOperatorCollection #=zp9aiJKQ=, Resources #=zBZJECh4=, Rectangle #=zEE8cWxPSz4AE)
   at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ.#=z8EuYod0=(Boolean #=zvEkNy4S31USy)
   at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ..ctor(Page #=zoxu0zJc=, TextSearchOptions #=zCVmokbMRVfgv, Boolean #=zxbF$h2l6Ovnr)
   at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ..ctor(Page #=zoxu0zJc=, TextSearchOptions #=zCVmokbMRVfgv)
   at Aspose.Pdf.Text.TextFragmentAbsorber.Visit(Page page)
   at Bug.Program.Main(String[] args) in C:\Scratch\AsposeBug\Bug\Bug\Program.cs:line 27

While diagnosing this, we noticed that the exception first appears in version 23.11 of the package. Earlier versions seem to read all pages successfully.

I’ve attached a small example project that reproduces the issue. It includes a PDF in which 3 of the pages cause the TextFragmentAbsorber to throw the exception as we loop through the pages and extract text.
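A loop of the kind described can be sketched as follows. This is an illustrative sketch, not the code from the attached project: the file name `sample.pdf` is a placeholder, and the try/catch is added only to report which pages fail so that the remaining pages are still processed. It assumes the Aspose.PDF for .NET package is referenced.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// "sample.pdf" is a placeholder path, not the file from the attachment.
using var document = new Document("sample.pdf");

// Aspose.PDF page collections are 1-based.
for (int pageNumber = 1; pageNumber <= document.Pages.Count; pageNumber++)
{
    // Use a fresh absorber per page so results from different pages do not mix.
    var absorber = new TextFragmentAbsorber();

    try
    {
        // Same entry point that appears in the stack trace above.
        absorber.Visit(document.Pages[pageNumber]);
        Console.WriteLine($"Page {pageNumber}: {absorber.Text.Length} characters extracted");
    }
    catch (ArgumentNullException ex)
    {
        // Log the failing page and continue with the next one.
        Console.WriteLine($"Page {pageNumber}: extraction failed ({ex.Message})");
    }
}
```

Catching the exception per page only isolates the failing pages; it does not recover their text.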

Bug.zip (112.2 KB)

Thanks

@kevin.coon

We are checking it and will get back to you shortly.

@kevin.coon

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-60177

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.