We are updating an older tool from .NET Framework to .NET (.NET 8 in this case) and, at the same time, moving to the latest version of the Aspose.PDF package. With the newer package, an exception is thrown when extracting text from some PDF pages. Our existing tool, built against an older version of the package, had no issues extracting text.
The exception is:
System.ArgumentNullException: Value cannot be null. (Parameter 'collection')
at System.Collections.Generic.List`1.AddRange(IEnumerable`1 collection)
at #=zm5lWF53b9CaGKQrgSaSBNjBjMhuumDV3AircRarR$Lx8pkCn177JYJk=.#=zkkqtmUY=(Page #=zoxu0zJc=)
at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ.#=zTikALPKBYtKS(BaseOperatorCollection #=zp9aiJKQ=, Resources #=zBZJECh4=, Page #=zoxu0zJc=, Rectangle #=zEE8cWxPSz4AE)
at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ.#=zTikALPKBYtKS(BaseOperatorCollection #=zp9aiJKQ=, Resources #=zBZJECh4=, Rectangle #=zEE8cWxPSz4AE)
at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ.#=z8EuYod0=(Boolean #=zvEkNy4S31USy)
at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ..ctor(Page #=zoxu0zJc=, TextSearchOptions #=zCVmokbMRVfgv, Boolean #=zxbF$h2l6Ovnr)
at #=zM8gldwPuW51LR$FlICwXkb$yHCyth2rLrrLvIU$LskEQ5vvAsCyhNK88OfaZ..ctor(Page #=zoxu0zJc=, TextSearchOptions #=zCVmokbMRVfgv)
at Aspose.Pdf.Text.TextFragmentAbsorber.Visit(Page page)
at Bug.Program.Main(String[] args) in C:\Scratch\AsposeBug\Bug\Bug\Program.cs:line 27
While diagnosing this, we found that the exception first appears in version 23.11 of the package. Earlier versions seem to read all pages successfully.
I’ve attached a small example project that reproduces the issue. It includes a PDF in which 3 of the pages cause TextFragmentAbsorber to throw the exception as we loop through the pages and extract text.
Bug.zip (112.2 KB)
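For anyone who would rather not open the attachment, the extraction loop looks roughly like the sketch below. This is a simplified outline rather than the exact code in Bug.zip: the file name "sample.pdf" is a placeholder, and the try/catch is only there to show which pages fail. It assumes the Aspose.PDF NuGet package (23.11 or later to see the failure).

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Placeholder path (assumption); the real PDF is in the attached project.
using var document = new Document("sample.pdf");

// Page indexes in Aspose.PDF start at 1.
for (int pageNumber = 1; pageNumber <= document.Pages.Count; pageNumber++)
{
    Page page = document.Pages[pageNumber];
    var absorber = new TextFragmentAbsorber();

    try
    {
        // This is the call at the top of the stack trace above.
        absorber.Visit(page);
        Console.WriteLine($"Page {pageNumber}: {absorber.Text.Length} characters extracted");
    }
    catch (ArgumentNullException ex)
    {
        // Report the failing page and keep going so every bad page is listed.
        Console.WriteLine($"Page {pageNumber}: {ex.Message}");
    }
}
```

Creating a new absorber per page keeps one page's failure from affecting the next, which makes it easier to see exactly which pages are affected.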
Thanks