ArgumentOutOfRangeException in TextFragmentAbsorber

Hello, I am using the latest version of Aspose.PDF (19.4), but this issue has persisted since v16.11. Certain PDF pages throw the following unhandled exception when we call either textFragmentAbsorber.Visit(page) or page.Accept(textFragmentAbsorber):

Index and length must refer to a location within the string.
Parameter name: length

This exception is thrown by System.String.Substring. Here is the full stack trace from the textFragmentAbsorber.Visit(page) call:

   at System.String.Substring(Int32 startIndex, Int32 length)
   at #=zyTB3XIyaBhej7MWu47CesmD$bfxg8HcchasVZf6ESKJX.#=zGt6m6yc97uXkcK1_Qw==(String #=zqJN9UNs=, #=zqF0Ar1gRucV_ #=ztMoGLK9J0q3b)
   at #=zyTB3XIyaBhej7MWu47CesmD$bfxg8HcchasVZf6ESKJX.#=zZ$33QmbInEBLvABe7g==(String #=zqJN9UNs=, #=zqF0Ar1gRucV_ #=ztMoGLK9J0q3b)
   at #=z41PQwlAXVF3C4xYRZatHFeTEH05ktVgHDtu8FONmp2teaBq5N9x8Ikc=.#=zSWCKZaoaiOR_(#=zOj0VlyKypTqDkT5xWZloVk7qovZYAlVrSw== #=zyCLMEEHDQfl1, Boolean #=zeUufYJyB1UihD2ZUxA==)
   at #=zxxOyLC020CcmMkhiy4tXZ1svD7kXoIcfVQ==.#=z6YWdw7I=(Boolean #=zjFq8Ob86k9k_ObVCOg==)
   at #=zxxOyLC020CcmMkhiy4tXZ1svD7kXoIcfVQ==.#=zMfeFBRpvOo5w()
   at #=zGuSEinf51GwaWgcLur1eBvogNseW53R1TDBzD1VUTV4dC$C$B0c_POE=.#=zuYcchDc=(#=zOj0VlyKypTqDkT5xWZloVk7qovZYAlVrSw== #=zoWfxCrV07EJ4)
   at #=zoZbZ78XusXM_nzRHEWgXZYTHHedWRiBO5OMTxn6RjiBhxC8LOw==.#=zsZXBrC4tFWq9()
   at #=zoZbZ78XusXM_nzRHEWgXZYTHHedWRiBO5OMTxn6RjiBhxC8LOw==..ctor(String #=zhTjiml0=, #=zfAqJJT0= #=z18f6JNs=, #=zfAqJJT0= #=z8$PsPVc0jD4V, #=zXu7BGZHPDNo6EkVq_lJdCo3wTL0hR1HTHIkLvRkDPLUl5Liu9tavhZI= #=zaRram4rt_ceE, #=zAJI72aOK1q$q6K3CAz0khNXmbz4GUqJdh0pZ574= #=zyt0hNPM=, #=zxO40XsL557cnUOLZuO0QYsrw1$jEogyRtroXecQjEgG8RtTxnPTGgak= #=z_nZRQU9BiHeC23L8Dg==, Double #=z3Z6nl1M=, Double #=z9onouws=, #=ztYWYwnALNhuNwST_2qf1gcU5JhqGT8NnHTOOirpzjinNof8Zfg== #=zE8gff7M=, #=zyv7qNJaFLOxmHUayweA$tUL5$dL5b0nKdH2vW0LTFwlxBEh0Icw_bPU= #=z2I0hUWU=)
   at #=zGuSEinf51GwaWgcLur1eBvogNseW53R1TDBzD1VUTV4dC$C$B0c_POE=.#=zVBRCJ4Pq$wtL(String #=zhTjiml0=, #=zfAqJJT0= #=z18f6JNs=, #=zfAqJJT0= #=z8$PsPVc0jD4V, #=zXu7BGZHPDNo6EkVq_lJdCo3wTL0hR1HTHIkLvRkDPLUl5Liu9tavhZI= #=zaRram4rt_ceE, #=zAJI72aOK1q$q6K3CAz0khNXmbz4GUqJdh0pZ574= #=zyt0hNPM=, #=zxO40XsL557cnUOLZuO0QYsrw1$jEogyRtroXecQjEgG8RtTxnPTGgak= #=z_nZRQU9BiHeC23L8Dg==, Double #=z3Z6nl1M=, Double #=z9onouws=, #=ztYWYwnALNhuNwST_2qf1gcU5JhqGT8NnHTOOirpzjinNof8Zfg== #=zE8gff7M=, #=zyv7qNJaFLOxmHUayweA$tUL5$dL5b0nKdH2vW0LTFwlxBEh0Icw_bPU= #=z2I0hUWU=)
   at #=zXu7BGZHPDNo6EkVq_lJdCo3wTL0hR1HTHIkLvRkDPLUl5Liu9tavhZI=.#=zWetla8P8NV4v(Int32 #=zCP5d0H4=, Int32 #=zLsgGattZzP6t, Operator #=zbqeNkAI=, #=ztYWYwnALNhuNwST_2qf1gcU5JhqGT8NnHTOOirpzjinNof8Zfg== #=zE8gff7M=)
   at #=zXu7BGZHPDNo6EkVq_lJdCo3wTL0hR1HTHIkLvRkDPLUl5Liu9tavhZI=.#=zL1psiRd2GpC5(#=zfAqJJT0= #=z18f6JNs=)
   at #=zXu7BGZHPDNo6EkVq_lJdCo3wTL0hR1HTHIkLvRkDPLUl5Liu9tavhZI=.#=z0DnGUR0=(Int32 #=zCP5d0H4=, Operator #=zbqeNkAI=)
   at #=zXu7BGZHPDNo6EkVq_lJdCo3wTL0hR1HTHIkLvRkDPLUl5Liu9tavhZI=.#=z8QtsWa0=()
   at #=zNmZ11JZG2s4gwNGgO6ST5Cc6wzPZL$7XQLF32TMcIIa$$C41HPn2Sab4IcZ1.#=zwFPLRFD42p5v(BaseOperatorCollection #=zBqFbvow=, Resources #=zyt0hNPM=, Page #=zraOqAJ0=)
   at #=zNmZ11JZG2s4gwNGgO6ST5Cc6wzPZL$7XQLF32TMcIIa$$C41HPn2Sab4IcZ1.#=zwFPLRFD42p5v(BaseOperatorCollection #=zBqFbvow=, Resources #=zyt0hNPM=)
   at #=zNmZ11JZG2s4gwNGgO6ST5Cc6wzPZL$7XQLF32TMcIIa$$C41HPn2Sab4IcZ1.#=zwaAhBC0=()
   at #=zNmZ11JZG2s4gwNGgO6ST5Cc6wzPZL$7XQLF32TMcIIa$$C41HPn2Sab4IcZ1..ctor(Page #=zraOqAJ0=, TextSearchOptions #=zr40I$rv9oWYL, Boolean #=zfY4x0s3Snhxe)
   at #=zNmZ11JZG2s4gwNGgO6ST5Cc6wzPZL$7XQLF32TMcIIa$$C41HPn2Sab4IcZ1..ctor(Page #=zraOqAJ0=, TextSearchOptions #=zr40I$rv9oWYL)
   at Aspose.Pdf.Text.TextFragmentAbsorber.Visit(Page page)

I would be happy to share an example PDF that has this issue, but it is confidential, so I don’t want to post it publicly on the forum.

Thanks,
Jonny

@jonnyt

Thank you for contacting support.

Please note that forum attachments are accessible only to the thread owner and Aspose staff, so you may share the file here. Alternatively, you may send us a link privately by clicking on my username and then on the Message button. Kindly share SSCCE code (a short, self-contained, correct example) and the PDF document so that we can try to reproduce the issue in our environment.

@Farhan.Raza

Thanks for the quick response. I have attached a sample PDF that causes the problem. To reproduce the exception, this should be all you need:

using Aspose.Pdf;
using Aspose.Pdf.Text;
using System;

namespace AsposeException
{
    class Program
    {
        static void Main(string[] args)
        {
            // set license
            License license = new License();
            license.SetLicense("Aspose.pdf.lic");
            license.Embedded = true;

            // open document
            using (Document doc = new Document("sample.pdf"))
            {
                try
                {
                    TextFragmentAbsorber tfa = new TextFragmentAbsorber();
                    tfa.Visit(doc.Pages[1]);
                    //doc.Pages[1].Accept(tfa);
                }
                catch (Exception ex)
                {
                    Console.Error.WriteLine(ex.ToString());
                }
            }
        }
    }
}

sample.pdf (1.1 MB)

@jonnyt

We have tested with the file you shared and were able to reproduce the issue in our environment. A ticket with ID PDFNET-46267 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked to this thread so that you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.

The issues you have found earlier (filed as PDFNET-46267) have been fixed in Aspose.PDF for .NET 23.2.
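
For anyone still on a version older than 23.2, the failure can at least be contained per page so that one problematic page does not abort text extraction for the whole document. The following is only a minimal sketch, not an official workaround: it assumes the exception surfaces as ArgumentOutOfRangeException (as in the thread title), reuses the placeholder file name "sample.pdf" from the repro above, and the class name PerPageExtraction is made up for illustration. Text on a skipped page is not recovered.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace AsposeException
{
    // Hypothetical class name, used only for this illustration.
    class PerPageExtraction
    {
        static void Main(string[] args)
        {
            // "sample.pdf" is a placeholder path, as in the repro code.
            using (Document doc = new Document("sample.pdf"))
            {
                // Page numbering in doc.Pages starts at 1.
                for (int i = 1; i <= doc.Pages.Count; i++)
                {
                    try
                    {
                        // Use a fresh absorber per page so that a page which
                        // fails midway does not affect the results of others.
                        TextFragmentAbsorber tfa = new TextFragmentAbsorber();
                        doc.Pages[i].Accept(tfa);
                        Console.WriteLine("Page {0}: {1} fragments", i, tfa.TextFragments.Count);
                    }
                    catch (ArgumentOutOfRangeException ex)
                    {
                        // Assumption: the Substring failure reaches the caller
                        // as this exception type. Log the page and continue.
                        Console.Error.WriteLine("Page {0} skipped: {1}", i, ex.Message);
                    }
                }
            }
        }
    }
}
```

This only isolates the symptom; upgrading to a release that contains the PDFNET-46267 fix remains the actual resolution.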