startIndex cannot be larger than length of string. (Parameter 'startIndex') on Aspose.Pdf.Text.TextFragmentAbsorber.Visit(Page page)

Greetings! Could you please help me?
From time to time I get this strange error while trying to get all text placements in a PDF document:
startIndex cannot be larger than length of string. (Parameter 'startIndex')
   at System.String.Substring(Int32 startIndex, Int32 length)
   at #=zfzrBTbmJRnKe2fL4VWGyn_E4ysMXxoA1tfidGwalt4Ni.#=zn5VumAsHI0StQmHG0A==(String #=z_P$vmoE=, Int32 #=zEzaHbFjMXvP_, Int32 #=zMbCAnZwlSf1T, #=zSMUJFc06SAbh #=zwkJXlaYqkRXi)
   at #=zPoEb13OJ2G$LgqwnaXXzmqtsEGhwD7iAbcWbiLqNjPfN.#=z5fwqvUAXFzZq(#=zMViOo_fcO3ho7mWhsXrxVAOQ34ebgZJpQQ== #=zxFAjcvI=, Int32 #=zEzaHbFjMXvP_, Int32 #=zMbCAnZwlSf1T, Double #=zXsFi1gE=, Double& #=zawG6kig=, Double& #=zq$ZhZz0=, #=znkBLFSf92CE8QzrMCb5V_H5EP6ilRkOfWbvhabE=[] #=zgcmbxyaJLaf_, Boolean& #=zwZ6qbJ0=, Boolean #=zCZbHc4WWaiYH_LgnLOw2Xxc=)
   at #=zPoEb13OJ2G$LgqwnaXXzmqtsEGhwD7iAbcWbiLqNjPfN.#=zBIIQJ60=(#=zMViOo_fcO3ho7mWhsXrxVAOQ34ebgZJpQQ== #=zxFAjcvI=, Int32 #=zEzaHbFjMXvP_, Int32 #=zMbCAnZwlSf1T, Double #=zXsFi1gE=, Boolean #=zbOqRbWVAV0CFBDH3KA==, Double& #=zawG6kig=, Double& #=zq$ZhZz0=, #=znkBLFSf92CE8QzrMCb5V_H5EP6ilRkOfWbvhabE=[]& #=z1atDzcjsI2Gj, Boolean& #=zwZ6qbJ0=, Boolean #=zCZbHc4WWaiYH_LgnLOw2Xxc=)
   at #=zPoEb13OJ2G$LgqwnaXXzmqtsEGhwD7iAbcWbiLqNjPfN.#=zBIIQJ60=(#=zMViOo_fcO3ho7mWhsXrxVAOQ34ebgZJpQQ== #=zxFAjcvI=, Int32 #=zEzaHbFjMXvP_, Int32 #=zMbCAnZwlSf1T, Double #=zXsFi1gE=, #=znkBLFSf92CE8QzrMCb5V_H5EP6ilRkOfWbvhabE=[]& #=z1atDzcjsI2Gj)
   at #=zHSCBrEm4MU2lcr3JXjwt5ro$QYL496nMTUUYpoE=.#=zBIIQJ60=(#=zMViOo_fcO3ho7mWhsXrxVAOQ34ebgZJpQQ== #=zxFAjcvI=, Int32 #=zEzaHbFjMXvP_, Int32 #=zMbCAnZwlSf1T, Double #=zXsFi1gE=, #=znkBLFSf92CE8QzrMCb5V_H5EP6ilRkOfWbvhabE=[]& #=z1atDzcjsI2Gj)
   at #=zvQBtUMoE6VMVrLVvSWnPMZGle1hZgW$fOKdFB3ZODXlnp4C72kh4E$I=.#=zBIIQJ60=(#=zMViOo_fcO3ho7mWhsXrxVAOQ34ebgZJpQQ== #=zxFAjcvI=, Int32& #=zdERQdoQM8aLG, Int32 #=z6BZspVs=, Int32 #=zUARZ4ao=, Boolean #=zjatg8qZAMt$Gk7PhXE9BN1c=)
   at #=zkik0KlB_2NCRwF$5H7n5ro7b49DMVUI4HUGZdyM$EQsi0Q3tOA==.#=zUs5p6VKcJSIj(Int32 #=z6BZspVs=, Int32 #=zUARZ4ao=, Boolean #=zjatg8qZAMt$Gk7PhXE9BN1c=, Int32& #=zLP1Tyzs=, Int32& #=zOC$QgfnlEPc4HrsyoA==)
   at #=zkik0KlB_2NCRwF$5H7n5ro7b49DMVUI4HUGZdyM$EQsi0Q3tOA==.#=ze3wufmw2YDRX(Int32 #=z6BZspVs=, Int32 #=zUARZ4ao=, Boolean #=zjatg8qZAMt$Gk7PhXE9BN1c=)
   at #=zLS0bXYESTwt6s9O9ge8mOAadbYkmtp_6KWEqWw7FZI4KD8YIdVIGrPU=.#=z05HjLJ6yBFle(List1 #=zisixCVU6z7A$, Rectangle #=zcWpxBts=, Boolean #=zzYFGQv$7IL3a)
   at #=zLS0bXYESTwt6s9O9ge8mOAadbYkmtp_6KWEqWw7FZI4KD8YIdVIGrPU=…ctor(List1 #=zisixCVU6z7A$, Rectangle #=zcWpxBts=, Boolean #=zzYFGQv$7IL3a)
   at #=zF$N2EwAoVvcOZLaxVt6rxptkACiK5PYjLdqHJVFYVtAC0ZbrUcPKZAIeSDdF.#=zD83pJao=(String #=zGbyvYzJzCovU, Boolean #=zBbT1FDlanRMHb$rdVg==, Regex #=zewIhQMU=)
   at Aspose.Pdf.Text.TextFragmentAbsorber.Visit(Page page)
   at Aspose.Pdf.Page.Accept(TextFragmentAbsorber visitor)

I’m using a licensed copy of Aspose.PDF 22.10.0 on Windows.

@EvgeniyMikhailov
Could you please share the input document and the code that reproduces your issue?

@ilyazhuykov

The code is simple:

    using (Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(temporaryFileIn))
    {
        // Process at most the first 25 pages
        var maxPageNum = pdfDocument.Pages.Count;
        if (maxPageNum > 25) { maxPageNum = 25; }
        for (int pagenumber = 1; pagenumber <= maxPageNum; pagenumber++)
        {
            Page page = pdfDocument.Pages[pagenumber];

            TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
            textFragmentAbsorber.TextSearchOptions.LimitToPageBounds = true;
            page.Accept(textFragmentAbsorber);    // Exception happens here
            TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

            // Loop through the fragments
            foreach (TextFragment fragment in textFragmentCollection)
            {
                // ... some processing here ...
            }
        }
    }

As for the file, unfortunately I can’t share it with you: we got this error on a production site, where all files contain sensitive client information.

@EvgeniyMikhailov
In that case, I suggest you try to reproduce the issue on the current version, which is 24.2.
In 24.2 your code seems to work fine with various PDFs, unless the problem is specific to your document.
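
If the problem does turn out to be document specific, it may help to record which page triggers it before retesting on a newer version. Below is a minimal sketch of the loop from the post above with the `Accept` call wrapped in a try/catch. The class `AbsorberDiagnostics` and the method `ReportFailingPages` are hypothetical names, not part of Aspose.PDF, and the sketch assumes the exception surfaces as `ArgumentOutOfRangeException`, which is what `String.Substring` throws for this message. If the library wraps it, catch `Exception` instead.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper, not part of Aspose.PDF: logs the pages on which
// text absorption fails instead of aborting the whole document.
static class AbsorberDiagnostics
{
    public static void ReportFailingPages(string path, int maxPages = 25)
    {
        using (Document pdfDocument = new Document(path))
        {
            // Same page limit as in the original snippet.
            int lastPage = Math.Min(pdfDocument.Pages.Count, maxPages);

            // Aspose.PDF page indexes are 1-based.
            for (int pageNumber = 1; pageNumber <= lastPage; pageNumber++)
            {
                TextFragmentAbsorber absorber = new TextFragmentAbsorber();
                absorber.TextSearchOptions.LimitToPageBounds = true;

                try
                {
                    pdfDocument.Pages[pageNumber].Accept(absorber);
                }
                catch (ArgumentOutOfRangeException ex)
                {
                    // Assumption: the Substring failure is not wrapped by the library.
                    Console.Error.WriteLine($"Page {pageNumber} of '{path}' failed: {ex.Message}");
                    continue; // skip this page, keep processing the rest
                }

                foreach (TextFragment fragment in absorber.TextFragments)
                {
                    // ... some processing here ...
                }
            }
        }
    }
}
```

Knowing the failing page number narrows the retest on 24.2 to a single page, and makes it easier to describe the trigger without sharing the sensitive file itself.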
