IndexOutOfRange when I try to search text with TextFragmentAbsorber

techus · May 30, 2018, 7:25am

Simplified example from my application:

var text = "1";
var pdfDoc = new Document("cs_10.pdf");
var textAbsorber = new TextFragmentAbsorber($"(?i){text}", new TextSearchOptions(true));
pdfDoc.Pages.Accept(textAbsorber);

Console.WriteLine(textAbsorber.TextFragments.Count);

Error:
Unhandled exception: System.ArgumentOutOfRangeException: Index is out of range. The index must be a positive number, and its size should not exceed the size of the collection.
Parameter name: startIndex
at System.Globalization.CompareInfo.IndexOf(String source, String value, Int32 startIndex, Int32 count, CompareOptions options)
at System.Globalization.CompareInfo.IndexOf(String source, String value, Int32 startIndex, CompareOptions options)
at Aspose.Pdf.Text.TextFragmentAbsorber.(♫??? , [] )
at Aspose.Pdf.Text.TextFragmentAbsorber.Visit(Page page)
at Aspose.Pdf.PageCollection.Accept(TextFragmentAbsorber visitor)
at TextLayerExtractor.Tests.Program.GetTextPosition() in D:\Projects\SmartInstruments\src\TextExtractorService\TextLayerExtractor.Tests\Program.cs:line 161
at TextLayerExtractor.Tests.Program.Main() in D:\Projects\SmartInstruments\src\TextExtractorService\TextLayerExtractor.Tests\Program.cs:line 84

The error is reproduced whenever I search for any text of length 1.

The document itself was also created earlier with Aspose.PDF:
cs_10.pdf (2.5 MB)

Aspose.PDF version 18.5.0.0, licensed

Farhan.Raza · May 30, 2018, 12:48pm

@techus

Thank you for contacting support.

We have tested the file you shared. The ArgumentOutOfRangeException can be avoided by adding escape characters to your regular expression, (?i)\{1\}, but that pattern does not match any string. Another regular expression, \b\w{1}\b, works fine for other PDF files but returns a count of zero with your PDF document. An investigation ticket with ID PDFNET-44782 has been logged in our issue management system. The ticket ID has been linked to this thread so that you will receive a notification as soon as the ticket is resolved.
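
For anyone comparing the patterns mentioned in this thread, here is a minimal sketch that uses only .NET's System.Text.RegularExpressions, not Aspose.PDF, so it says nothing about the exception itself. It only shows what each pattern string matches in plain .NET. The PatternDemo class, the BuildPattern helper, the sample string, and the use of Regex.Escape are assumptions added for illustration; none of them come from the original code or from the Aspose API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Hypothetical helper: builds the case-insensitive pattern from the first post,
    // escaping the search text so regex metacharacters in it are treated literally.
    public static string BuildPattern(string text) => $"(?i){Regex.Escape(text)}";

    // Counts matches of a pattern in an input string with the standard .NET regex engine.
    public static int CountMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Count;

    public static void Main()
    {
        // Made-up sample text, not extracted from cs_10.pdf.
        string sample = "Item 1 of {1}, grade A";

        // With text = "1", the interpolated string contains no braces at all.
        string interpolated = BuildPattern("1");
        Console.WriteLine(interpolated);                              // (?i)1
        Console.WriteLine(CountMatches(sample, interpolated));        // 2

        // Escaped braces match the literal text "{1}" only.
        Console.WriteLine(CountMatches(sample, @"(?i)\{1\}"));        // 1

        // \b\w{1}\b matches any single word character that stands alone.
        Console.WriteLine(CountMatches(sample, @"\b\w{1}\b"));        // 3
    }
}
```

In plain .NET, `$"(?i){text}"` with text = "1" evaluates to `(?i)1`, so the braces belong to C# string interpolation and are not part of the regular expression. Whether TextFragmentAbsorber treats these patterns the same way is exactly what the ticket has to determine.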

We are sorry for the inconvenience.

techus · May 30, 2018, 3:36pm

I would like to point out that in version 17.6.0.0 we had no problems with this kind of search. The problem only appeared after we upgraded to the latest version, 18.5.0.

Farhan.Raza · May 30, 2018, 9:15pm

@techus

Thank you for the information.

We switched the DLL to version 17.6 of the Aspose.PDF for .NET API, but the problem did not disappear. We will be able to trace the cause and fix this issue once it has been investigated in our environment. We will let you know as soon as significant progress is made.

Farhan.Raza · August 31, 2018, 7:11pm

A post was split to a new topic: ArgumentOutOfRangeException with TextFragmentAbsorber

aspose.notifier · July 14, 2021, 4:48pm

The issues you have found earlier (filed as PDFNET-44782) have been fixed in Aspose.PDF for .NET 21.7.