Thanks for sharing the sample document and code snippet.
We have tested the scenario using your document and code snippet with Aspose.PDF for .NET 18.5. We did not observe more than one text fragment being extracted when the given rectangle value was passed to TextFragmentAbsorber. Please check the complete code snippet used for testing:
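using Aspose.Pdf;
using Aspose.Pdf.Text;

Document doc = new Document(dataDir + "35525707.pdf");
// Search area in PDF user-space units: lower-left x, lower-left y, upper-right x, upper-right y
Aspose.Pdf.Rectangle rectangle = new Aspose.Pdf.Rectangle(65.179, 484.73199999999997, 313.179, 504.63199999999995);
// Create TextFragmentAbsorber object to extract text
TextFragmentAbsorber absorber = new TextFragmentAbsorber();
absorber.TextSearchOptions.LimitToPageBounds = true;
if (rectangle != null)
    absorber.TextSearchOptions.Rectangle = rectangle;
// Accept the absorber for the first page (page indexes start at 1)
doc.Pages[1].Accept(absorber);
// Fragments found inside the rectangle
var textFragments = absorber.TextFragments;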
Would you please try your scenario with the latest version of the API? In case you still face a similar issue, please share a sample console application that reproduces the error in any environment. We will test the scenario again and address it accordingly.
It is good to know that you managed to find the bug in your existing code and resolve it. Please keep using our API, and in case you face any other issue, please feel free to create a new topic in our support forums. We will be happy to assist you accordingly.
Please note that the maximum upload size allowed by the forums is 3 MB, which is why you were unable to attach your sample project.
The TextFragmentAbsorber extracts text fragments from the PDF document in the same form in which the text was added to the PDF. It seems that the text (i.e. 4868901-Jh-hellebjerg /JUELSMINDE) was added as two separate text fragments, therefore it is retrieved as two fragments as well.
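If you need such split text as a single string, one possible approach is to merge fragments that share the same baseline after extraction. Below is a minimal sketch under stated assumptions, not a built-in Aspose.PDF feature: the `Fragment` record and the `LineMerger.MergeByBaseline` helper are hypothetical, and the 1-point tolerance is an arbitrary default. In your own code you would fill each `Fragment` from a `TextFragment` in `absorber.TextFragments` (for example from its `Text` and `Position.XIndent`/`Position.YIndent` values).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the data read from each extracted TextFragment:
// its text plus the x/y position of its baseline in PDF user-space units.
public record Fragment(string Text, double X, double Y);

public static class LineMerger
{
    // Groups fragments whose baselines lie within `tolerance` points of the first
    // fragment of a line, then joins each group left-to-right into one string.
    public static List<string> MergeByBaseline(IEnumerable<Fragment> fragments, double tolerance = 1.0)
    {
        var lines = new List<List<Fragment>>();

        // PDF user space has its origin at the bottom-left, so a larger Y is higher on the page.
        foreach (var frag in fragments.OrderByDescending(fr => fr.Y))
        {
            var current = lines.LastOrDefault();
            if (current != null && Math.Abs(current[0].Y - frag.Y) <= tolerance)
                current.Add(frag);                        // same visual line
            else
                lines.Add(new List<Fragment> { frag });   // start a new line
        }

        // Within each line, order fragments by horizontal position and
        // concatenate their text unchanged (no separator is inserted).
        return lines
            .Select(line => string.Concat(line.OrderBy(fr => fr.X).Select(fr => fr.Text)))
            .ToList();
    }
}
```

Because the fragment texts are concatenated unchanged, a space between the two parts is only kept if it is already present in one of the fragments. If fragments on the same line report slightly different baselines, the tolerance can be increased.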
Nevertheless, we have logged an investigation ticket as PDFNET-44723 in our issue tracking system. We will further investigate this behavior of the API and keep you informed of the status of the ticket resolution. Please be patient and give us a little time.