Extract text from PDF in C# - TextFragmentAbsorber does not find TextFragments

Hello,

We have an issue with a specific PDF file. Aspose.Pdf does not find any TextFragments in the attached PDF, although the same function works on other files. We could not identify any problems with the PDF itself. I've attached a working and a non-working file so you can try to reproduce the issue.

Here is the code snippet we use:

        // Requires: using Aspose.Pdf.Facades; (PdfFileInfo)
        //           using Aspose.Pdf.Text;    (TextFragmentAbsorber, TextFragmentCollection)
        using (Aspose.Pdf.Document doc = new Aspose.Pdf.Document("not_working.pdf"))
        {
            PdfFileInfo fileInfo = new PdfFileInfo(doc);
            int pageNumber = 1;
            var page = doc.Pages[pageNumber];

            // Search rectangle covering (almost) the whole page
            float searchRectangleLLX = 0;
            float searchRectangleLLY = 0;
            var searchRectangleURX = fileInfo.GetPageWidth(pageNumber) - 1;
            var searchRectangleURY = fileInfo.GetPageHeight(pageNumber) - 1;

            var searchRectangle = new Aspose.Pdf.Rectangle(searchRectangleLLX, searchRectangleLLY, searchRectangleURX, searchRectangleURY);

            // Restrict the search to the page bounds and to the rectangle above
            TextFragmentAbsorber textAbsorber = new TextFragmentAbsorber();
            textAbsorber.TextSearchOptions.LimitToPageBounds = true;
            textAbsorber.TextSearchOptions.Rectangle = searchRectangle;

            page.Accept(textAbsorber);

            // Empty for not_working.pdf, populated for working.pdf
            TextFragmentCollection textFragments = textAbsorber.TextFragments;
        }

Some additional information:

  • We use Aspose.Pdf version 20.9.0.0

not_working.pdf (1.9 MB)
working.pdf (245.2 KB)

@docuguide

We were able to reproduce the issue on our side using Aspose.PDF for .NET 20.12 and have logged it as PDFNET-49179 in our issue tracking system. We will investigate it further and keep you posted on the status of the fix. Please be patient and allow us some time.

We are sorry for the inconvenience.

We have found a workaround that works for us, because we search for TextFragments on the whole page anyway: if we simply remove the TextSearchOptions.Rectangle assignment from the code, the TextFragments are found.
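
For anyone hitting the same problem, here is roughly what our code looks like without the rectangle. This is only a minimal sketch of the workaround, not a fix for PDFNET-49179: the Program class and the Console output are added for illustration, and it assumes the whole page should be searched.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class Program
{
    static void Main()
    {
        using (Document doc = new Document("not_working.pdf"))
        {
            Page page = doc.Pages[1];

            TextFragmentAbsorber textAbsorber = new TextFragmentAbsorber();
            textAbsorber.TextSearchOptions.LimitToPageBounds = true;
            // Workaround: TextSearchOptions.Rectangle is intentionally not set,
            // so the absorber searches the whole page.

            page.Accept(textAbsorber);

            TextFragmentCollection textFragments = textAbsorber.TextFragments;
            Console.WriteLine("Fragments found: " + textFragments.Count);
        }
    }
}
```

If you really need to limit the search to a region, this workaround does not apply and you would have to wait for the fix or filter the fragments yourself.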

@docuguide

It is good to hear that you were able to work around the issue on your side. Please keep using our API and let us know if you need further assistance.