TextFragmentAbsorber ignores rectangle region set in TextSearchOptions

Hi. We are trying to migrate to the new Aspose.PDF version (21.11) and have run into an issue with TextFragment search. We specify a rectangle region to search in; however, TextFragmentAbsorber returns every matching fragment on the page, not only those inside the region.
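To make the expected behavior concrete: with a search region set, a fragment should come back only if its rectangle falls inside that region. The sketch below expresses this check in plain Java 11 without the Aspose types. `RegionFilter`, `Box`, and `filterToRegion` are hypothetical names. `Box` takes its corners in the same (llx, lly, urx, ury) order as the Aspose `Rectangle` constructor, and "inside" is assumed to mean full containment; the library may instead accept fragments that merely intersect the region.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionFilter {

    // Hypothetical stand-in for a PDF rectangle: lower-left (llx, lly) and
    // upper-right (urx, ury) corners, in the same order as the Aspose
    // Rectangle constructor used in the snippet.
    static final class Box {
        final double llx, lly, urx, ury;

        Box(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        // Assumption: "inside the region" means full containment.
        boolean contains(Box other) {
            return other.llx >= llx && other.lly >= lly
                    && other.urx <= urx && other.ury <= ury;
        }
    }

    // Keeps only the fragment boxes that lie completely within the region.
    static List<Box> filterToRegion(Box region, List<Box> fragments) {
        List<Box> result = new ArrayList<>();
        for (Box fragment : fragments) {
            if (region.contains(fragment)) {
                result.add(fragment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same search region as in the code snippet
        Box region = new Box(209.604, 210.668, 270.12, 223.868);
        List<Box> found = List.of(
                new Box(210.0, 211.0, 269.0, 223.0),  // inside the region
                new Box(50.0, 500.0, 110.0, 513.0));  // elsewhere on the page
        System.out.println(filterToRegion(region, found).size()); // prints 1
    }
}
```

Applying the same check to each `textFragment.getRectangle()` (through its `getLLX()`, `getLLY()`, `getURX()` and `getURY()` values) might serve as a client-side stopgap, but this is untested against Aspose.PDF.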

Result with Aspose.PDF 21.8: 21.8 result.png
Result with Aspose.PDF 21.11: 21.11 result.png
Source file: testExtraction.pdf

Java version: 11

Code snippet

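```java
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import com.aspose.pdf.Rectangle;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextSearchOptions;
import com.aspose.pdf.operators.ClosePathStroke;
import com.aspose.pdf.operators.ConcatenateMatrix;
import com.aspose.pdf.operators.GRestore;
import com.aspose.pdf.operators.GSave;
import com.aspose.pdf.operators.Re;
import com.aspose.pdf.operators.SetLineWidth;
import com.aspose.pdf.operators.SetRGBColorStroke;
import java.util.regex.Pattern;
import org.junit.jupiter.api.Test;

class TextFragmentSearchTest {

    @Test
    void testTextFragmentSearch() {
        var document = new Document("testExtraction.pdf");
        var page = document.getPages().get_Item(3); // page indexes are 1-based
        var searchValue = "AstraZeneca";
        // Search region as (llx, lly, urx, ury)
        var rectangle = new Rectangle(209.604, 210.668, 270.12, 223.868);
        // Blue frame marks the search region in the output
        drawRectangleOnPage(page, rectangle, new SetRGBColorStroke(0, 0, 1), new SetLineWidth(3));
        var absorber = new TextFragmentAbsorber();

        // Regular expressions are enabled below, so the literal phrase is escaped
        absorber.setPhrase(Pattern.quote(searchValue));
        // Restrict the search to the rectangle, with regular expressions enabled
        absorber.setTextSearchOptions(new TextSearchOptions(rectangle, true));
        page.accept(absorber);

        // Red frames mark every fragment returned; only those inside the blue frame are expected
        var textFragments = absorber.getTextFragments();
        for (var textFragment : textFragments) {
            drawRectangleOnPage(page, textFragment.getRectangle(), new SetRGBColorStroke(1, 0, 0), new SetLineWidth(1));
        }

        document.save("result.pdf");
    }

    // Strokes the outline of the given rectangle onto the page content stream
    private static void drawRectangleOnPage(Page page, Rectangle rectangle, SetRGBColorStroke colorStroke, SetLineWidth width) {
        page.getContents().add(new GSave());                             // isolate graphics state changes
        page.getContents().add(new ConcatenateMatrix(1, 0, 0, 1, 0, 0)); // identity transform
        page.getContents().add(colorStroke);
        page.getContents().add(width);
        page.getContents().add(new Re(rectangle.getLLX(), rectangle.getLLY(), rectangle.getWidth(), rectangle.getHeight())); // x, y, width, height
        page.getContents().add(new ClosePathStroke());
        page.getContents().add(new GRestore());                          // restore previous graphics state
    }
}
```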

Is there any workaround for this issue?

@dkuksa

We were able to reproduce a similar issue in our environment and have logged it as PDFJAVA-41106 in our issue tracking system. We will look into it further and keep you posted on the status of the fix. Please be patient and allow us some time.

We are sorry for the inconvenience.