Hi. We are trying to migrate to the new Aspose.PDF version (21.11) and have run into an issue with searching for TextFragments. We specify a rectangular region to search in, but the TextFragmentAbsorber returns all the fragments on the page, not only those inside the rectangle.
Aspose.PDF 21.8 result: 21.8 result.png (140.6 KB)
Aspose.PDF 21.11 result: 21.11 result.png (137.3 KB)
Source file: testExtraction.pdf (402.5 KB)
Java version: 11
Code snippet:
@Test
void testTextFragmentSearch() {
    var document = new Document("testExtraction.pdf");
    var page = document.getPages().get_Item(3);
    var searchValue = "AstraZeneca";
    var rectangle = new Rectangle(209.604, 210.668, 270.12, 223.868);
    // Outline the search region in blue.
    drawRectangleOnPage(page, rectangle, new SetRGBColorStroke(0, 0, 1), new SetLineWidth(3));

    var absorber = new TextFragmentAbsorber();
    absorber.setPhrase(Pattern.quote(searchValue));
    // The search is expected to be limited to the rectangle.
    absorber.setTextSearchOptions(new TextSearchOptions(rectangle, true));
    page.accept(absorber);

    // Outline every returned fragment in red.
    var textFragments = absorber.getTextFragments();
    for (var textFragment : textFragments) {
        drawRectangleOnPage(page, textFragment.getRectangle(), new SetRGBColorStroke(1, 0, 0), new SetLineWidth(1));
    }
    document.save("result.pdf");
}

// Appends drawing operators to the page content that stroke the given rectangle.
private static void drawRectangleOnPage(Page page, Rectangle rectangle, SetRGBColorStroke colorStroke, SetLineWidth width) {
    page.getContents().add(new GSave());
    page.getContents().add(new ConcatenateMatrix(1, 0, 0, 1, 0, 0));
    page.getContents().add(colorStroke);
    page.getContents().add(width);
    page.getContents().add(new Re(rectangle.getLLX(), rectangle.getLLY(), rectangle.getWidth(), rectangle.getHeight()));
    page.getContents().add(new ClosePathStroke());
    page.getContents().add(new GRestore());
}
Is there a workaround for this issue?
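A possible stopgap on our side, as an untested sketch only, would be to filter the returned fragments against the rectangle ourselves. The code below shows just the geometry. `RegionFilter`, `Box`, `keepInside` and the `tolerance` parameter are hypothetical names, not part of the Aspose API; in the real test the coordinates would presumably be copied from `textFragment.getRectangle()`. It is written to compile on Java 11.

```java
import java.util.ArrayList;
import java.util.List;

final class RegionFilter {

    /** Hypothetical stand-in for a bounding box: lower-left and upper-right corners in page units. */
    public static final class Box {
        final double llx, lly, urx, ury;

        public Box(double llx, double lly, double urx, double ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        /** True if {@code other} lies inside this box, allowing {@code tolerance} units of overhang per side. */
        boolean contains(Box other, double tolerance) {
            return other.llx >= llx - tolerance
                && other.lly >= lly - tolerance
                && other.urx <= urx + tolerance
                && other.ury <= ury + tolerance;
        }
    }

    /** Returns only the fragment boxes that fall inside the search region. */
    public static List<Box> keepInside(List<Box> fragments, Box region, double tolerance) {
        List<Box> kept = new ArrayList<>();
        for (Box fragment : fragments) {
            if (region.contains(fragment, tolerance)) {
                kept.add(fragment);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same search rectangle as in the test above.
        Box region = new Box(209.604, 210.668, 270.12, 223.868);
        // Made-up fragment boxes for illustration: one inside the region, one elsewhere on the page.
        List<Box> found = List.of(
            new Box(212.0, 212.0, 268.0, 222.0),
            new Box(50.0, 700.0, 110.0, 712.0));
        System.out.println(keepInside(found, region, 0.0).size());
    }
}
```

The tolerance is there because a fragment's reported rectangle may extend slightly past the search rectangle; the right value is a guess and would need tuning against the actual file.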