Hi. When we try to find a text fragment using a regular expression, TextFragmentAbsorber does not find any text fragments.
Java version: 11
Aspose PDF version: 22.2
Code snippet
@Test
void testTextFragmentSearch() throws IOException {
    var inputStream = new ClassPathResource("pdf/testExtraction.pdf").getInputStream();
    var document = new Document(inputStream);
    var page = document.getPages().get_Item(1);
    var rectangle = new Rectangle(72.024, 381.17000000953675, 181.5959996213913, 393.31399996757506);
    var absorber = new TextFragmentAbsorber();
    var searchValue = "Canada";
    searchValue = Pattern.quote(searchValue);
    absorber.setPhrase(searchValue);
    absorber.setTextSearchOptions(new TextSearchOptions(rectangle, true));
    page.accept(absorber);
    var textFragments = absorber.getTextFragments();
    for (var textFragment : textFragments) {
        drawRectangleOnPage(page, textFragment.getRectangle(), new SetRGBColorStroke(1, 0, 0), new SetLineWidth(1));
    }
    document.save("result.pdf");
}

private static void drawRectangleOnPage(Page page, Rectangle rectangle, SetRGBColorStroke colorStroke, SetLineWidth width) {
    page.getContents().add(new GSave());
    page.getContents().add(new ConcatenateMatrix(1, 0, 0, 1, 0, 0));
    page.getContents().add(colorStroke);
    page.getContents().add(width);
    page.getContents().add(new Re(rectangle.getLLX(), rectangle.getLLY(), rectangle.getWidth(), rectangle.getHeight()));
    page.getContents().add(new ClosePathStroke());
    page.getContents().add(new GRestore());
}
With Aspose PDF version 21.8, the same code works properly and the fragment is found.
Source file: testExtraction.pdf (402.5 KB)
Result: result.pdf (410.0 KB)
Expected result: Absorber.png (42.2 KB)
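
For reference, here is a minimal standalone sketch (plain JDK, no Aspose classes) of the phrase that the snippet actually passes to setPhrase. Pattern.quote does not escape individual characters; it wraps the text in \Q...\E. Whether version 22.2 handles the \Q...\E form differently from 21.8 is only an assumption on our side, not something we have confirmed. The escapeLiteral helper below is hypothetical (it is not part of Aspose.PDF or the JDK) and only shows an alternative way to build a literal pattern that could be tried for comparison.

```java
import java.util.regex.Pattern;

public class SearchPhraseDemo {

    // Hypothetical helper, not an Aspose or JDK API: escapes each regex
    // metacharacter with a backslash instead of wrapping the text in \Q...\E.
    static String escapeLiteral(String text) {
        StringBuilder escaped = new StringBuilder();
        for (char c : text.toCharArray()) {
            if ("\\.[]{}()*+-?^$|".indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        // The phrase produced by the code in the report.
        String quoted = Pattern.quote("Canada");
        System.out.println(quoted); // prints \QCanada\E

        // Alternative literal pattern without \Q...\E.
        String escaped = escapeLiteral("Canada (CA)");
        System.out.println(escaped); // prints Canada \(CA\)

        // Both forms match the same literal text in java.util.regex.
        System.out.println(Pattern.matches(quoted, "Canada"));       // prints true
        System.out.println(Pattern.matches(escaped, "Canada (CA)")); // prints true
    }
}
```

For the word "Canada" the escaped form is identical to the plain text, so passing either escapeLiteral("Canada") or the unquoted string to setPhrase would show whether the \Q...\E wrapper is what the absorber fails on.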