Hi Aspose,
Hi Tuyen,
Thanks for contacting support.
The problem was that TextSearchOptions was created with the regular-expression flag set to true. In that mode the API treats the given search string as a regular expression, which is why the text fragment count in your output is zero. Use TextSearchOptions(false) to tell the API to match the string exactly as given. Please check the following code snippet and the highlighted part to achieve the functionality.
try (InputStream in = new FileInputStream("D:\\tmp\\Tuyen\\aspose\\pdf - sample + copy.pdf")) {
    Document document = new Document(in);
    TextFragmentAbsorber absorber = new TextFragmentAbsorber("Document Format(PDF)");
    // false = treat the search string as literal text, not as a regular expression
    TextSearchOptions searchOption = new TextSearchOptions(false);
    absorber.setTextSearchOptions(searchOption);
    Page firstPage = document.getPages().get_Item(1);
    firstPage.accept(absorber);
    System.out.println("Num of found text: " + absorber.getTextFragments().size());
}
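To illustrate why the regular-expression mode can return zero matches for this search string, here is a minimal sketch that uses only the standard java.util.regex package, not Aspose.PDF itself. It assumes the API's regex mode interprets parentheses the way standard Java regular expressions do, as a capturing group rather than as literal characters. The class name, method names, and sample page text are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

public class RegexVsLiteral {
    // Treats the search string as a regular expression.
    static boolean foundAsRegex(String text, String search) {
        return Pattern.compile(search).matcher(text).find();
    }

    // Treats the search string as literal text by quoting it first.
    static boolean foundAsLiteral(String text, String search) {
        return Pattern.compile(Pattern.quote(search)).matcher(text).find();
    }

    public static void main(String[] args) {
        String pageText = "Sample Document Format(PDF) file"; // hypothetical page text
        String search = "Document Format(PDF)";

        // As a regex, "(PDF)" is a group, so the pattern expects "Document FormatPDF".
        System.out.println("regex:   " + foundAsRegex(pageText, search));   // prints "regex:   false"
        // As a literal, the parentheses must appear in the text as written.
        System.out.println("literal: " + foundAsLiteral(pageText, search)); // prints "literal: true"
    }
}
```

If the regex mode is needed for other reasons, escaping the parentheses in the search string (for example with Pattern.quote) is one possible alternative, assuming the API accepts the resulting pattern.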
In case of any further assistance, please feel free to contact us.
Best Regards,
Hi Asad,
Hi Tuyen,
The issues you have found earlier (filed as ) have been fixed in this update. This message was posted using BugNotificationTool from Downloads module by MuzammilKhan
There are no spaces or other invisible characters in the URL on the page. Adobe Acrobat actually finds the text with spaces, 'www.groupe- t2i .com', but we could not find any reason for this in the document. Aspose.PDF finds this text as 'www.groupe-t2i.com'.
Please consider the following code with Aspose.PDF for Java 20.6:
InputStream in = new FileInputStream(dataDir + "OnePage.pdf");
Document document = new Document(in);
TextFragmentAbsorber absorber = new TextFragmentAbsorber("www.groupe-t2i.com");
TextSearchOptions searchOption = new TextSearchOptions(true); // a value of false also works correctly
absorber.setTextSearchOptions(searchOption);
Page firstPage = document.getPages().get_Item(1);
firstPage.accept(absorber);
System.out.println("Num of found text: " + absorber.getTextFragments().size());
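If the same URL has to be matched in both forms (with the spaces Adobe Acrobat reports and without them), one possible approach is a pattern that allows optional whitespace between characters. The sketch below uses only java.util.regex; the class and helper names are hypothetical, and whether the regex mode of TextSearchOptions accepts such a pattern is an assumption that should be verified against your Aspose.PDF version.

```java
import java.util.regex.Pattern;

public class WhitespaceTolerantSearch {
    // Hypothetical helper: builds a pattern that matches the literal text
    // while allowing optional whitespace between any two characters.
    static Pattern tolerant(String literal) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (Character.isWhitespace(c)) {
                continue; // whitespace in the input is made optional anyway
            }
            if (regex.length() > 0) {
                regex.append("\\s*");
            }
            // Quote each character so '.' and '-' are matched literally.
            regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = tolerant("www.groupe-t2i.com");
        System.out.println(p.matcher("www.groupe-t2i.com").matches());   // prints "true"
        System.out.println(p.matcher("www.groupe- t2i .com").matches()); // prints "true"
        System.out.println(p.matcher("www.groupeXt2i.com").matches());   // prints "false"
    }
}
```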