Hi Aspose,
Hi Tuyen,
import java.io.FileInputStream;
import java.io.InputStream;
import com.aspose.pdf.*;

try (InputStream in = new FileInputStream("D:\\tmp\\Tuyen\\aspose\\pdf-sample+copy.pdf")) {
Document document = new Document(in);
TextFragmentAbsorber absorber = new TextFragmentAbsorber("Document Format (PDF)");
TextSearchOptions searchOption = new TextSearchOptions(false); // false = plain-text (non-regex) search, so "(PDF)" is matched literally
absorber.setTextSearchOptions(searchOption);
Page firstPage = document.getPages().get_Item(1);
firstPage.accept(absorber);
System.out.println("Num of found text: " + absorber.getTextFragments().size());
}
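The snippet passes false to TextSearchOptions, which (as we understand the Aspose.PDF API) turns off regular-expression matching, so the phrase is searched as plain text. That matters here because the phrase contains parentheses. The following sketch uses plain java.util.regex, not Aspose.PDF, and the sample sentence is made up for illustration. It shows what would happen if the same phrase were interpreted as a regular expression:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralVsRegexSearch {

    // Counts non-overlapping matches of the given pattern in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical page text, used only for illustration.
        String pageText = "Portable Document Format (PDF) is an open standard.";
        String phrase = "Document Format (PDF)";

        // Regex interpretation: "(PDF)" becomes a capture group, so the
        // pattern expects "Document Format PDF" without parentheses.
        int asRegex = countMatches(Pattern.compile(phrase), pageText);

        // Literal interpretation: Pattern.quote escapes every metacharacter,
        // which mirrors a plain (non-regex) text search.
        int asLiteral = countMatches(Pattern.compile(Pattern.quote(phrase)), pageText);

        System.out.println("regex matches: " + asRegex);
        System.out.println("literal matches: " + asLiteral);
    }
}
```

Running it prints "regex matches: 0" and "literal matches: 1", which is why a plain-text search is the safer choice for phrases containing parentheses.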
In case of any further assistance, please feel free to contact us.
Best Regards,
Hi Asad,
Hi Tuyen,
The issues you have found earlier (filed as ) have been fixed in this update. This message was posted using BugNotificationTool from the Downloads module by MuzammilKhan.
There are no spaces or other invisible characters in the URL on the page.
Adobe Acrobat does find the text with spaces, as 'www.groupe- t2i .com', but we could not find any reason for this in the document. Aspose.PDF finds this text as 'www.groupe-t2i.com'.
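If a search should succeed regardless of whether the URL is reported as 'www.groupe-t2i.com' or as 'www.groupe- t2i .com', one option is a regular expression that allows optional whitespace where Acrobat shows spaces. The sketch below only demonstrates such a pattern with java.util.regex; the class and method names are hypothetical, and whether Aspose.PDF's regex mode (TextSearchOptions(true)) accepts exactly the same syntax is an assumption we have not verified.

```java
import java.util.regex.Pattern;

public class WhitespaceTolerantUrl {

    // Dots are escaped so they match a literal '.', and \s* allows optional
    // whitespace at the points where Acrobat reported spaces.
    static final Pattern URL_PATTERN =
            Pattern.compile("www\\.groupe-\\s*t2i\\s*\\.com");

    // Returns true if the text contains the URL in either form.
    static boolean containsUrl(String text) {
        return URL_PATTERN.matcher(text).find();
    }

    public static void main(String[] args) {
        String asFoundByAspose = "www.groupe-t2i.com";
        String asFoundByAcrobat = "www.groupe- t2i .com";

        System.out.println(containsUrl(asFoundByAspose));      // true
        System.out.println(containsUrl(asFoundByAcrobat));     // true
        System.out.println(containsUrl("www.groupe-t2i-com")); // false: '-' is not a literal '.'
    }
}
```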
Please consider the following code using Aspose.PDF for Java 20.6:
// dataDir is the folder containing OnePage.pdf; try-with-resources closes the stream
try (InputStream in = new FileInputStream(dataDir + "OnePage.pdf")) {
Document document = new Document(in);
TextFragmentAbsorber absorber = new TextFragmentAbsorber("www.groupe-t2i.com");
// true = treat the phrase as a regular expression; false (plain-text search) also finds the URL
TextSearchOptions searchOption = new TextSearchOptions(true);
absorber.setTextSearchOptions(searchOption);
Page firstPage = document.getPages().get_Item(1);
firstPage.accept(absorber);
System.out.println("Num of found text: " + absorber.getTextFragments().size());
}
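The comment in the snippet notes that passing false also works. A likely reason is that the only regex metacharacter in "www.groupe-t2i.com" is '.', and an unescaped '.' matches any character, including a literal dot, so the phrase finds the URL in both modes. The trade-off is that regex mode is looser. The sketch below uses plain java.util.regex rather than Aspose.PDF, and the lookalike string is invented for illustration:

```java
import java.util.regex.Pattern;

public class DotInSearchPhrase {

    // Regex search: '.' in the phrase matches any single character.
    static boolean regexFind(String phrase, String text) {
        return Pattern.compile(phrase).matcher(text).find();
    }

    // Plain-text search: equivalent to a substring check.
    static boolean literalFind(String phrase, String text) {
        return text.contains(phrase);
    }

    public static void main(String[] args) {
        String phrase = "www.groupe-t2i.com";
        String url = "www.groupe-t2i.com";
        String lookalike = "www-groupe-t2i_com"; // hypothetical text

        System.out.println(regexFind(phrase, url));         // true
        System.out.println(literalFind(phrase, url));       // true
        System.out.println(regexFind(phrase, lookalike));   // true: '.' matches '-' and '_'
        System.out.println(literalFind(phrase, lookalike)); // false
    }
}
```

For an exact URL match in regex mode, escaping the dots ("www\\.groupe-t2i\\.com") avoids such false positives.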