Hi Aspose,
I tried the attached PDF document and cannot find the text “to generate efficient machine code” with Aspose, although the same search succeeds in Adobe. The code is below:
import java.io.FileInputStream;
import java.io.InputStream;
import com.aspose.pdf.*;

// The class name is arbitrary; only main() is from my test.
public class SearchTextExample {
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("D:\\tmp\\Dat\\sherl_clean.pdf")) {
            Document document = new Document(in);
            TextFragmentAbsorber absorber = new TextFragmentAbsorber("to generate efficient machine code");
            // false = plain-text search, true = regular-expression search
            boolean regularExpUsed = false;
            TextSearchOptions searchOption = new TextSearchOptions(regularExpUsed);
            absorber.setTextSearchOptions(searchOption);
            // Page and fragment collections are 1-based
            Page firstPage = document.getPages().get_Item(1);
            firstPage.accept(absorber);
            System.out.println("Num of found text: " + absorber.getTextFragments().size());
            if (absorber.getTextFragments().size() > 0) {
                TextFragment frag0 = absorber.getTextFragments().get_Item(1);
                System.out.println("Text in fragment: " + frag0.getText());
            } else {
                System.out.println("Can't find any text fragment");
            }
        }
    }
}
It prints “Can’t find any text fragment”. If I change the search text to “to generate .* machine code” and set regularExpUsed to true, a fragment is found, but its text prints as “to generate ef?cient machine code” rather than the “to generate efficient machine code” I expect.
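One guess on my side, which I have not confirmed against this file: the “?” in “ef?cient” might be the single-character “fi” ligature (U+FB01) that my console cannot display, in which case the extracted text would not equal the plain-letter search string. The sketch below uses only the JDK (no Aspose calls) to show that effect; the `extracted` string is a hypothetical stand-in for what `frag0.getText()` may be returning, and `LigatureCheck` with its helpers is just an illustrative name.

```java
import java.text.Normalizer;

public class LigatureCheck {
    // Map compatibility characters such as U+FB01 to their plain-letter equivalents.
    static String normalizeLigatures(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // True if the string contains any character outside the ASCII range.
    static boolean hasNonAscii(String s) {
        return s.chars().anyMatch(c -> c > 127);
    }

    public static void main(String[] args) {
        // Assumption: stand-in for the extracted fragment text; \uFB01 is the "fi" ligature.
        String extracted = "to generate ef\uFB01cient machine code";
        String target = "to generate efficient machine code";

        System.out.println(hasNonAscii(extracted));                        // true
        System.out.println(extracted.equals(target));                      // false
        System.out.println(normalizeLigatures(extracted).equals(target));  // true
    }
}
```

If that is what is happening, running `hasNonAscii` on the real fragment text would show it, and normalizing the extracted text would be a workaround on my side, though I would still expect the plain-text search to match as it does in Adobe.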
Please let me know if this is a bug. I’m using Aspose PDF for Java 17.5. Thank you.
Attachment: sherl_clean.pdf (964.5 KB)
Regards,
Tuyen