Unable to identify the text phrase using TextFragmentAbsorber


#1

Team, we are trying to find the following tokens (text fragments) in the document using TextFragmentAbsorber, but they are not being identified:

patrickwphillips1@gmail.com, 60 Kerferd Street, PATRICK PHILLIPS, Patrick Phillips, Kerferd Street, 0422 088 900, East Malvern, 3145, VIC

Code used:
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(token);//NO I18N
TextEditOptions textEditOptions = textFragmentAbsorber.getTextEditOptions();
textEditOptions.setFontReplaceBehavior(TextEditOptions.FontReplace.Default);
TextSearchOptions textSearchOptions = textFragmentAbsorber.getTextSearchOptions();
textSearchOptions.setRegularExpressionUsed(true);
p.accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

I have also attached the document that shows the issue:
resume.zip (35.8 KB)
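
Because the snippet above enables regular-expression search, each token is interpreted as a pattern rather than as plain text. The sketch below uses only `java.util.regex` (not Aspose.PDF) to show what that means for a token such as the e-mail address, where `.` is a metacharacter. Whether the absorber's regex engine behaves exactly like `java.util.regex` is an assumption to verify; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class TokenRegexSketch {
    // Returns true if the given regex matches anywhere in the text.
    public static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String token = "patrickwphillips1@gmail.com";
        // Hypothetical look-alike string, used only to show the effect of '.'.
        String lookAlike = "patrickwphillips1@gmailXcom";

        // Unescaped: '.' matches any character, so the look-alike also matches.
        System.out.println(found(token, lookAlike));
        // Escaped with Pattern.quote: the token is matched literally.
        System.out.println(found(Pattern.quote(token), lookAlike));
        System.out.println(found(Pattern.quote(token), "Email: " + token));
    }
}
```

If the tokens are meant as plain text, either escaping them or leaving regular-expression mode off avoids accidental pattern semantics.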


#2

@divya13

Thank you for contacting support.

We have used the code snippet below, and the API finds the text just fine. Would you please make sure you are using Aspose.PDF for Java 19.6 and then share your feedback with us?

//String token = "patrickwphillips1@gmail.com";
//String token = "60 Kerferd Street";
//String token = "PATRICK PHILLIPS";
//String token = "Patrick Phillips";//DOES NOT EXIST IN PDF
String token = "Kerferd Street";
Document document = new Document(dataDir  + "19251_Resume-Patrick-Phillips.pdf");
Page p = document.getPages().get_Item(1);
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(token);//NO I18N
TextEditOptions textEditOptions = textFragmentAbsorber.getTextEditOptions();
textEditOptions.setFontReplaceBehavior(TextEditOptions.FontReplace.Default);
TextSearchOptions textSearchOptions = textFragmentAbsorber.getTextSearchOptions();
textSearchOptions.setRegularExpressionUsed(true);
p.accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
// Print every fragment that matched the token
for (TextFragment fragment : textFragmentCollection) {
    System.out.println("Here: " + fragment.getText());
}
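
The comment on the "Patrick Phillips" token suggests the mismatch there is one of letter case, since "PATRICK PHILLIPS" is found. As a minimal sketch of case-insensitive matching, the example below uses plain `java.util.regex` with the inline `(?i)` flag on an escaped token. The sample text is hypothetical (assembled from tokens quoted in this thread, not extracted from the PDF), the class and method names are illustrative, and whether TextFragmentAbsorber honors `(?i)` in regex mode is an assumption to check against the Aspose.PDF documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseInsensitiveTokenSketch {
    // Finds every occurrence of the token in the text, ignoring letter case.
    public static List<String> findAll(String token, String text) {
        // Pattern.quote keeps the token literal; (?i) turns on case-insensitive matching.
        Pattern pattern = Pattern.compile("(?i)" + Pattern.quote(token));
        Matcher matcher = pattern.matcher(text);
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical page text, not the actual PDF content.
        String pageText = "PATRICK PHILLIPS, 60 Kerferd Street, East Malvern";
        for (String hit : findAll("Patrick Phillips", pageText)) {
            System.out.println("Here: " + hit);
        }
    }
}
```

The same idea could be tried by passing a pattern such as `(?i)Patrick Phillips` as the token while regular-expression search is enabled, subject to the assumption stated above.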