Free Support Forum - aspose.com

Text is not extracted by TextFragmentAbsorber when the token contains a superscript in the PDF

Code used:

// token is the search string; p is the Page being searched
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("(?i)" + token); //NO I18N
TextEditOptions textEditOptions = textFragmentAbsorber.getTextEditOptions();
textEditOptions.setFontReplaceBehavior(TextEditOptions.FontReplace.Default);
// Treat the search phrase as a regular expression
TextSearchOptions textSearchOptions = textFragmentAbsorber.getTextSearchOptions();
textSearchOptions.setRegularExpressionUsed(true);
// Accept the absorber for the first page of the document
p.accept(textFragmentAbsorber);
// Get the extracted text fragments into a collection
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

Token passed to the code: 2nd Street
Result: textFragmentCollection is empty; no TextFragments are returned.
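
A side note on the search phrase: because setRegularExpressionUsed(true) is set, the token is interpreted as a regular expression, so characters such as "." or "(" inside a token would act as metacharacters. The sketch below uses only java.util.regex (no Aspose calls) to show how the phrase might be built more defensively. The class name TokenPattern and the helper literalIgnoreCase are hypothetical, and it is an assumption that Aspose.PDF accepts the same pattern syntax (including \Q...\E quoting) as java.util.regex.

```java
import java.util.regex.Pattern;

public class TokenPattern {
    // Hypothetical helper: builds a case-insensitive regex that matches
    // the token literally, so regex metacharacters in it are not interpreted.
    public static String literalIgnoreCase(String token) {
        return "(?i)" + Pattern.quote(token);
    }

    public static void main(String[] args) {
        String regex = literalIgnoreCase("2nd Street");
        // Check the pattern against plain text before handing it to a PDF search.
        boolean found = Pattern.compile(regex).matcher("12 2ND STREET").find();
        System.out.println(regex + " -> " + found);
    }
}
```

Testing the pattern against plain strings first helps separate regex problems from PDF text-extraction problems.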

Meena.pdf (103.5 KB)

@divya13

Thank you for contacting support.

We have worked with the data you shared but have not been able to reproduce the issue in our environment. Below is a slightly modified code snippet, along with the generated file, for your reference. 3rd.pdf

// dataDir is the folder that contains Meena.pdf
String token = "2nd";
Document document = new Document(dataDir + "Meena.pdf");
// Pages are 1-indexed, so this is the first page
Page page = document.getPages().get_Item(1);
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("(?i)" + token); //NO I18N
TextEditOptions textEditOptions = textFragmentAbsorber.getTextEditOptions();
textEditOptions.setFontReplaceBehavior(TextEditOptions.FontReplace.Default);
// Treat the search phrase as a regular expression
TextSearchOptions textSearchOptions = textFragmentAbsorber.getTextSearchOptions();
textSearchOptions.setRegularExpressionUsed(true);
// Accept the absorber for the first page of the document
page.accept(textFragmentAbsorber);
// Get the extracted text fragments into a collection
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();

// Print each match, then replace its text
for (TextFragment fragment : textFragmentCollection)
{
    System.out.println("Here: " + fragment.getText());
    fragment.setText("3rd");
}
document.save(dataDir + "3rd.pdf");

Please upgrade to Aspose.PDF for Java 19.1 in your environment and then share your kind feedback with us.

@Farhan.Raza
Thank you for your response. I have upgraded to Aspose.PDF 19.1 and tried again.
When I pass the token "2nd", the text fragment is identified. However, the token I receive is "2nd Street", and I need to identify "2nd Street" as a whole. Can you please check?
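
One possible explanation, offered as an assumption and not a confirmed diagnosis, is that the superscript "nd" is stored as a separate text segment, so the extracted page text may contain extra whitespace (or none) between "2", "nd", and "Street". The sketch below uses only java.util.regex to build a pattern that tolerates such gaps. FlexibleToken and flexibleRegex are hypothetical names, and whether this pattern behaves the same way when passed to TextFragmentAbsorber has not been verified.

```java
import java.util.regex.Pattern;

public class FlexibleToken {
    // Hypothetical helper: builds a case-insensitive regex in which optional
    // whitespace (\s*) is allowed between every pair of non-space characters.
    public static String flexibleRegex(String token) {
        StringBuilder sb = new StringBuilder("(?i)");
        boolean first = true;
        for (char c : token.toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue; // spaces in the token are covered by the \s* separators
            }
            if (!first) {
                sb.append("\\s*");
            }
            // Quote each character so regex metacharacters are matched literally.
            sb.append(Pattern.quote(String.valueOf(c)));
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(flexibleRegex("2nd Street"));
        // Try the pattern on a few plain-text variants of the same phrase.
        for (String text : new String[] {"2nd Street", "2 nd Street", "2ndStreet"}) {
            System.out.println(text + " -> " + p.matcher(text).find());
        }
    }
}
```

If the pattern matches the plain-text variants but the absorber still returns nothing for the full token, that would point toward how the page text is segmented instead of toward the regex itself.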

@divya13

Would you please elaborate and share the code snippet and screenshots so that we can assist you accordingly?