Text highlight in PDF

Hi Aspose support!
I need to highlight a multi-line phrase in a PDF file, and I used this code:

    String text = "and interactive exercises";
    Document doc = new Document(file.getInputStream());
    // Allow any whitespace (including line breaks) between the words of the phrase.
    Pattern re = Pattern.compile("(?i)" + text.replace(" ", "\\s*") + "\\b", Pattern.MULTILINE);
    // Passing true tells the absorber to treat the search phrase as a regular expression.
    TextFragmentAbsorber tfa = new TextFragmentAbsorber(re,
            new TextSearchOptions(true));
    // Search the first page only (page numbering starts at 1).
    doc.getPages().get_Item(1).accept(tfa);
    for (TextFragment tf : tfa.getTextFragments()) {
        // One highlight per match, sized to the rectangle of the whole fragment.
        HighlightAnnotation highlightAnnotation = new HighlightAnnotation(tf.getPage(), tf.getRectangle());
        highlightAnnotation.setColor(Color.getGreenYellow());
        tf.getPage().getAnnotations().add(highlightAnnotation);
    }

    doc.save("/home/hossam/Downloads/" + "PDF_Highlighting_2.pdf");

It works fine for me, but the problem is that it highlights the entire lines that contain the phrase, like this:
Screenshot from 2020-12-01 13-35-30.png (34.4 KB)
But I need to highlight just “owed to the”, so do you have any advice?
Thanks in advance.
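
A side note on the search pattern in the snippet above: joining the words with \s* lets the phrase match across a line break, but the phrase itself goes into the regex unescaped, so a character such as "." or "(" in the search text would be read as regex syntax. Below is a minimal sketch of a helper that quotes each word first, using only java.util.regex. The names PhrasePattern and build are made up for illustration and are not part of Aspose.PDF.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Aspose.PDF: builds a case-insensitive
// pattern that tolerates any whitespace (or none) between the words.
final class PhrasePattern {
    private PhrasePattern() {}

    static Pattern build(String phrase) {
        String body = Arrays.stream(phrase.trim().split("\\s+"))
                // Quote each word so regex metacharacters are matched literally.
                .map(Pattern::quote)
                // Same separator as the original snippet: optional whitespace.
                .collect(Collectors.joining("\\s*"));
        // Flags replace the inline (?i) and also cover non-ASCII letters.
        return Pattern.compile(body, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }
}
```

The resulting Pattern could be passed to the TextFragmentAbsorber constructor in place of re. The sketch leaves out the trailing \b on purpose, since a phrase that ends in punctuation may not sit on a word boundary.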

@hossam992

Could you please share your sample PDF file as well? We will test the scenario in our environment and address it accordingly.

Please check the attached files, and please note that I updated the code.

file to test highlight on.pdf (11.9 KB)
the result I want it.pdf (12.7 KB)
the result I get it.pdf (48.2 KB)

@hossam992

Please try to modify a part of your code snippet as follows and let us know in case you still face any issue:

    for (TextFragment tf : tfa.getTextFragments()) {
        for (TextSegment ts : tf.getSegments()) {
            HighlightAnnotation highlightAnnotation = new HighlightAnnotation(tf.getPage(), ts.getRectangle());
            highlightAnnotation.setColor(Color.getGreenYellow());
            tf.getPage().getAnnotations().add(highlightAnnotation);
        }
    }
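
The per-segment loop appears to help because a match that wraps onto a second line comes back as one fragment with a separate segment on each line, and a rectangle that encloses the whole fragment has to stretch across both lines. The sketch below illustrates that geometry in plain Java only. The Box class and all coordinates are hypothetical stand-ins, not Aspose types and not values taken from the attached PDF.

```java
// Hypothetical stand-in for a PDF rectangle (lower-left and upper-right corners, in points).
final class Box {
    final double llx, lly, urx, ury;

    Box(double llx, double lly, double urx, double ury) {
        this.llx = llx; this.lly = lly; this.urx = urx; this.ury = ury;
    }

    // Smallest box that encloses both this box and the other one.
    Box union(Box other) {
        return new Box(Math.min(llx, other.llx), Math.min(lly, other.lly),
                       Math.max(urx, other.urx), Math.max(ury, other.ury));
    }

    double area() {
        return (urx - llx) * (ury - lly);
    }
}

final class WrappedMatchDemo {
    // Made-up coordinates: the phrase starts near the end of one line...
    static final Box FIRST_LINE_PART = new Box(400, 700, 500, 712);
    // ...and continues at the start of the next line.
    static final Box SECOND_LINE_PART = new Box(50, 686, 120, 698);

    // Area covered by a single highlight that encloses the whole match.
    static Box wholeMatchBox() {
        return FIRST_LINE_PART.union(SECOND_LINE_PART);
    }

    // Area covered by two separate highlights, one per line.
    static double perLineArea() {
        return FIRST_LINE_PART.area() + SECOND_LINE_PART.area();
    }
}
```

With these made-up numbers the enclosing box runs from x = 50 to x = 500 over both lines (11700 square points), while the two per-line boxes together cover only 2040. That is the same kind of difference as between one highlight built from tf.getRectangle() and one highlight per ts.getRectangle().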

Thanks, it’s fixed.
One more question, please: when I try to highlight an Arabic word, the process fails,
but if I reverse the word, it succeeds.
Is there any way to highlight Arabic text?

@hossam992

Would you please share the sample PDF document with Arabic text? We will test the scenario in our environment and address it accordingly.

Please try with this file:
file to test arabic highlight.pdf (8.5 KB)
and with this phrase: “يخص التطبيقات الحاسوبية”, using the updated code from your previous comment.
Thanks for your interest.

@hossam992

We were able to replicate the issue in our environment: the API was not able to find the text with a line break. We have logged an issue as PDFJAVA-39987 in our issue tracking system so that it can be corrected. We will look into its details and keep you posted on the status of the fix. Please be patient and spare us some time.

We are sorry for the inconvenience.