Highlighting text using regular expressions in PDF with Aspose.PDF for Java

Hootan · June 26, 2019, 6:38pm

Hi,

I have to highlight text in a PDF document based on the provided search criteria. We are using the Aspose.PDF for Java library with ColdFusion.
Can you please let me know:

a) How can I highlight text within a PDF?
b) If we convert the highlighted PDF to HTML, will the highlighting be preserved in the HTML file as well?

asad.ali · June 26, 2019, 10:24pm

@Batrinux

You may use the following code snippet to search for and highlight text in a PDF document and then convert it into HTML:

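```java
import com.aspose.pdf.*;

// dataDir is assumed to hold the folder that contains sample.pdf
Document doc = new Document(dataDir + "sample.pdf");

// Passing true to TextSearchOptions treats the search phrase as a regular expression
TextFragmentAbsorber tfa = new TextFragmentAbsorber("demonstration", new TextSearchOptions(true));

// Searches page 1 only; doc.getPages().accept(tfa) would search every page
doc.getPages().get_Item(1).accept(tfa);

// Add a highlight annotation over each matched text fragment
for (TextFragment tf : tfa.getTextFragments()) {
    HighlightAnnotation highlightAnnotation = new HighlightAnnotation(tf.getPage(), tf.getRectangle());
    highlightAnnotation.setColor(Color.getGreenYellow());
    tf.getPage().getAnnotations().add(highlightAnnotation);
}
doc.save(dataDir + "output.pdf");

// Reload the annotated PDF and convert it to HTML
doc = new Document(dataDir + "output.pdf");
doc.save(dataDir + "output.html", new HtmlSaveOptions());
```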
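Because `new TextSearchOptions(true)` puts the absorber into regular-expression mode, search criteria supplied at runtime are interpreted as regex syntax, so a phrase such as `Total (USD)` would treat the parentheses as a group rather than literal text. The following is a minimal sketch using only `java.util.regex` that escapes user-supplied criteria and shows which spans would match. The class and method names (`SearchPhrase`, `escapeForRegex`, `buildPattern`, `findMatches`) are hypothetical, and it is an assumption that the absorber accepts the same regex syntax (including the `(?i)` flag) as `java.util.regex`; please verify this against your Aspose.PDF version before passing the result of `buildPattern` as the first argument of `TextFragmentAbsorber`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the Aspose.PDF API.
public class SearchPhrase {

    // Characters that have a special meaning in a Java regular expression.
    private static final String REGEX_META = "\\.[]{}()<>*+-=!?^$|";

    // Escapes regex metacharacters in user-supplied search criteria so the
    // phrase is matched literally when regex search mode is switched on.
    // Java allows a backslash before any non-alphabetic character.
    static String escapeForRegex(String criteria) {
        StringBuilder sb = new StringBuilder();
        for (char c : criteria.toCharArray()) {
            if (REGEX_META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a case-insensitive pattern around the escaped criteria
    // (assumes the (?i) inline flag is supported by the regex engine in use).
    static String buildPattern(String criteria) {
        return "(?i)" + escapeForRegex(criteria);
    }

    // Returns the [start, end) offsets of every match in the text,
    // i.e. the spans a highlighter would need to cover.
    static List<int[]> findMatches(String pattern, String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Net total (usd): 40. TOTAL (USD): 42.";
        String pattern = buildPattern("Total (USD)");
        // Prints "4-15: total (usd)" and "21-32: TOTAL (USD)"
        for (int[] span : findMatches(pattern, text)) {
            System.out.println(span[0] + "-" + span[1] + ": " + text.substring(span[0], span[1]));
        }
    }
}
```

Without escaping, the unmodified criteria `(?i)Total (USD)` would look for `total usd` and find nothing in the sample text, which is why escaping matters when regex mode is enabled for user input.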

We observed that the highlighted text was not visible when we converted a sample PDF into HTML using Aspose.PDF for Java 19.6. Therefore, an issue has been logged as PDFJAVA-38657 in our issue tracking system. We will look into the details and keep you posted on the status of its resolution. Please give us a little time.

We are sorry for the inconvenience.