@Batrinux
You can use the following code snippet to search for and highlight text in a PDF document, and then convert the result to HTML:
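import com.aspose.pdf.*;

// Load the source PDF (dataDir is the folder that contains your files)
Document doc = new Document(dataDir + "sample.pdf");
// Search for the text; passing true enables regular-expression matching
TextFragmentAbsorber tfa = new TextFragmentAbsorber("demonstration", new TextSearchOptions(true));
// Run the search on the first page (page indices are 1-based)
doc.getPages().get_Item(1).accept(tfa);
// Add a highlight annotation over each match
for (TextFragment tf : tfa.getTextFragments())
{
    HighlightAnnotation highlightAnnotation = new HighlightAnnotation(tf.getPage(), tf.getRectangle());
    highlightAnnotation.setColor(Color.getGreenYellow());
    tf.getPage().getAnnotations().add(highlightAnnotation);
}
// Save the highlighted PDF, then reload it and convert it to HTML
doc.save(dataDir + "output.pdf");
doc = new Document(dataDir + "output.pdf");
doc.save(dataDir + "output.html", new HtmlSaveOptions());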
We have observed that the highlighted text was not visible when we converted a sample PDF to HTML using Aspose.PDF for Java 19.6. We have therefore logged this issue as PDFJAVA-38657 in our issue tracking system. We will investigate it and keep you posted on the status of its resolution. Please allow us some time.
We are sorry for the inconvenience.