PDF to HTML --> Break each letter to a separate div (JAVA)

Hi.

I used Aspose.PDF for Java (version 9.7.1) to convert a PDF to HTML.

My (first) problem is that some of the text is converted to HTML with each character separated into its own div. I think it happens when there is a style change, such as an italic font. This is a critical issue for me because it affects the search results.
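
To illustrate why one div per character hurts search, here is a minimal sketch. The markup string is made up for illustration and is not the actual Aspose output; the class name, the `stripTags` helper and the regex-based tag stripping are assumptions of this sketch and would be too naive for real-world HTML.

```java
import java.util.regex.Pattern;

public class FragmentedTextDemo {
    // Hypothetical markup resembling the reported symptom: one <div> per character.
    static final String PER_CHAR_HTML =
        "<div>s</div><div>e</div><div>a</div><div>r</div><div>c</div><div>h</div>";

    // Naive pattern matching anything that looks like a tag (assumption: no '>' inside attributes).
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Removes tag-like substrings, leaving only the text content. */
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // A plain substring search over the raw markup misses the word.
        System.out.println(PER_CHAR_HTML.contains("search"));            // false
        // The word is only found once the per-character tags are removed.
        System.out.println(stripTags(PER_CHAR_HTML).contains("search")); // true
    }
}
```

Any search that works on the raw markup, or on one element at a time, would miss the word in the same way.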

Another issue is that the rest of the text is converted with each line in its own element, but every 2-3 words are placed in a separate element.

(Note that I got both issues with Aspose for Java, but with Aspose for .NET the first issue didn't occur, only the second one.)

I attached a test PDF (copied from Wikipedia).

My Java code for converting the attached PDF file to HTML is:

--------------------------------------------------------------------------------

File mainHtmlFile = createNewHtmlFile();
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(inStream);
HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
options.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
options.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
options.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
pdfDocument.save(mainHtmlFile.getAbsolutePath(), options);

---------------------------------------------------------------------------------

Can you help me fix these two issues, or at least the first one?

Thanks

Tami

Hi Tami,


Thanks for contacting support.

I have tested the scenario and am able to reproduce the problem. I have logged it in our issue tracking system as PDFNEWJAVA-34714. We will investigate this issue in detail and keep you updated on the status of a fix.

We apologize for your inconvenience.

Anything new?

Hi Tami,


Thanks for your inquiry. I am afraid your issue was logged only recently and is still pending investigation, because other issues reported earlier are already being investigated and resolved. We will notify you as soon as we make significant progress towards resolving it.

We are sorry for the inconvenience caused.

Best Regards,