Discrepancies when converting an HTML file to different formats


#1

Hello,
we are facing issues while converting the same HTML file to different formats: PDF, TIFF, DOCX, BMP.

The PageCount returned after setting the proper page format parameters for the output is 12.

Output results after save:

  1. The generated PDF has 11 pages and is missing part of the text between pages 7 and 8.
  2. The generated TIFF has 11 images, with the same missing text as the PDF between images 7 and 8.
  3. The generated DOCX has 12 pages and no text is lost. However, two small icons are missing on the 11th page (when the generated DOCX is saved as PDF via the Word10 application, the PDF is correct and also has 12 pages).
  4. Saving the pages as single-page BMPs throws an exception when converting page 12, because the document is reported to have 11 pages in this conversion (see point 1; 12 expected).
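
To illustrate point 4 without depending on the library, here is a minimal sketch of a per-page export loop that tolerates a mismatch between the expected and the rendered page count. It does not use the Aspose.Words API: `PageLoopSketch`, `exportPages` and the `savePage` callback are hypothetical names, and the values 12 and 11 are simply the counts reported above. It only shows how a loop over the expected count can record the missing page instead of failing on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class PageLoopSketch {

    /**
     * Hypothetical helper: calls savePage for every zero-based page index that
     * was actually rendered, and returns the indices that were expected but
     * are not available in the rendered output.
     */
    public static List<Integer> exportPages(int expectedPages, int renderedPages,
                                            IntConsumer savePage) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < expectedPages; i++) {
            if (i < renderedPages) {
                savePage.accept(i);   // stand-in for the real per-page save call
            } else {
                missing.add(i);       // expected page that the renderer did not produce
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Integer> saved = new ArrayList<>();
        // 12 pages expected from the page count, 11 pages actually rendered
        List<Integer> missing = exportPages(12, 11, i -> saved.add(i));
        System.out.println("saved " + saved.size() + " pages, missing indices " + missing);
    }
}
```

A guard like this would only avoid the exception; it would not recover the text lost between pages 7 and 8.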

Please find attached HTML_CONV_ISSUES.zip, which contains the sample HTML file to be converted and the Java code to reproduce the issues.

The library used is aspose-words-19.3-jdk16.jar

Could you please check HTML_CONV_ISSUES.zip (7.9 MB)?
Thanks!


#2

@renato.mauro

We are investigating this issue and will get back to you soon.


#3

@renato.mauro

We have tested the scenario and managed to reproduce the same issue on our side. We have logged this problem in our issue tracking system as WORDSNET-18462 so that it can be corrected. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.