Some characters missing when saving a PDF to HTML format (in the Safari browser)

Hi

I am using Aspose PDF 17.8 to convert a PDF file to HTML format.
Here is my test code:

String fileName = "Dropbox 新手指南.pdf";
Document pdf = new Document("custom/input/pdf/" + fileName);
new File("custom/output/pdf/" + fileName + "/").mkdirs();

// Convert each page of the PDF to its own HTML file.
for (int p = 1; p <= pdf.getPages().size(); p++) {
    // Copy the current page into a fresh single-page document.
    Document pageDoc = new Document();
    pageDoc.getPages().add(pdf.getPages().get_Item(p));
    pageDoc.getPageInfo().setMargin(new MarginInfo(0, 0, 0, 0));

    // Embed images and WOFF fonts directly into the HTML.
    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
    htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    htmlSaveOps.setSplitIntoPages(false);
    htmlSaveOps.setPreventGlyphsGrouping(true);

    // Capture the generated markup in memory instead of letting the API write it to disk.
    final StringBuilder htmlBuffer = new StringBuilder();
    htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
        @Override
        public void invoke(HtmlPageMarkupSavingInfo htmlSavingInfo) {
            try {
                htmlBuffer.append(IOUtils.toString(htmlSavingInfo.ContentStream, "UTF-8"));
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                IOUtils.closeQuietly(htmlSavingInfo.ContentStream);
            }
        }
    };

    // The file name passed to save() is only a placeholder; the strategy above receives the markup.
    String outHtmlFile = "SomeUnexistingFile.html";
    pageDoc.save(outHtmlFile, htmlSaveOps);

    // Write the captured markup as UTF-8 and close the stream afterwards.
    try (FileOutputStream out = new FileOutputStream("custom/output/pdf/" + fileName + "/" + p + ".html")) {
        IOUtils.write(htmlBuffer.toString().getBytes("UTF-8"), out);
    }
}
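
The per-page write at the end of the loop could also be isolated into a small helper that uses only java.nio, which creates the output folder and closes the file handle itself. This is a minimal sketch, not part of the original test code: the class name PageHtmlWriter and the method writePage are hypothetical, and it assumes the markup for one page is already available as a String.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Aspose.PDF): stores one page of HTML on disk.
final class PageHtmlWriter {

    // Writes the markup as UTF-8 to <outputDir>/<pageNumber>.html and returns the file path.
    static Path writePage(Path outputDir, int pageNumber, String html) throws IOException {
        // Create the output folder (and any missing parents) if it does not exist yet.
        Files.createDirectories(outputDir);
        Path target = outputDir.resolve(pageNumber + ".html");
        // Files.write opens the file, writes all bytes, and closes it again.
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
        return target;
    }
}
```

Inside the loop it would be called as PageHtmlWriter.writePage(Paths.get("custom/output/pdf", fileName), p, htmlBuffer.toString()), with java.nio.file.Paths imported.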

Issue:
The problem can only be observed in one specific browser: Safari.
Some characters are missing from the generated result.
We suspect it has something to do with how the generated CSS is rendered,
but we have not been able to work out what is going on there.
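
Since the suspicion points at the CSS, one way to narrow it down might be to list the font families that the generated page declares, and compare them with what Safari's Web Inspector reports as loaded. The following is only a sketch under assumptions: FontFaceLister is a hypothetical helper, and it assumes the embedded fonts appear as @font-face rules in the HTML, which this thread does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper: lists font-family names declared in @font-face rules.
final class FontFaceLister {

    // Matches one @font-face { ... } rule and captures its body.
    private static final Pattern FONT_FACE =
            Pattern.compile("@font-face\\s*\\{([^}]*)\\}", Pattern.CASE_INSENSITIVE);

    // Captures the value of the font-family declaration inside a rule body.
    private static final Pattern FAMILY =
            Pattern.compile("font-family\\s*:\\s*([^;]+)", Pattern.CASE_INSENSITIVE);

    static List<String> declaredFamilies(String html) {
        List<String> families = new ArrayList<>();
        Matcher rule = FONT_FACE.matcher(html);
        while (rule.find()) {
            Matcher family = FAMILY.matcher(rule.group(1));
            if (family.find()) {
                // Trim whitespace and strip one surrounding quote character on each side.
                families.add(family.group(1).trim().replaceAll("^['\"]|['\"]$", ""));
            }
        }
        return families;
    }
}
```

Calling FontFaceLister.declaredFamilies(htmlBuffer.toString()) for an affected page would give the list to check against the browser.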

Dropbox 新手指南.pdf (1.1 MB)
result-and-showing-css.zip (404.9 KB)

I have uploaded the PDF file and one page of the results.
Please check the attachments and look into this issue. Thank you.

Craig

@craig.w.su

Thanks for contacting support.

We have tested the scenario using your code snippet with Aspose.Pdf for Java 17.9 and were unable to reproduce the missing-characters issue. For your reference, we have attached the output and a screenshot of the HTML content opened in the Safari browser.

Dropbox.png (53.5 KB)
7.zip (55.5 KB)

Please try the scenario again with the latest version of the API. If you still face any issue, please share your environment details (i.e., OS version, JDK version, etc.) so that we can test the scenario in that environment and address it accordingly.
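
The OS and JDK details requested above can be collected from the standard Java system properties. The snippet below is a minimal sketch for that purpose; the class name EnvironmentReport is hypothetical, and the browser version is not available this way, so it still has to be read from Safari itself.

```java
// Hypothetical helper: summarizes OS and JDK details from standard system properties.
final class EnvironmentReport {

    // Builds a two-line summary: operating system first, Java runtime second.
    static String describe() {
        return String.format("OS: %s %s (%s)%nJava: %s (%s)",
                System.getProperty("os.name"),
                System.getProperty("os.version"),
                System.getProperty("os.arch"),
                System.getProperty("java.version"),
                System.getProperty("java.vendor"));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running the class on the machine where the conversion is executed prints the values to paste into a reply.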

Hi
@asad.ali

I opened the result you sent, and the issue still exists.
Please check the macOS version and the Safari version in the attachments.

macOS.png (74.2 KB)
safari.png (78.6 KB)

Craig

@craig.w.su

Thanks for writing back.

We have managed to replicate the issue in the specified environment and have logged it as PDFJAVA-37206 in our issue tracking system so that it can be corrected. We will check the details of the issue further and keep you posted on the status of its resolution. Please be patient and spare us a little time.

We are sorry for the inconvenience.

The issues you have found earlier (filed as ) have been fixed in this update. This message was posted using BugNotificationTool from Downloads module by MuzammilKhan