Hi
I am using Aspose PDF 17.9 to convert PDF files to HTML.
Here is the test code:
String fileName = "BOX_v4_20170929_1.pdf";
Document pdf = new Document("custom/input/pdf/" + fileName);
new File("custom/output/pdf/" + fileName + "/").mkdirs();
// Convert each page separately so that one HTML file is written per page.
for (int p = 1; p <= pdf.getPages().size(); p++) {
    System.out.println("Page:" + p);
    Document pageDoc = new Document();
    pageDoc.getPages().add(pdf.getPages().get_Item(p));
    pageDoc.getPageInfo().setMargin(new MarginInfo(0, 0, 0, 0));
    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
    htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    htmlSaveOps.setSplitIntoPages(false);
    htmlSaveOps.setPreventGlyphsGrouping(true);
    // Collect the generated markup in memory.
    final StringBuilder htmlBuffer = new StringBuilder();
    htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
        @Override
        public void invoke(HtmlPageMarkupSavingInfo htmlSavingInfo) {
            try {
                htmlBuffer.append(IOUtils.toString(htmlSavingInfo.ContentStream, "utf8"));
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                IOUtils.closeQuietly(htmlSavingInfo.ContentStream);
            }
        }
    };
    // Placeholder path; the markup is collected in htmlBuffer by the strategy above.
    String outHtmlFile = "SomeUnexistingFile.html";
    pageDoc.save(outHtmlFile, htmlSaveOps);
    // Write the collected markup to <p>.html and close the stream afterwards.
    try (FileOutputStream out = new FileOutputStream("custom/output/pdf/" + fileName + "/" + p + ".html")) {
        IOUtils.write(htmlBuffer.toString().getBytes("UTF-8"), out);
    }
}
Issues:
1. Several characters are missing from the result.
When we checked the resulting HTML file, we found that “visibility:hidden” had been added to them.
2. Even after we remove “visibility:hidden”,
some of the Chinese characters in the HTML differ from those in the original PDF file.
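A check like the one described above could be sketched roughly as follows. This is only an illustration: `HiddenStyleCheck`, `countHidden`, and `stripHidden` are made-up names, not part of Aspose.PDF, and the sketch uses a plain regular expression on the HTML text instead of parsing the markup, so it will also match the declaration if it appears inside a stylesheet rule or a comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Aspose API): inspects generated HTML as plain text.
class HiddenStyleCheck {
    // Matches "visibility:hidden" with optional whitespace around the colon
    // and an optional trailing semicolon, ignoring case.
    private static final Pattern HIDDEN =
            Pattern.compile("visibility\\s*:\\s*hidden\\s*;?", Pattern.CASE_INSENSITIVE);

    // Counts how many times the declaration occurs in the markup.
    static int countHidden(String html) {
        Matcher m = HIDDEN.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns the markup with every such declaration removed.
    static String stripHidden(String html) {
        return HIDDEN.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Made-up sample markup, not taken from the attached result.
        String sample = "<span style=\"visibility:hidden;color:red\">A</span><span>B</span>";
        System.out.println("hidden declarations: " + countHidden(sample));
        System.out.println(stripHidden(sample));
    }
}
```

Running `countHidden` on each generated page file would show which pages contain hidden glyphs, and `stripHidden` reproduces the manual removal mentioned in point 2.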
result and images.zip (1.1 MB)
BOX_v4_20170929_1.pdf (249.3 KB)
I have uploaded some images illustrating the issue, along with the PDF file and the result.
Please check the attachments and look into this issue. Thank you.
Craig