Hi there
I am using Aspose.PDF 17.2.0 to convert PDF files to HTML.
Here is my code for testing:
String fileName = "Dropbox 新手指南.pdf";
Document pdf = new Document("custom/input/pdf/" + fileName);
HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
htmlSaveOps.setSplitIntoPages(false);
// Each page is written to its own HTML file inside this directory.
File f = new File("custom/output/pdf/" + fileName + "/");
f.mkdirs();
for (int p = 1; p <= pdf.getPages().size(); p++) {
    // Copy page p into a fresh single-page document.
    Document pageDoc = new Document();
    pageDoc.getPages().add(pdf.getPages().get_Item(p));
    final ByteArrayOutputStream stream = new ByteArrayOutputStream();
    // Collect the generated markup in memory.
    htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
        @Override
        public void invoke(com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
            try {
                // toByteArray() already drains the stream, so no second read is needed.
                byte[] resultHtmlAsBytes = IOUtils.toByteArray(htmlSavingInfo.ContentStream);
                stream.write(resultHtmlAsBytes);
            } catch (IOException e) {
                // Report the failure instead of silently swallowing it.
                e.printStackTrace();
            } finally {
                IOUtils.closeQuietly(htmlSavingInfo.ContentStream);
            }
        }
    };
    // Placeholder name; the markup is collected by the saving strategy above.
    String outHtmlFile = "SomeUnexistingFile.html";
    pageDoc.save(outHtmlFile, htmlSaveOps);
    // Write the collected markup to <page number>.html and close the file afterwards.
    try (FileOutputStream out = new FileOutputStream("custom/output/pdf/" + fileName + "/" + p + ".html")) {
        IOUtils.write(stream.toByteArray(), out);
    }
}
For instance, in the output for page 2, some of the text appears to be missing, but it has actually been shifted far to the right.
Please check the PDF file and the result in the attachment and analyze this issue.
There are probably more cases of this issue on other pages of the output as well.
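In case it helps with reproducing, here is a rough sketch of how the shifted text could be located in the generated HTML. It is only a guess at a diagnostic: `ShiftedTextCheck` and `findLargeLeftOffsets` are made-up names, not part of Aspose.PDF, and it assumes the output positions text with CSS declarations such as `left:12.5em` or `left:340px`, which I have not verified for every page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting converted HTML; not part of Aspose.PDF.
class ShiftedTextCheck {
    // Assumption: offsets appear as CSS declarations like "left:12.5em" or "left: 340px".
    // The lookbehind skips properties such as "margin-left" or "padding-left".
    private static final Pattern LEFT_OFFSET =
            Pattern.compile("(?<![\\w-])left\\s*:\\s*(-?\\d+(?:\\.\\d+)?)(em|px|pt)");

    // Returns every "left" offset whose numeric value exceeds the threshold.
    // The threshold is compared to the raw number, so choose it for the unit in use.
    static List<String> findLargeLeftOffsets(String html, double threshold) {
        List<String> hits = new ArrayList<>();
        Matcher m = LEFT_OFFSET.matcher(html);
        while (m.find()) {
            double value = Double.parseDouble(m.group(1));
            if (value > threshold) {
                hits.add(m.group(1) + m.group(2));
            }
        }
        return hits;
    }
}
```

Running `ShiftedTextCheck.findLargeLeftOffsets(html, 100.0)` on the markup of one page should list any offsets well beyond the page width; the threshold of 100 is an arbitrary starting point and would need tuning to the page size and unit.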
Craig