Hi there
I am using Aspose.PDF for Java 17.5 to convert PDF files to HTML.
Here is my test code:
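// Imports used: java.io.*, org.apache.commons.io.IOUtils, com.aspose.pdf.*
String fileName = "0672336979.pdf";
Document pdf = new Document("custom/input/pdf/" + fileName);
File outputDir = new File("custom/output/pdf/" + fileName + "/");
if (!outputDir.exists()) {
    // mkdirs also creates any missing parent directories
    outputDir.mkdirs();
}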
HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
htmlSaveOps.setSplitIntoPages(false);
for (int p = 1; p <= pdf.getPages().size(); p++) {
    // Copy the current page into a new single-page document
    Document pageDoc = new Document();
    pageDoc.getPages().add(pdf.getPages().get_Item(p));
    final ByteArrayOutputStream stream = new ByteArrayOutputStream();
    htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
        @Override
        public void invoke(com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
            try {
                // toByteArray already reads the whole stream, so no second read is needed
                byte[] resultHtmlAsBytes = IOUtils.toByteArray(htmlSavingInfo.ContentStream);
                stream.write(resultHtmlAsBytes);
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                IOUtils.closeQuietly(htmlSavingInfo.ContentStream);
            }
        }
    };
    // The custom strategy captures the HTML; this file name is intended only as a placeholder
    String outHtmlFile = "SomeUnexistingFile.html";
    pageDoc.save(outHtmlFile, htmlSaveOps);
    // Write the captured HTML for this page and close the file afterwards
    try (FileOutputStream out = new FileOutputStream("custom/output/pdf/" + fileName + "/" + p + ".html")) {
        IOUtils.write(stream.toByteArray(), out);
    }
}
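For reference, the stream-reading step inside the callback above can be sketched on its own with only java.io. This is a minimal sketch and not Aspose code: the class name StreamCopy and the method readAll are hypothetical, and a ByteArrayInputStream stands in for htmlSavingInfo.ContentStream. It shows that once the stream has been read to the end, a further read returns nothing more.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Aspose.PDF): drains an InputStream once.
public class StreamCopy {

    // Reads every remaining byte from the stream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for htmlSavingInfo.ContentStream (assumption for illustration)
        InputStream content = new ByteArrayInputStream("<html/>".getBytes(StandardCharsets.UTF_8));
        byte[] html = readAll(content);
        System.out.println(html.length + " bytes read");
        // The stream is now exhausted, so read() returns -1
        System.out.println("next read: " + content.read());
    }
}
```

So a single full read of the content stream should be enough before writing the bytes out.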
In the HTML output for page 14, the text is missing.
I have uploaded the pdf file and the result.
Please check this issue, thank you~
Craig
0672336979.pdf (1.8 MB)
result.zip (2.0 MB)