A segment of text disappears when saving a PDF to HTML format

craig.w.su · June 27, 2017, 7:02am

Hi there

I am using Aspose.Pdf for Java 17.5 to convert PDF files into HTML format.
Here is my test code:

String fileName = "0672336979.pdf";

Document pdf = new Document("custom/input/pdf/" + fileName);

File outputDir = new File("custom/output/pdf/" + fileName + "/");
if (!outputDir.exists())
    outputDir.mkdir();

HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
htmlSaveOps.setSplitIntoPages(false);

// Convert each page on its own (page numbers start at 1).
for (int p = 1; p <= pdf.getPages().size(); p++) {
    Document pageDoc = new Document();
    pageDoc.getPages().add(pdf.getPages().get_Item(p));

    // Capture the generated HTML in memory instead of letting save() write it.
    final ByteArrayOutputStream stream = new ByteArrayOutputStream();
    htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
        @Override
        public void invoke(com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
            try {
                byte[] resultHtmlAsBytes = IOUtils.toByteArray(htmlSavingInfo.ContentStream);
                htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
                stream.write(resultHtmlAsBytes);
                stream.close();
            } catch (FileNotFoundException e) {
            } catch (IOException e) {
            } finally {
                IOUtils.closeQuietly(htmlSavingInfo.ContentStream);
            }
        }
    };

    // The file name passed to save() is a placeholder; the strategy above receives the markup.
    String outHtmlFile = "SomeUnexistingFile.html";
    pageDoc.save(outHtmlFile, htmlSaveOps);
    IOUtils.write(stream.toByteArray(),
            new FileOutputStream("custom/output/pdf/" + fileName + "/" + p + ".html"));
}

In the output for page 14, the text is missing.
I have uploaded the PDF file and the result.
Please check this issue, thank you.

Craig
0672336979.pdf (1.8 MB)
result.zip (2.0 MB)

asad.ali · June 27, 2017, 11:32am

@craig.w.su

Thanks for contacting support.

I have tested the scenario using your document and code snippet with Aspose.Pdf for Java 17.5 and was able to reproduce the problem: the page is blank in the HTML output. The page contains nothing but the message “This page was left blank intentionally.”, which is visible when printing the document but missing from the HTML file.
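
One way to spot pages like this across a whole batch is to scan the generated HTML for visible text. The following is a minimal sketch using only the Java standard library, not an Aspose API. The class and method names (`HtmlTextCheck`, `visibleText`, `hasVisibleText`) are hypothetical, and the regex-based tag stripping is an approximation that assumes reasonably well-formed markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Aspose.Pdf): approximates the visible
// text of an HTML string so that pages converted to empty markup stand out.
class HtmlTextCheck {
    // Matches whole <style>...</style> and <script>...</script> blocks.
    private static final Pattern BLOCKS =
            Pattern.compile("(?is)<(style|script)\\b.*?</\\1\\s*>");
    // Matches any remaining tag.
    private static final Pattern TAGS = Pattern.compile("(?s)<[^>]*>");

    /** Returns the text left after removing style/script blocks and tags. */
    static String visibleText(String html) {
        String noBlocks = BLOCKS.matcher(html).replaceAll(" ");
        String noTags = TAGS.matcher(noBlocks).replaceAll(" ");
        return noTags.replace("&nbsp;", " ").replaceAll("\\s+", " ").trim();
    }

    /** True when at least one visible character remains. */
    static boolean hasVisibleText(String html) {
        return !visibleText(html).isEmpty();
    }

    public static void main(String[] args) {
        String withText = "<html><head><style>p{color:red}</style></head>"
                + "<body><p>This page was left blank intentionally.</p></body></html>";
        String blank = "<html><head><style>p{}</style></head>"
                + "<body><div></div></body></html>";
        System.out.println(hasVisibleText(withText));
        System.out.println(hasVisibleText(blank));
    }
}
```

Reading each generated `p.html` file and passing its content to `hasVisibleText` should flag pages whose text was dropped. Matching a full phrase is less reliable, since a converter may split words across several elements.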

I have therefore logged an issue as PDFJAVA-36862 in our issue tracking system. We will investigate it further and keep you posted on the status of its correction. Please be patient and spare us a little time. We are sorry for the inconvenience.
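
Separately from the blank page, one detail of the posted callback is worth noting: after `IOUtils.toByteArray(...)` has consumed `ContentStream`, the following `read(...)` call has nothing left to read, so it is redundant and does not appear to affect the output. The sketch below illustrates this with standard Java streams only. `StreamCapture` and `readFully` are hypothetical names, and a `ByteArrayInputStream` stands in for the stream Aspose passes to the callback.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: reading a stream to its end once is enough.
class StreamCapture {
    /** Copies everything remaining in the input stream into a byte array. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTML content stream handed to the saving strategy.
        InputStream in = new ByteArrayInputStream(
                "<html>page</html>".getBytes(StandardCharsets.UTF_8));
        byte[] html = readFully(in);
        // The stream is exhausted, so a further read signals end of stream
        // and leaves the array unchanged.
        int second = in.read(html, 0, html.length);
        System.out.println(html.length + " bytes captured, second read returned " + second);
    }
}
```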

Best Regards,
Asad Ali