Free Support Forum - aspose.com

Problems converting PDF files to HTML format

Hi,

I am using Aspose.Pdf 11.5.0 to convert a PDF file to an HTML file.
However, the first page appears twice in the conversion result.

Java code with 11.5.0:
public void asposeConvert() throws FileNotFoundException, IOException {

    Document pdf = new Document("custom/input/pdf/870test.pdf");
    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
    htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    htmlSaveOps.setSplitIntoPages(false);

    for (int p = 1; p <= pdf.getPages().size(); p++) {
        Document pageDoc = new Document();
        pageDoc.getPages().add(pdf.getPages().get_Item(p));

        final ByteArrayOutputStream stream = new ByteArrayOutputStream();
        htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
            @Override
            public void invoke(com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
                byte[] resultHtmlAsBytes = new byte[(int) htmlSavingInfo.ContentStream.getLength()];
                htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
                try {
                    stream.write(resultHtmlAsBytes);
                    stream.close();
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        };

        String outHtmlFile = "SomeUnexistingFile.html";
        pageDoc.save(outHtmlFile, htmlSaveOps);
        IOUtils.write(stream.toByteArray(), new FileOutputStream("custom/output/pdf/870test." + p + ".html"));
    }
}

I also found that after updating to version 17.2.0, the program hangs.

Java code with 17.2.0:
public void asposeConvert() throws FileNotFoundException, IOException {

    Document pdf = new Document("custom/input/pdf/870test.pdf");
    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
    htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    htmlSaveOps.setSplitIntoPages(false);

    for (int p = 1; p <= pdf.getPages().size(); p++) {
        Document pageDoc = new Document();
        pageDoc.getPages().add(pdf.getPages().get_Item(p));

        final ByteArrayOutputStream stream = new ByteArrayOutputStream();
        htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
            @Override
            public void invoke(com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
                try {
                    byte[] resultHtmlAsBytes = IOUtils.toByteArray(htmlSavingInfo.ContentStream);
                    htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
                    stream.write(resultHtmlAsBytes);
                    stream.close();
                } catch (FileNotFoundException e) {
                } catch (IOException e) {
                } finally {
                    IOUtils.closeQuietly(htmlSavingInfo.ContentStream);
                }
            }
        };

        String outHtmlFile = "SomeUnexistingFile.html";
        pageDoc.save(outHtmlFile, htmlSaveOps);
        IOUtils.write(stream.toByteArray(), new FileOutputStream("custom/output/pdf/870test." + p + ".html"));
    }
}

I have also uploaded the original file.
Please help me figure out what happened. Thanks.


Craig

Hi Craig,


Thanks for contacting support.


I have tested the scenario and reproduced the problem: PDF to HTML conversion of the provided PDF file takes a very long time and then hangs. I have logged it as PDFJAVA-36611 in our issue tracking system. We will investigate further and keep you posted on the status of the fix. Please be patient and allow us some time. We are sorry for the inconvenience.


Best Regards,

The issues you have found earlier (filed as PDFJAVA-36611) have been fixed in Aspose.PDF for Java 18.4. This message was posted using BugNotificationTool from Downloads module by asad.ali
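
A side note for readers adapting the snippets in this thread: the 17.2.0 callback copies ContentStream with IOUtils.toByteArray and then calls read on the same stream again. The sketch below uses only java.io to show that pattern in isolation. It assumes ContentStream behaves like a plain java.io.InputStream, which the IOUtils.toByteArray call suggests but the thread does not confirm. StreamDrainSketch and drain are hypothetical names, not part of Aspose.PDF. It is not a fix for the hang tracked as PDFJAVA-36611.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamDrainSketch {

    // Hypothetical helper (not an Aspose API): copies everything that is left
    // in the stream into a byte array, using a fixed-size buffer.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTML markup that a saving callback would receive.
        byte[] page = "<html><body>page 1</body></html>".getBytes(StandardCharsets.UTF_8);

        try (InputStream content = new ByteArrayInputStream(page)) {
            byte[] html = drain(content);

            // The stream is already at end-of-file, so a second read copies
            // nothing and reports -1.
            int second = content.read(html, 0, html.length);
            System.out.println(html.length + " bytes copied, second read returned " + second);
        }
    }
}
```

With a standard InputStream, one full copy is enough and the extra read call can be dropped. Whether the Aspose stream behaves the same way in every version is an assumption to verify against the library documentation.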