Hi
I am using Aspose.PDF for Java 17.5 to convert a PDF file to HTML.
On several pages of the result (generated under Linux), some of the text is shifted to the right, so that it is covered by other text.
Please check this issue; see the attachments.
P.S. Here is my code. You can wrap it in a runnable package and run it under Linux:
String filePath = "870__1.pdf";
Document pdf = new Document(filePath);
HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
htmlSaveOps.setSplitIntoPages(false);
String dirName = UUID.randomUUID().toString();
File file = new File("" + dirName);
file.mkdirs();
// Convert each page separately so that every page ends up in its own HTML file.
for (int p = 1; p <= pdf.getPages().size(); p++) {
    Document pageDoc = new Document();
    pageDoc.getPages().add(pdf.getPages().get_Item(p));
    final ByteArrayOutputStream stream = new ByteArrayOutputStream();
    // Capture the generated markup in memory instead of letting Aspose write it to disk.
    htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
        @Override
        public void invoke(com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
            try {
                byte[] resultHtmlAsBytes = new byte[(int) htmlSavingInfo.ContentStream.available()];
                htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
                stream.write(resultHtmlAsBytes);
                stream.close();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    };
    // The file name is only a placeholder; the custom strategy above receives the markup.
    String outHtmlFile = "SomeUnexistingFile.html";
    pageDoc.save(outHtmlFile, htmlSaveOps);
    // Close the output file after writing so that file handles do not leak across iterations.
    try (FileOutputStream out = new FileOutputStream("" + dirName + "/" + p + ".html")) {
        IOUtils.write(stream.toByteArray(), out);
    }
}
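A side note on the callback above: it sizes the buffer from available() and does a single read(), and neither call is guaranteed to cover the whole stream in general. I do not think this is related to the shifted text, but to rule out truncated output, here is a minimal sketch of a helper that reads a stream to the end. The class name StreamCopy and the method readFully are my own hypothetical names, and it assumes ContentStream behaves like a standard java.io.InputStream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of Aspose.PDF): drains an InputStream completely.
final class StreamCopy {
    private StreamCopy() {}

    // Reads until end of stream and returns everything that was read.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() may return fewer bytes than requested, so loop until it reports -1.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

Inside invoke(), the three read/write lines could then become stream.write(StreamCopy.readFully(htmlSavingInfo.ContentStream)); under the same assumption about ContentStream.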
P.P.S. Environment:
Java 1.8, CentOS 7
870__1.pdf (2.0 MB)
parts_of_result.zip (2.9 MB)