Free Support Forum - aspose.com

Characters stick together in HTML files converted from PDF

Hi Aspose team


We converted PDF files to HTML for cross-platform reading with Aspose.Pdf for Java 11.7.0.
In the resulting HTML, some characters stick together, which does not match the original PDF and makes the text hard to read.


Here is the code we used for testing:
Document pdf = new Document("custom/input/pdf/p7_1.pdf");

// Convert each page to its own self-contained HTML file.
for (int p = 1; p <= pdf.getPages().size(); p++) {
    Document pageDoc = new Document();
    pageDoc.getPages().add(pdf.getPages().get_Item(p));

    HtmlSaveOptions htmlSaveOps = new HtmlSaveOptions();
    htmlSaveOps.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    htmlSaveOps.FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsWOFF;
    htmlSaveOps.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    htmlSaveOps.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    htmlSaveOps.setSplitIntoPages(false);

    // Capture the generated markup in memory instead of on disk.
    final ByteArrayOutputStream stream = new ByteArrayOutputStream();
    htmlSaveOps.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy() {
        @Override
        public void invoke(com.aspose.pdf.HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo) {
            byte[] resultHtmlAsBytes = new byte[(int) htmlSavingInfo.ContentStream.getLength()];
            htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
            try {
                stream.write(resultHtmlAsBytes);
                stream.close();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    };

    // The markup goes to the custom strategy above; this file name is a placeholder.
    String outHtmlFile = "SomeUnexistingFile.html";
    pageDoc.save(outHtmlFile, htmlSaveOps);
    IOUtils.write(stream.toByteArray(), new FileOutputStream("custom/output/pdf/p7_1." + p + ".html"));
}

Is there any option to fix this?
BTW, the Chinese text in this PDF is arranged vertically. I hope this information helps.
I've uploaded attachments containing the original PDF file and the resulting HTML file. Please check them. Thank you.

Best,
Craig


Hi Craig,

Thanks for your inquiry. I have tested the PDF to HTML conversion with your shared document using Aspose.Pdf for Java 11.7.0 and was able to reproduce the reported issue. For further investigation, I have logged it in our issue tracking system as PDFJAVA-36056 and linked your request to it. We will keep you updated on the issue status via this thread.

We are sorry for the inconvenience caused.

Best Regards,

Hi

Is there any progress on this issue?

@craig.w.su

Thanks for your inquiry.

I have checked the status of the issue, and I am afraid it is not yet resolved. However, the relevant team has started investigating its root cause. As soon as they make further progress towards a resolution, we will update you. Your patience in this regard is greatly appreciated. Please allow us a little more time.

We are sorry for the inconvenience.