Convert PDF to HTML with broken font

Hi

There is a problem when I use Aspose to convert PDF files to HTML.

The picture shows the result.
It seems that the font is broken.

Both the .NET and Java versions have this problem.

Here is my code in C#:

HtmlSaveOptions option = new HtmlSaveOptions();
Document tempdoc = new Document("1.pdf");
tempdoc.Save("1.html", option);

Please help me fix this. Thanks.

Hi,


Thanks for using our APIs.

I have tested the scenario and can reproduce the problem. I have logged it as PDFNEWNET-37979 in our issue tracking system. We will look into the details and keep you updated on the status of the fix. Please be patient and allow us a little time. We are sorry for the inconvenience.

Hi

I also found this problem in Java version 11.1.0.
Is there an ETA for this fix?

Hi Craig,


Thanks for your patience.

I am afraid the earlier reported issue is still not resolved, as the team has been busy fixing other previously reported issues. However, I have asked the product team to share a possible ETA. As soon as we have further updates, we will let you know.

Your patience and understanding are greatly appreciated.

Hi there


About these broken Chinese font characters: this might have something to do with FreeType’s bytecode interpreter.

Before FreeType 2.4, the bytecode interpreter was not enabled by default, and rendering some Chinese fonts from a PDF file without it could produce broken characters.

FreeType and Patents

I don’t know much about Linux or the JVM.
I hope this information helps to solve the problem. :slight_smile:

Hi Craig,


Thanks for your feedback. It seems your issue has been fixed as a side effect of another fix. I tested the conversion using the following code snippet and could not reproduce the reported issue. Please download and try the latest version of Aspose.Pdf for .NET and share the results; hopefully it will resolve the issue.

// Load source PDF file (verbatim string, so the backslashes are not parsed as escapes)

Aspose.Pdf.Document doc = new Document(@"D:\Downloads\1.pdf");

// Instantiate HTML Save options object

HtmlSaveOptions newOptions = new HtmlSaveOptions();

//newOptions.PreventGlyphsGrouping = false;

// Enable option to embed all resources inside the HTML

newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

// This is just optimization for IE and can be omitted

newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;

newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;

newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;

// Output file path

string outHtmlFile = @"D:\Downloads\1.html";

doc.Save(outHtmlFile, newOptions);


Best Regards,

Hi Tilal.ahmad

Thanks for your help.

But this problem still happens with Java version 11.7.0.
Please check this with Aspose PDF Java again, thanks :slight_smile:

Hi Craig,


Thanks for your inquiry. I have tested the scenario with Aspose.Pdf for Java 11.7.0 and could not reproduce the reported issue. Please share some more details so we can look into it and guide you accordingly.

// Load source PDF file

Document doc = new Document("D:\\Downloads\\1 (6).pdf");

// Instantiate HTML Save options object

HtmlSaveOptions newOptions = new HtmlSaveOptions();

// Enable option to embed all resources inside the HTML

newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

// This is just optimization for IE and can be omitted

newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;

newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;

newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;

// Output file path

String outHtmlFile = myDir+"Single_output.html";

// Save the output file

doc.save(outHtmlFile, newOptions);


Please feel free to contact us for any further assistance.

Best Regards,

Hi


I checked your result output file again.
The same problem still exists.

With 11.7.0, the result fonts are still broken.
Please check the comparison jpgs, thanks :slight_smile:

Hi Craig,


Thanks for your feedback. I have noticed that the resultant HTML works fine in Chrome, but the font is broken in the IE and Firefox browsers. I have passed this information on to the product team; they will investigate the issue and fix it accordingly. We will keep you updated on the progress of the resolution.

We are sorry for the inconvenience.

Best Regards,
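
For anyone comparing the output across browsers as described above, it may help to check which font formats the generated HTML actually declares. The following is only a rough diagnostic sketch in plain Java, not part of the Aspose API: the class name FontFaceFormats and the method declaredFormats are made up for illustration, and it assumes the generated CSS uses @font-face rules with format(...) hints. If the output does not use such hints, the result will simply be empty.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Aspose class): lists the font formats
// declared in @font-face rules of an HTML or CSS string.
public class FontFaceFormats {

    // Matches one @font-face { ... } rule; assumes no nested braces inside it.
    private static final Pattern FONT_FACE =
            Pattern.compile("@font-face\\s*\\{([^}]*)\\}", Pattern.CASE_INSENSITIVE);

    // Matches format("woff"), format('truetype'), format(embedded-opentype), etc.
    private static final Pattern FORMAT =
            Pattern.compile("format\\(\\s*['\"]?([A-Za-z0-9-]+)['\"]?\\s*\\)",
                    Pattern.CASE_INSENSITIVE);

    /** Returns the distinct format hints found, lower-cased, in order of appearance. */
    public static Set<String> declaredFormats(String html) {
        Set<String> formats = new LinkedHashSet<>();
        Matcher rule = FONT_FACE.matcher(html);
        while (rule.find()) {
            Matcher fmt = FORMAT.matcher(rule.group(1));
            while (fmt.find()) {
                formats.add(fmt.group(1).toLowerCase(Locale.ROOT));
            }
        }
        return formats;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java FontFaceFormats <path-to-html>");
            return;
        }
        // Read the converted HTML file and print the declared font formats.
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(declaredFormats(html));
    }
}
```

Running it against the converted HTML file prints the set of declared formats, which can then be compared against what each browser supports. It only inspects the declarations; it does not validate the embedded font data itself.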

The issues you have found earlier (filed as PDFNET-37979) have been fixed in Aspose.PDF for .NET 20.2.