Convert HTML to PDF using Aspose.PDF for .NET - Issues related to fonts

Hi,

During HTML to PDF conversion we’ve run into some strange Aspose behavior:

  1. If we use this code
    var pdf = new Aspose.Pdf.Document(inputFilePath);
    we get an
    Aspose.Pdf.InvalidPdfFileFormatException: Startxref not found
    exception.

  2. If we use this code (with the encoding taken from the file)
    var htmlLoadOptions = new Aspose.Pdf.HtmlLoadOptions { InputEncoding = "utf-8" };
    var pdf = new Aspose.Pdf.Document(inputFilePath, htmlLoadOptions);
    we get an
    Aspose.Pdf.FontNotFoundException: Font Mangal was not found
    exception.

  3. If we use this code
    var htmlLoadOptions = new Aspose.Pdf.HtmlLoadOptions { InputEncoding = "iso-8859-1" };
    var pdf = new Aspose.Pdf.Document(inputFilePath, htmlLoadOptions);
    the constructor succeeds and the conversion runs using the
    pdf.Convert() and pdf.Save() methods, but the resulting PDF file has incorrect encoding.

We think this is an issue in Aspose.Pdf: it cannot create a Document with the correct encoding, or without specifying an encoding, and works only with an incorrect encoding.

result.pdf (1.2 MB)
Testfile_2_Sverige – Wikipedia.zip (99.7 KB)

@uaprogrammer

Thanks for contacting support.

In this case, the Document constructor expects a PDF document, and you were passing an HTML document to it. Without any load options, the Document constructor treats the input as PDF by default, which is why the first call fails with InvalidPdfFileFormatException.
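To avoid passing a non-PDF file to the one-argument constructor by accident, a caller could check the file signature first. The following is only a minimal sketch: `InputSniffer` and `LooksLikePdf` are hypothetical names, not part of Aspose.PDF, and it assumes a PDF file begins with the bytes `%PDF-` at offset 0. The helper uses only the .NET base class library; the Aspose calls appear as comments because they require the Aspose.PDF package.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not an Aspose API): guesses whether a file is a PDF
// by comparing its first bytes with the "%PDF-" signature.
static class InputSniffer
{
    public static bool LooksLikePdf(string path)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        byte[] head = new byte[signature.Length];

        using (FileStream stream = File.OpenRead(path))
        {
            // A file shorter than the signature cannot be a PDF.
            int read = stream.Read(head, 0, head.Length);
            if (read < signature.Length)
            {
                return false;
            }
        }

        for (int i = 0; i < signature.Length; i++)
        {
            if (head[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }
}

// Possible usage with Aspose.PDF (assumes the Aspose.PDF package is referenced):
//   var doc = InputSniffer.LooksLikePdf(path)
//       ? new Aspose.Pdf.Document(path)
//       : new Aspose.Pdf.Document(path, new Aspose.Pdf.HtmlLoadOptions());
```

If the input type is already known, for example from the file extension, simply passing HtmlLoadOptions for HTML files is enough and this check is unnecessary.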

The source HTML document uses specific fonts, i.e. Mangal and Gautami. To obtain correct conversion results, you need to install these fonts in your environment. After installing them in our environment and using the following code snippet, we were able to generate correct output:

// dataDir is the folder containing the HTML file; it is also passed as the base path of the load options
Aspose.Pdf.HtmlLoadOptions objLoadOptions = new Aspose.Pdf.HtmlLoadOptions(dataDir);
// Remove all page margins
objLoadOptions.PageInfo.Margin.Bottom = 0;
objLoadOptions.PageInfo.Margin.Top = 0;
objLoadOptions.PageInfo.Margin.Right = 0;
objLoadOptions.PageInfo.Margin.Left = 0;
// Load the HTML with the load options and save the result as PDF
Aspose.Pdf.Document doc = new Aspose.Pdf.Document(dataDir + "Testfile_2_Sverige – Wikipedia.html", objLoadOptions);
doc.Save(dataDir + "SamplefromHtml.pdf");

SamplefromHtml.pdf (1.6 MB)

You do not need to change the encoding. Instead, install the required fonts on your machine/device so that the API can convert the HTML into PDF correctly. Please try again with the latest version, Aspose.PDF for .NET 18.6, after installing the fonts, and if the issue still persists, feel free to let us know.
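Before running the conversion, it may help to check whether the required fonts are visible on the machine. The sketch below is an assumption-laden pre-flight check, not an Aspose API: `FontPreflight`, `MissingFonts` and `InstalledFamilies` are hypothetical names, it relies on `System.Drawing` (`InstalledFontCollection`), which in practice means Windows, and it assumes that the fonts GDI+ reports are the same ones Aspose.PDF will find.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing.Text;
using System.Linq;

// Hypothetical pre-flight check (not part of Aspose.PDF).
static class FontPreflight
{
    // Returns the required font names that are absent from the installed list,
    // comparing names case-insensitively.
    public static List<string> MissingFonts(IEnumerable<string> required,
                                            IEnumerable<string> installed)
    {
        var have = new HashSet<string>(installed, StringComparer.OrdinalIgnoreCase);
        return required.Where(name => !have.Contains(name)).ToList();
    }

    // Lists the font family names reported by GDI+ (System.Drawing).
    public static List<string> InstalledFamilies()
    {
        using (var fonts = new InstalledFontCollection())
        {
            return fonts.Families.Select(family => family.Name).ToList();
        }
    }
}

// Possible usage:
//   var missing = FontPreflight.MissingFonts(
//       new[] { "Mangal", "Gautami" }, FontPreflight.InstalledFamilies());
//   if (missing.Count > 0)
//       Console.WriteLine("Install these fonts first: " + string.Join(", ", missing));
```

`MissingFonts` takes the installed list as a parameter so that the comparison can be exercised without touching the system font collection.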

OK, I will retry with the fonts installed, but I still have a question: why does Aspose not require these fonts if we pass the ‘iso-8859-1’ encoding? In that case it should throw the same exception as with ‘utf-8’, because the fonts are still missing. Is it an Aspose bug?
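One possible explanation, which is not confirmed anywhere in this thread, is that the fonts are only needed for characters that exist after decoding. Mangal is a Devanagari font, and when UTF-8 bytes are decoded as iso-8859-1, every byte becomes a character in the range U+0000 to U+00FF, so no Devanagari characters remain and no Devanagari font would be requested. The sketch below only demonstrates this decoding effect with the .NET `Encoding` class; `EncodingDemo` and its methods are hypothetical names, the sample word is illustrative, and nothing here shows how Aspose.PDF selects fonts internally.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical demo (not Aspose code): shows what happens to Devanagari text
// when its UTF-8 bytes are decoded with a different encoding.
static class EncodingDemo
{
    // Encodes the text as UTF-8, then decodes those bytes with the named encoding.
    public static string DecodeUtf8BytesAs(string text, string encodingName)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    // True if the string has at least one character in the Devanagari block.
    public static bool ContainsDevanagari(string s)
    {
        return s.Any(c => c >= '\u0900' && c <= '\u097F');
    }
}

// Possible usage:
//   string sample = "\u0939\u093F\u0928\u094D\u0926\u0940"; // the word "Hindi" in Devanagari
//   EncodingDemo.ContainsDevanagari(EncodingDemo.DecodeUtf8BytesAs(sample, "utf-8"));      // true
//   EncodingDemo.ContainsDevanagari(EncodingDemo.DecodeUtf8BytesAs(sample, "iso-8859-1")); // false
```

This would also be consistent with the garbled text in the iso-8859-1 output, since the wrongly decoded characters are written to the PDF as-is.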

@uaprogrammer

Thanks for getting back to us.

We have logged an investigation ticket as PDFNET-44975 in our issue tracking system in order to investigate this behavior of the API. We will further investigate whether handling such scenarios is feasible, and we will let you know of any updates regarding the investigation. Please be patient and spare us a little time.

We are sorry for the inconvenience.

Hi,

Would you be so kind as to provide us with the status of the PDFNET-44975 issue?

Best regards,

Oleh

@uaprogrammer

We are afraid that the earlier logged issue could not be resolved due to other high-priority issues in the queue. We will surely inform you as soon as there are definite updates regarding its resolution. Please spare us a little time.

We are sorry for the inconvenience.

Hi,

We are wondering if there are any updates on this issue.

BR
Oleh

@uaprogrammer

We are afraid that the earlier logged issue could not be resolved due to its low priority, as it was logged under normal support. However, we will surely inform you as soon as it is resolved. Please be patient and spare us some time.

We are sorry for the inconvenience.

Hello,

We are wondering if there are any updates regarding the issue in Aspose.

Thank you in advance.

Best regards,
Oleh

@uaprogrammer

Thanks for contacting support.

Sadly, the earlier logged ticket has not yet been resolved due to other pending issues in the queue that were logged prior to yours. However, we will certainly inform you as soon as we have some definite updates regarding the ticket resolution. Please spare us some time.

We are sorry for the inconvenience.

Hi,

We are wondering if there are any updates on this issue.

BR
Oleh

@uaprogrammer

We are afraid that the issue PDFNET-44975 is not yet resolved. We will surely let you know once we have some news about its resolution ETA or fix.

We are sorry for the inconvenience.