Free Support Forum - aspose.com

HTML to PDF error

Hi,

During HTML to PDF conversion we have run into strange Aspose behavior:

  1. If we use this code:
    var pdf = new Aspose.Pdf.Document(inputFilePath);
    we get an
    Aspose.Pdf.InvalidPdfFileFormatException: Startxref not found
    exception.

  2. If we use this code (with the file's own encoding):
    var htmlLoadOptions = new Aspose.Pdf.HtmlLoadOptions { InputEncoding = "utf-8" };
    var pdf = new Aspose.Pdf.Document(inputFilePath, htmlLoadOptions);
    we get an
    Aspose.Pdf.FontNotFoundException: Font Mangal was not found
    exception.

  3. If we use this code:
    var htmlLoadOptions = new Aspose.Pdf.HtmlLoadOptions { InputEncoding = "iso-8859-1" };
    var pdf = new Aspose.Pdf.Document(inputFilePath, htmlLoadOptions);
    the constructor succeeds and the conversion runs with the
    pdf.Convert() and pdf.Save() methods, but the resulting PDF file has incorrect encoding.

We think this is an issue in Aspose.Pdf: it cannot create a Document with the correct encoding, or without an encoding specified at all, and works only with an incorrect encoding.

result.pdf (1.2 MB)
Testfile_2_Sverige – Wikipedia.zip (99.7 KB)

@uaprogrammer

Thanks for contacting support.

In the first case, the Document constructor expects a PDF document, but you were passing it an HTML document. Without any load options, the Document constructor treats its input as PDF by default, which explains the "Startxref not found" error.
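Some background on that error: a PDF file starts with a "%PDF-" header and ends with a trailer containing the startxref keyword, and an HTML page has neither, so the PDF parser fails. If the input type is not known in advance, one option is to sniff the file before choosing a constructor. The following is a minimal C# sketch under that assumption; LooksLikePdf and ReadHead are hypothetical helpers, not Aspose APIs, and the 1024-byte search window is a heuristic rather than a strict rule.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Aspose.PDF: a PDF file begins with a
// "%PDF-" header. Readers commonly tolerate a little leading junk, so this
// heuristic searches the first 1024 bytes; an HTML page normally lacks it.
static bool LooksLikePdf(byte[] head)
{
    int length = Math.Min(head.Length, 1024);
    string prefix = Encoding.ASCII.GetString(head, 0, length);
    return prefix.Contains("%PDF-");
}

// Hypothetical helper: reads at most `count` bytes from the start of the file.
static byte[] ReadHead(string path, int count = 1024)
{
    using var stream = File.OpenRead(path);
    var buffer = new byte[count];
    int total = 0;
    while (total < count)
    {
        int read = stream.Read(buffer, total, count - total);
        if (read == 0) break; // reached end of file
        total += read;
    }
    Array.Resize(ref buffer, total);
    return buffer;
}

// Usage sketch (assumes a reference to Aspose.PDF for .NET):
// var doc = LooksLikePdf(ReadHead(inputFilePath))
//     ? new Aspose.Pdf.Document(inputFilePath)
//     : new Aspose.Pdf.Document(inputFilePath, new Aspose.Pdf.HtmlLoadOptions());
```

Sniffing the content rather than trusting the file extension keeps a mislabelled file from reaching the wrong parser; the extension check is simpler if your inputs are reliably named.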

The source HTML document uses specific fonts, namely Mangal and Gautami. To obtain correct conversion results, you need to install these fonts in your environment. After installing those fonts in our environment and using the following code snippet, we were able to generate correct output:

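// dataDir is the folder holding the HTML file; it is assumed to end with a path separator.
// Passing it to HtmlLoadOptions sets the base path for resolving the page's relative resources.
Aspose.Pdf.HtmlLoadOptions objLoadOptions = new Aspose.Pdf.HtmlLoadOptions(dataDir);
// Set all page margins to zero.
objLoadOptions.PageInfo.Margin.Bottom = 0;
objLoadOptions.PageInfo.Margin.Top = 0;
objLoadOptions.PageInfo.Margin.Right = 0;
objLoadOptions.PageInfo.Margin.Left = 0;
// Passing HtmlLoadOptions tells the Document constructor that the input is HTML, not PDF.
Aspose.Pdf.Document doc = new Aspose.Pdf.Document(dataDir + "Testfile_2_Sverige – Wikipedia.html", objLoadOptions);
doc.Save(dataDir + "SamplefromHtml.pdf");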

SamplefromHtml.pdf (1.6 MB)

You do not need to change the encoding; instead, install the required fonts on your machine so that the API can convert the HTML into PDF correctly. Please try again with the latest version, Aspose.PDF for .NET 18.6, after installing these fonts, and if the issue persists, feel free to let us know.
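Because a missing font only surfaces as a FontNotFoundException while the document is loading, it can help to check for the required fonts up front. A minimal C# sketch, assuming you already know which font families the page uses (here Mangal and Gautami, as named above); MissingFonts is a hypothetical helper, and the installed-font list is hard-coded as a stand-in for a real font enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pre-flight check, not part of Aspose.PDF: returns the required
// font families that are not in the installed list (case-insensitive match).
static List<string> MissingFonts(IEnumerable<string> required, IEnumerable<string> installed)
{
    var available = new HashSet<string>(installed, StringComparer.OrdinalIgnoreCase);
    return required.Where(name => !available.Contains(name))
                   .Distinct(StringComparer.OrdinalIgnoreCase)
                   .ToList();
}

// The two fonts the sample HTML page relies on.
string[] requiredFonts = { "Mangal", "Gautami" };

// Assumption: on Windows the installed family names could be read from
// new System.Drawing.Text.InstalledFontCollection().Families (System.Drawing.Common);
// this fixed list stands in for that here.
string[] installedFonts = { "Arial", "Times New Roman", "mangal" };

var missing = MissingFonts(requiredFonts, installedFonts);
if (missing.Count > 0)
    Console.WriteLine("Install before converting: " + string.Join(", ", missing));
// prints "Install before converting: Gautami"
```

With the stand-in list, only Gautami is reported, because Mangal matches "mangal" case-insensitively. Failing fast here gives a clearer message than an exception from the middle of the conversion.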

OK, I will retry with the fonts installed, but I still have a question: why does Aspose not require these fonts when we pass the ‘iso-8859-1’ encoding? Shouldn't it throw the same exception as with ‘utf-8’, since the fonts are still missing? Is this an Aspose bug?

@uaprogrammer

Thanks for getting back to us.

We have logged an investigation ticket as PDFNET-44975 in our issue tracking system to investigate this behavior of the API. We will determine whether handling such scenarios is feasible. As soon as there are updates on the investigation, we will let you know. Please be patient and give us a little time.

We are sorry for the inconvenience.

Hi,

Could you kindly provide us with the status of the PDFNET-44975 issue?

Best regards,

Oleh

@uaprogrammer

We are afraid that the earlier logged issue has not yet been resolved, due to other high-priority issues in the queue. We will inform you as soon as there are definite updates regarding its resolution. Please give us a little time.

We are sorry for the inconvenience.