HTML with Foreign Characters Creates PDF with Square Characters

Hello,

I am testing Aspose.HTML for .NET 18.5.0.

Converting the HTML file in the attached ZIP file to PDF renders the non-English characters as squares.

Converting the same HTML file to TIFF does preserve the non-English characters, but some of them are positioned incorrectly and overlap the preceding text.

I have three questions:

  1. How can I get the proper characters in the PDF file?
  2. Is there a way to fix the text layout in the TIFF image?
  3. Can I create a multi-page TIFF image instead of a series of separate TIFF images?

Sample HTML and output: ForeignChars.zip (177.2 KB)

Thanks,

Sheri

@sheri_steeves

Thank you for contacting support.

We have worked with the data you shared and were able to reproduce the issue in our environment. The following tickets have been logged in our issue management system for further investigation and resolution:

HTMLNET-1271: Problem with non-English characters
HTMLNET-1272: Feature request to render multi-frame TIFF image

The ticket IDs have been linked to this thread so that you will be notified as soon as the tickets are resolved.

Regarding the TIFF layout, there is no layout change you can make on your side to avoid the problem, because the Aspose.HTML for .NET API handles the layout itself; the issue will be rectified once it has been investigated and resolved. Furthermore, until rendering to multi-frame TIFF images is supported, you may use an alternative approach: convert the HTML file to a PDF document, and then render that PDF document to a multi-frame TIFF image with the Aspose.PDF for .NET API. Please refer to the documentation articles below for reference.

We are sorry for the inconvenience.

@sheri_steeves

We have investigated ticket HTMLNET-1272 and would like to share that a multi-frame TIFF image can be generated in your environment with the code snippet below.

        // dataDir is assumed to be a string holding the folder that contains fax.html
        // Set up image rendering options with TIFF as the output format
        Aspose.Html.Rendering.Image.ImageRenderingOptions tiffOptions = new Aspose.Html.Rendering.Image.ImageRenderingOptions();
        tiffOptions.Format = Aspose.Html.Rendering.Image.ImageFormat.Tiff;
        // Instantiate an ImageDevice, passing the ImageRenderingOptions and the resultant file path as arguments
        using (Aspose.Html.Rendering.Image.ImageDevice tiffDevice = new Aspose.Html.Rendering.Image.ImageDevice(tiffOptions, dataDir + "Aspose_HTML.tiff"))
        // Create an HtmlRenderer object
        using (Aspose.Html.Rendering.HtmlRenderer renderer = new Aspose.Html.Rendering.HtmlRenderer())
        // Create an HTMLDocument instance from the path of an existing HTML file
        using (Aspose.Html.HTMLDocument htmlDocument = new Aspose.Html.HTMLDocument(dataDir + "fax.html"))
        {
            // Render the HTML document to the TIFF device
            renderer.Render(tiffDevice, htmlDocument);
        }

We hope this will be helpful. Please feel free to contact us if you need any further assistance.

@Farhan.Raza,

Thanks for the follow-up. Since the PDF doesn’t contain the correct characters, a multi-page TIFF created from that PDF would be incorrect as well, but it is good to know that there is a workaround for creating the TIFF.

@sheri_steeves

We would like to clarify that the updated code snippet does not include an intermediate conversion to PDF. The Aspose.HTML for .NET API renders the HTML file directly to a TIFF file; PDF generation only takes place in the workaround approach that uses the Aspose.PDF for .NET API. We hope this clarifies the concept. Please feel free to contact us if you need any further assistance.

The issue you found earlier (filed as HTMLNET-1271) has been fixed in this update.