Unicode Symbols are Lost after HTML to PDF Conversion using .NET


Unicode characters are not converted correctly (Aspose.Words 21.9.0)

    var inputFile = new FileInfo(@"Documents/helloUnicode.html");
    using var input = inputFile.OpenRead();

    var fileInfo = Aspose.Words.FileFormatUtil.DetectFileFormat(input);
    using var document = new PdfDocument(input, new PdfHtmlLoadOptions
    {
        InputEncoding = fileInfo.Encoding?.BodyName
    });

    using var output = File.OpenWrite("helloUnicode.pdf");

    // heavy check marks (U+2714 U+FE0F) and cross mark (U+274C) not printed in output pdf
    // empty box printed instead

    // Hint: saving as Tiff image with new Aspose.Words.Document().Save(output, SaveFormat.Tiff); works!

Best regards

helloUnicode.zip (71.5 KB)


We could not reproduce this issue with the following simple code example. Please use it to get the desired output. The output PDF is attached to this post for your reference: 21.9.pdf (56.0 KB)

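    // Requires: using System.Text; using Aspose.Words.Shaping.HarfBuzz;
    // HarfBuzzTextShaperFactory ships in the separate Aspose.Words.Shaping.HarfBuzz NuGet package.
    // Load the HTML explicitly as UTF-8.
    Aspose.Words.Loading.HtmlLoadOptions htmlLoadOptions = new Aspose.Words.Loading.HtmlLoadOptions();
    htmlLoadOptions.Encoding = Encoding.UTF8;
    Aspose.Words.Document doc = new Aspose.Words.Document(MyDir + "helloUnicode.html", htmlLoadOptions);
    // Enable HarfBuzz text shaping before the document is rendered.
    doc.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;
    doc.Save(MyDir + "21.9.pdf");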
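
As a quick sanity check before converting, the non-ASCII code points in the input can be listed with plain .NET. This is only a minimal sketch: it assumes the HTML file is UTF-8 encoded and a runtime that provides `System.Text.Rune` (.NET Core 3.0 or later); the helper name `CodePointDump` is hypothetical and not part of Aspose.Words.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class CodePointDump
{
    // Returns the distinct non-ASCII code points of a string,
    // sorted by value and formatted as "U+XXXX".
    public static List<string> NonAsciiCodePoints(string text)
    {
        var seen = new SortedSet<int>();

        // EnumerateRunes yields whole Unicode scalar values,
        // so a surrogate pair is counted as one code point.
        foreach (Rune rune in text.EnumerateRunes())
        {
            if (rune.Value > 0x7F)
                seen.Add(rune.Value);
        }

        var result = new List<string>();
        foreach (int codePoint in seen)
            result.Add("U+" + codePoint.ToString("X4"));
        return result;
    }

    // Reads a file as UTF-8 (an assumption about the input) and lists its code points.
    public static List<string> NonAsciiCodePointsInFile(string path)
    {
        return NonAsciiCodePoints(File.ReadAllText(path, Encoding.UTF8));
    }
}
```

Calling `CodePointDump.NonAsciiCodePointsInFile("Documents/helloUnicode.html")` and printing the result shows which symbols the fonts on the conversion machine have to cover. If U+FFFD appears in the list, the file was probably not valid UTF-8.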

Moreover, please note that Aspose.Words requires TrueType fonts when rendering documents to fixed-page formats (JPEG, PNG, PDF or XPS). You need to install the fonts used in your document on the machine where you convert documents to PDF. Please refer to the following articles:

Using TrueType Fonts
Manipulating and Substitution TrueType Fonts
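
If installing fonts system-wide is not an option, the font lookup can also be pointed at a folder, and font substitution warnings can be logged to see whether a fallback font was used. The following is a sketch under assumptions, not a confirmed fix for this report: the folder path is a placeholder, the class name `FontSubstitutionLogger` is hypothetical, and the folder is assumed to contain a TrueType font that covers the missing symbols.

```csharp
using System;
using System.Text;
using Aspose.Words;
using Aspose.Words.Fonts;
using Aspose.Words.Loading;

// Prints font substitution warnings raised while the document is laid out.
class FontSubstitutionLogger : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        if (info.WarningType == WarningType.FontSubstitution)
            Console.WriteLine("Font substitution: " + info.Description);
    }
}

static class Program
{
    static void Main()
    {
        // Load the HTML explicitly as UTF-8, as in the example above.
        var loadOptions = new HtmlLoadOptions { Encoding = Encoding.UTF8 };
        var doc = new Document("helloUnicode.html", loadOptions);

        // Search the system fonts first, then a custom folder (placeholder path).
        var fontSettings = new FontSettings();
        fontSettings.SetFontsSources(new FontSourceBase[]
        {
            new SystemFontSource(),
            new FolderFontSource("/path/to/fonts", true) // true = scan subfolders
        });
        doc.FontSettings = fontSettings;

        // Warnings are reported during layout, which happens on Save.
        doc.WarningCallback = new FontSubstitutionLogger();

        doc.Save("helloUnicode.pdf");
    }
}
```

Any substitution messages printed during `Save` indicate that a requested font was not found and another one was used in its place, which is a common cause of empty boxes in the output.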