Chinese characters display as boxes when converting HTML to PDF with .NET

Hello,

I’m trying to convert HTML to PDF with some Simplified Chinese characters in it. There are other languages too, but let’s focus on Chinese first, since I guess the solution will be the same for all of them. In the PDF, the Chinese characters come out as tofu (empty boxes).

When I open the HTML in a browser, it displays correctly. It references a font that supports Chinese, and everything looks good.

Here’s my PDF result: mypdf.pdf (7.1 MB)

Here’s my code:
```csharp
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

var htmlOptions = new HtmlLoadOptions
{
    InputEncoding = "UTF-8",
    IsEmbedFonts = true
};
var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html));

// Register the folder that contains the Chinese font files
var fontSource = new FolderFontSource(_fontsPath);
FontRepository.Sources.Add(fontSource);

htmlOptions.PageInfo = new PageInfo
{
    Width = PageSize.PageLetter.Width,
    Height = PageSize.PageLetter.Height,
    Margin = new MarginInfo(0, 0, 0, 0),
    IsLandscape = false
};

var document = new Document(htmlStream, htmlOptions);
var pdf = new PdfFile(document);
var output = new MemoryStream();
document.Save(output, SaveFormat.Pdf);
```

Here’s the html code: Screenshot 2021-10-19 163455.png (33.8 KB)

Here’s how it looks on the browser: Screenshot 2021-10-19 163538.png (50.7 KB)

Version used: 21.9.0

I have tried both CDN references and local font files (.ttf and .otf) in my local environment.

Thanks in advance for the support.

@rmed1na

Could you please share the source HTML file so that we can try to reproduce the issue on our end?

Sure, here it is:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>

<style>
    @import url('https://fonts.googleapis.com/css2?family=Noto+Sans+SC&display=swap');

    .text {
        font-family: 'Noto Sans SC', sans-serif;
    }
</style>
</head>
<body>
<p class='text'>Hello world. This is a test from English language. Below is some text in simplified chinese</p>
<br />
<p class='text'>CHINESE (SIMPLIFIED): 这是简体中文文本</p>
<p class='text'>CHINESE (TRADITIONAL): 這是繁體中文文本</p>
</body>
</html>
```
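
A local-font variant, like the one mentioned earlier, would replace the Google Fonts `@import` with an `@font-face` rule that points at a font file on disk. The sketch below assumes a hypothetical path and file name (`C:/fonts/NotoSansSC-Regular.otf`); adjust both to wherever the .otf or .ttf file actually lives. Because the HTML is loaded from a `MemoryStream`, a relative URL may have nothing to resolve against, so the sketch uses an absolute `file:///` URL. This is an assumption and has not been verified against Aspose.PDF.

```html
<style>
    /* Hypothetical path: replace with the real location of the font file */
    @font-face {
        font-family: 'Noto Sans SC';
        src: url('file:///C:/fonts/NotoSansSC-Regular.otf') format('opentype');
    }

    .text {
        font-family: 'Noto Sans SC', sans-serif;
    }
</style>
```

For a .ttf file, the matching format hint would be `format('truetype')`.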

@rmed1na

A ticket with ID PDFNET-50796 has been created in our issue tracking system to investigate the issue further. This thread has been linked to the issue so that you will be notified once it is fixed.

I ran into the same problem using Aspose.PDF for Java 21.3.

@hanbd.me

Please upgrade to the latest version and share your sample code, input file, and generated output file for our investigation.