Japanese characters rendered as boxes in the PDF version of HTML content

Aspose.Words and Aspose.PDF version 21.0.0 with .NET 6 fail to render Japanese characters. Exporting HTML content to PDF with the code below works locally, but when the application is deployed using Docker the Japanese characters are displayed as boxes. The HTML content stored in the variable wordHtmlTemplate is built with a StringBuilder by appending lines. Here is the code:

using Aspose.Words;
using System.IO;

var license = new License();
license.SetLicense(Path.Combine(licenceFilePath, "AsposeLicence", "Aspose.Total.NET.lic"));
Document doc = new Document();
DocumentBuilder documentBuilder = new DocumentBuilder(doc);
documentBuilder.InsertHtml(wordHtmlTemplate, true);
MemoryStream outStream = new MemoryStream();
doc.Save(outStream, SaveFormat.Pdf);
byte[] docBytes = outStream.ToArray();
return docBytes;

@kd2023 Could you please attach your input and output documents here for testing? We will check the issue and provide you more information.

@alexey.noskov I have attached a DOCX file containing the HTML content, which includes some Japanese characters. After PDF conversion in the Docker deployment those characters are displayed as boxes. Locally, on a Windows machine, they are displayed correctly.

htmlTemplateToExport.docx (14.0 KB)

@kd2023 Thank you for the additional information. The problem occurs because the required fonts are not available in your Linux environment. I can reproduce the problem in a clean Linux Docker container using the following simple code:

Document doc = new Document();
DocumentBuilder documentBuilder = new DocumentBuilder(doc);
documentBuilder.InsertHtml(File.ReadAllText("/temp/in.html"), true);
doc.Save(@"/temp/out_without_fonts.pdf");

out_without_fonts.pdf (28.6 KB)

However, if you put a font with East Asian glyphs into a folder and use this folder as a font source, the output is correct:

Document doc = new Document();

doc.FontSettings = new FontSettings();
doc.FontSettings.SetFontsSources(new FontSourceBase[] { new SystemFontSource(), new FolderFontSource(@"/temp/fonts/", true) });

DocumentBuilder documentBuilder = new DocumentBuilder(doc);
documentBuilder.InsertHtml(File.ReadAllText("/temp/in.html"), true);

doc.Save(@"/temp/out.pdf");

out.pdf (34.6 KB)

For this test I put the MS Mincho font into the /temp/fonts/ folder.

So to get the desired output, you should have the required fonts in the environment where the document is converted to PDF. Please see our documentation to learn where Aspose.Words looks for fonts:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/

@alexey.noskov, I am running the application in Docker and installed the required fonts using the msttcorefonts-installer package for the Alpine Linux distribution. However, I cannot see the “Mincho” font installed there, and I still have the issue when printing Japanese characters. Is there a font package available for Alpine that covers most languages, so that I do not have to add specific fonts to the application folder?

@kd2023 You are right, the msttcorefonts-installer package does not include all MS fonts; it includes only the basic ones. Unfortunately, there is no single package that contains all MS fonts.
You can try installing the free Noto fonts and using the Noto fallback settings, which you can load with FontFallbackSettings.LoadNotoFallbackSettings. Alternatively, you can customize the predefined fallback settings according to the fonts available in your environment.
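
A minimal sketch of the Noto fallback approach is shown here. The folder /usr/share/fonts/noto, the sample HTML string, and the output path are assumptions for illustration; adjust them to wherever the Noto fonts are actually installed in your image.

```csharp
using Aspose.Words;
using Aspose.Words.Fonts;

// Assumption: the Noto fonts (including a CJK family) are installed here.
const string notoFolder = "/usr/share/fonts/noto";

FontSettings fontSettings = new FontSettings();

// Search the system fonts and the Noto folder (true = scan subfolders).
fontSettings.SetFontsSources(new FontSourceBase[]
{
    new SystemFontSource(),
    new FolderFontSource(notoFolder, true)
});

// Load the predefined fallback table that maps Unicode ranges to Noto fonts.
fontSettings.FallbackSettings.LoadNotoFallbackSettings();

Document doc = new Document();
doc.FontSettings = fontSettings;

DocumentBuilder builder = new DocumentBuilder(doc);
builder.InsertHtml("<p>日本語のテキスト</p>", true); // sample content only

doc.Save("/temp/out_noto.pdf");
```

The fallback table only helps if the Noto fonts it refers to can actually be found in one of the configured font sources.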

@alexey.noskov After setting a font source path and the fallback settings, does Aspose still look into the default folders first? For example, on Linux one of the default locations where Aspose looks for fonts is “usr/share/fonts/”.

@kd2023 You can specify several font sources and set a priority for each of them.
https://reference.aspose.com/words/net/aspose.words.fonts/fontsettings/setfontssources/
Aspose.Words will look for fonts according to the specified font source priority. If you need the system fonts to be searched first, you should use SystemFontSource as the first font source.
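
The sketch below shows one way to make the priority explicit instead of relying on the default value of 0. The folder path and the priority numbers are illustrative assumptions. As far as the API reference describes it, the Priority value is consulted when the same font family and style exists in more than one source, and the source with the higher value wins.

```csharp
using System;
using Aspose.Words.Fonts;

FontSettings fontSettings = new FontSettings();

fontSettings.SetFontsSources(new FontSourceBase[]
{
    // System fonts get the higher priority value in this example.
    new SystemFontSource(1),
    // Assumed custom folder; scanned recursively, lower priority.
    new FolderFontSource("/temp/fonts/", true, 0)
});

// List the configured sources to verify what Aspose.Words will use.
foreach (FontSourceBase source in fontSettings.GetFontsSources())
{
    Console.WriteLine($"{source.Type}: priority {source.Priority}");
}
```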

@alexey.noskov I added the code snippet below to test whether the Noto fonts are picked up from the directory:

[HttpGet]
[Route("generatepdfv1")]
public async Task<ActionResult> GeneratePdfV1(string content)
{
    if (Directory.Exists(Path.Combine(_hostEnvironment.ContentRootPath, "Fonts", "Noto")))
    {
        content = string.Concat(content, "Fonts available are: ", string.Join(",", Directory.GetFiles(Path.Combine(_hostEnvironment.ContentRootPath, "Fonts", "Noto"))));
    }
    else
    {
        content = string.Concat(content, "Directory does not exists");
    }

    content = string.Concat(content, "Is Fonts/Noto directory exists", Directory.Exists(Path.Combine(_hostEnvironment.ContentRootPath, "Fonts", "Noto")));
    // Generate the PDF as a MemoryStream
    using (MemoryStream stream = new MemoryStream())
    {
        // Create a new Document object
        Document doc = new Document();

        // Convert the string content to a byte array
        byte[] contentBytes = System.Text.Encoding.UTF8.GetBytes(content);

        // Load the byte array into the document
        using (MemoryStream contentStream = new MemoryStream(contentBytes))
        {
            doc.RemoveAllChildren();
            doc.AppendDocument(GetDocumentInstance(contentStream), ImportFormatMode.KeepSourceFormatting);
        }

        // Save the document as PDF
        doc.Save(stream, SaveFormat.Pdf);

        // Return the PDF as a file attachment
        return await Task.Run(() => File(stream.ToArray(), HTTP_CONTEXT_RESPONSE_CONTENTTYPE_PDF, "Pdf1.pdf"));
    }
}

private Document GetDocumentInstance(Stream stream)
{
    Document document = new Document(stream);
    FontSettings fontSettings = new FontSettings();

    // Set the order of font sources
    fontSettings.SetFontsSources(new FontSourceBase[] { new FolderFontSource(Path.Combine(_hostEnvironment.ContentRootPath, "Fonts", "Noto"), false) });

    // Load Noto fallback settings
    fontSettings.FallbackSettings.LoadNotoFallbackSettings();

    // Disable default font substitutions
    fontSettings.SubstitutionSettings.DefaultFontSubstitution.Enabled = false;

    document.FontSettings = fontSettings;

    return document;
}

The code snippet above tries to export a Japanese string to a PDF file. Here is the output after passing a Japanese string as input:

image.png (53.3 KB)
The Japanese characters are still printed as boxes.

@kd2023 Unfortunately, I still cannot reproduce the problem on my side. Here is the PDF document produced on my side with Noto Japanese fonts:
out.pdf (9.8 KB)

@alexey.noskov The code snippet below seems to work for a simple Japanese input string, but not for a document that is built by appending multiple documents, each with multiple sections. How can it be applied to the whole document content?

// Specify the font folder
FontSettings fontSettings = new FontSettings();
fontSettings.SubstitutionSettings.DefaultFontSubstitution.Enabled = true;
fontSettings.SetFontsFolder(Path.Combine("Fonts"), true);
fontSettings.FallbackSettings.LoadNotoFallbackSettings();
fontSettings.SubstitutionSettings.FontInfoSubstitution.Enabled = false;
document.FontSettings = fontSettings;

@kd2023 Please make sure FontSettings are specified for the final document, that is, the one that is saved as PDF. Could you please save the output document as both DOCX and PDF and attach them here for testing? If you convert the generated DOCX document to PDF, does the final PDF look fine?
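
A minimal sketch of that advice, assuming the parts are built from HTML strings and the fonts live in a relative "Fonts" folder as in the snippet above. The sample HTML strings and output file names are hypothetical. The key point is that FontSettings is assigned to the document on which Save is called, not to the intermediate documents that are appended to it.

```csharp
using Aspose.Words;
using Aspose.Words.Fonts;

FontSettings fontSettings = new FontSettings();
fontSettings.SetFontsFolder("Fonts", true); // assumed folder, scanned recursively
fontSettings.FallbackSettings.LoadNotoFallbackSettings();

// Build the final document by appending several parts.
Document finalDoc = new Document();
finalDoc.RemoveAllChildren();

foreach (string html in new[] { "<p>一つ目の文書</p>", "<p>二つ目の文書</p>" })
{
    Document part = new Document();
    new DocumentBuilder(part).InsertHtml(html, true);
    finalDoc.AppendDocument(part, ImportFormatMode.KeepSourceFormatting);
}

// Font settings must be set on the document that is actually rendered.
finalDoc.FontSettings = fontSettings;

// Saving both formats makes it easier to tell a content problem from a rendering problem.
finalDoc.Save("out.docx");
finalDoc.Save("out.pdf");
```

If the same settings should apply to every document in the process, assigning them once through FontSettings.DefaultInstance may be a simpler alternative.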