HTML to PDF: Japanese characters rendered as boxes

I have a Docker image and an HTML document. When I convert the HTML to PDF, Japanese, Chinese and Korean characters are rendered as boxes (the output file is attached). Can you suggest why these characters are not appearing?

Function:

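using System;
using System.IO;
using System.Text;
using Aspose.Pdf;        // Document, HtmlLoadOptions
using Newtonsoft.Json;   // JsonConvert

void generatePDF2()
{
    // In a verbatim string (@"..."), embedded double quotes must be doubled ("");
    // with single quotes, as originally posted, the snippet does not compile.
    string text = @"<!DOCTYPE html> <html lang=""en"" xmlns=""http://www.w3.org/1999/xhtml""> <head> <meta charset=""utf-8"" /> <title></title> </head> <body> <div style=""border-spacing: 0; border-width: 0; padding: 0;""> <h3>英語⽇本語</h3> <span>英語⽇本語</span> </div><br/> <div> <table style=""border-spacing: 0; border-width: 0; padding: 0;"" aria-describedby=""ack-template-desc""> <th id=""acknowledgement-fragment"" /> <tr> <td>Agreed By:</td> </tr> <tr> <td>Name:</td> <td>transition1 transition2</td> </tr> <tr> <td>Date:</td> <td>01/30/2025</td> </tr> </table> </div> </body> </html>";

    using var memoryStream = new MemoryStream();
    // Encode as UTF-8 to match the <meta charset="utf-8"> declaration.
    using var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(text));

    var options = new HtmlLoadOptions();

    // Load the HTML into an Aspose.PDF document.
    var document = new Document(htmlStream, options);

    // Print the fonts used by the document, to see whether a CJK font was picked up.
    Console.WriteLine(JsonConvert.SerializeObject(document.FontUtilities.GetAllFonts()));

    // Save to the in-memory stream, then write the bytes to disk.
    document.Save(memoryStream);

    File.WriteAllBytes("output1.pdf", memoryStream.ToArray());
}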

The same HTML, formatted for readability:

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <meta charset="utf-8" />
      <title></title>
   </head>
   <body>
      <div style="border-spacing: 0; border-width: 0; padding: 0;">
         <h3>英語⽇本語</h3>
         <span>英語⽇本語</span>
      </div>
      <br/>
      <div>
         <table style="border-spacing: 0; border-width: 0; padding: 0;" aria-describedby="ack-template-desc"> 
         <th id="acknowledgement-fragment" /> 
         <tr>
            <td>Agreed By:</td>
         </tr>
         <tr>
            <td>Name:</td>
            <td>transition1 transition2</td>
         </tr>
         <tr>
            <td>Date:</td>
            <td>01/30/2025</td>
         </tr>
         </table> 
      </div>
   </body>
</html>

Dockerfile:

# Base SDK image (for running with `dotnet run`)
FROM mcr.microsoft.com/dotnet/sdk:8.0-alpine AS base
WORKDIR /app

# Install libgdiplus and dependencies
RUN apk add --no-cache libgdiplus msttcorefonts-installer font-noto-cjk fontforge libc6-compat fontconfig \
    chromium \
    chromium-chromedriver \
    gcompat \
    && update-ms-fonts

#RUN mkdir -p /usr/share/fonts/noto-cjk && \ 
	#wget -qO- https://noto-website-2.storage.googleapis.com/pkgs/NotoSansCJKjp-hinted.zip | busybox unzip -d /usr/share/fonts/noto - && \ 
	#wget -qO- https://noto-website-2.storage.googleapis.com/pkgs/NotoSansCJKkr-hinted.zip | busybox unzip -d /usr/share/fonts/noto - && \ 
	#wget -qO- https://noto-website-2.storage.googleapis.com/pkgs/NotoSansCJKsc-hinted.zip | busybox unzip -d /usr/share/fonts/noto - && \ 
	#wget -qO- https://noto-website-2.storage.googleapis.com/pkgs/NotoSansCJKtc-hinted.zip | busybox unzip -d /usr/share/fonts/noto - && \ 
	#fc-cache -f 

RUN wget -qO- https://noto-website-2.storage.googleapis.com/pkgs/NotoSansCJKjp-hinted.zip | busybox unzip -d /usr/share/fonts/truetype/msttcorefonts - && \
    fc-cache -fv

#RUN for font in /usr/share/fonts/noto-cjk/*.otf; do \ 
	#fontforge -lang=ff -c 'Open($1); Generate($2)' "$font" "${font%.otf}.ttf"; \ 
	#done

# Copy the project file and restore dependencies
COPY ["DownloadWepPage.csproj", "."]
RUN dotnet restore "./DownloadWepPage.csproj"

# Copy the entire project and set working directory
COPY . .
WORKDIR "/app/."

# Expose necessary ports (if applicable)
EXPOSE 5000

# Run the application using `dotnet run`
CMD ["dotnet", "run", "--project", "DownloadWepPage.csproj"]

output1.pdf (109.8 KB)

@subha21

The issue you’re experiencing with Japanese, Chinese, and Korean characters appearing as boxes in the generated PDF is likely due to missing font support for these languages in your Docker environment. To resolve this, you need to ensure that the appropriate fonts are installed and accessible to the Aspose.PDF library during the conversion process.

Here are some steps you can take to fix the problem:

  1. Install CJK Fonts: Ensure that you have the necessary CJK (Chinese, Japanese, Korean) fonts installed in your Docker image. You can use the Noto Sans CJK fonts, which are designed to support these languages. In your Dockerfile, you already have a command to install Noto Sans CJK fonts, but make sure that the fonts are correctly installed and accessible.

    You can add the following lines to your Dockerfile to ensure the fonts are installed correctly:

    RUN apk add --no-cache font-noto-cjk \
        && fc-cache -fv
    
  2. Set Font Directory: You may need to specify the font directory in your code. This can be done by setting the HtmlLoadOptions to include the font directory where the CJK fonts are located. Here’s an example of how to do this:

    var options = new HtmlLoadOptions();
    options.FontsFolder = "/usr/share/fonts/truetype/msttcorefonts"; // Adjust the path as necessary
    
  3. Check Font Availability: Before generating the PDF, you can check if the required fonts are available in your environment. You can use the FontUtilities class to list all available fonts and ensure that the CJK fonts are included.

  4. Test with Different HTML: If the issue persists, try testing with a simpler HTML document that contains only CJK characters to isolate the problem. This can help determine if the issue is with the HTML content or the font configuration.
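Step 3 can also be checked independently of Aspose.PDF by listing the font files that actually exist inside the container. The sketch below uses only the .NET base class library; the helper name `ListFontFiles`, the `/usr/share/fonts` root and the extension list are assumptions for illustration, so adjust them to wherever your Dockerfile places the fonts (for example, the msttcorefonts folder used above).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of Aspose.PDF): recursively lists font files
// under a directory so you can confirm the CJK fonts ended up in the image.
static List<string> ListFontFiles(string root)
{
    // Assumed set of font file extensions; .ttc is the collection format
    // that Noto Sans CJK is commonly packaged in.
    var extensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".ttf", ".otf", ".ttc" };

    // A missing directory simply yields an empty list instead of throwing.
    if (!Directory.Exists(root))
        return new List<string>();

    return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
        .Where(path => extensions.Contains(Path.GetExtension(path)))
        .OrderBy(path => path, StringComparer.Ordinal)
        .ToList();
}

// Print every font file found under the assumed system font root.
foreach (var file in ListFontFiles("/usr/share/fonts"))
    Console.WriteLine(file);
```

If no Noto CJK files show up in this list, the problem most likely lies in the image build rather than in the conversion code.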

By following these steps, you should be able to resolve the issue of CJK characters appearing as boxes in your PDF output.

If you continue to face issues, consider checking the Aspose documentation or community forums for additional troubleshooting tips related to font handling in PDF generation.


I have tried every available font. It only works when the HTML string is just <p>英語⽇本語</p>, not with the complete HTML document.

@subha21

Have you tried version 25.1 of Aspose.PDF.Drawing for .NET? Could you also share the generated output PDF for our reference?

output1.pdf (109.8 KB)

Here is the PDF with the boxes.

@subha21

We noticed that you are using an older version of the API, and without a license as well. Please try the following:

If the issue still persists, please let us know.

output1.pdf (129.8 KB)

I tried the latest version of the package, Aspose.PDF.Drawing 25.1, without a license and got the same result (PDF attached).

Is this expected without a license? Will it work with a license?

@subha21

We are checking it and will get back to you soon.


@subha21

The trial version of the API can only access 4 elements of any collection, and this includes fonts. So it is possible that the trial version cannot access all the fonts on the system, even if they are installed.

We tried this in our environment and noticed that only some characters were rendered as boxes; the rest of the text was rendered correctly.
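Since only some characters come out as boxes, it may be worth checking which code points the sample string actually contains. In the text as posted, the third character appears to be ⽇ (U+2F47, KANGXI RADICAL SUN) rather than the ordinary ideograph 日 (U+65E5); the two look almost identical, but they are different code points, and a font can cover one without the other. The hypothetical helper `DescribeCodePoints` below is a minimal diagnostic sketch using only `System.Text.Rune`, not a confirmed cause of this issue.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: lists every code point in a string and flags those in
// the Kangxi Radicals block (U+2F00..U+2FDF), which resemble ordinary
// ideographs but are separate code points a font may not cover.
static List<string> DescribeCodePoints(string text)
{
    var lines = new List<string>();
    foreach (Rune rune in text.EnumerateRunes())
    {
        bool kangxi = rune.Value >= 0x2F00 && rune.Value <= 0x2FDF;
        string marker = kangxi ? "  <- Kangxi radical" : "";
        // Format: U+XXXX followed by the character itself.
        lines.Add($"U+{rune.Value:X4} {rune}{marker}");
    }
    return lines;
}

// Inspect the sample string from the HTML above.
foreach (var line in DescribeCodePoints("英語⽇本語"))
    Console.WriteLine(line);
```

If a character is flagged, replacing it with the ordinary ideograph in the source HTML is one way to test whether it is the one rendered as a box.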

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-59220

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
SpecialCharacters.pdf (462.9 KB)