Aspose converts the docx file to pdf, and some Chinese characters are missing

I have a docx file. After converting it to PDF with Aspose (22.8), I found that some Chinese characters are missing, for example on page 4.
docx.docx (18.8 KB)
1672882659488.png (124.2 KB)

I found that the Unicode code points of the missing Chinese characters differ from those of the normal Chinese characters. For example, the character “自” looks the same but is not converted correctly. The picture is as follows:
1672882410936.png (4.3 KB)
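
To see why two characters that look identical can behave differently, it helps to print their code points. The sketch below is a minimal illustration using only the Java standard library. The class name CodePointInspector and the choice of U+2F83 (KANGXI RADICAL SELF) as the look-alike of U+81EA (自) are assumptions made for the example, since the thread does not state which code point the document actually contains.

```java
import java.text.Normalizer;

public class CodePointInspector {

    /** Returns one line per code point, formatted as "U+XXXX UNICODE_BLOCK". */
    public static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(String.format("U+%04X %s", cp, Character.UnicodeBlock.of(cp)));
        });
        return sb.toString();
    }

    /** Applies NFKC normalization, which folds compatibility characters into their ordinary forms. */
    public static String toUnified(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String ordinary = "\u81EA";  // the usual CJK unified ideograph
        String lookAlike = "\u2F83"; // assumed look-alike, for illustration only

        System.out.println(describe(ordinary));   // U+81EA CJK_UNIFIED_IDEOGRAPHS
        System.out.println(describe(lookAlike));  // U+2F83 KANGXI_RADICALS
        System.out.println(toUnified(lookAlike).equals(ordinary)); // true
    }
}
```

NFKC normalization changes the stored text, so treat it as a diagnostic aid rather than a recommended fix.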

@humanhuman Unfortunately, I cannot reproduce the problem on my side. Here is the PDF document produced on my side: out.pdf (144.8 KB)

As far as I can see, there are no missing characters. Could you please attach your output PDF document here for our reference? We will check it and provide more information.

Thank you very much for your reply and help. I created a simple project. My code example is as follows. You can directly run the docx2PdfTest method in the test class. My docx file is under src/test/resources/document/, and the converted pdf file is under src/test/resources/pdf
demo.zip (402.0 KB)

I mark the details in the figure below:
1672905826743.png (104.1 KB)

The jar package I use is downloaded from here:
https://releases.aspose.com/words/java/

@humanhuman I see that your code specifies a fonts folder:

FolderFontSource folderFontSource = new FolderFontSource(FONTS_FOLDER, false, 1);

Could you please attach the fonts from this folder? The problem might occur because the fonts used in your document are not available in the environment where the document is converted. If Aspose.Words cannot find the fonts used in the document, the fonts are substituted. This might lead to layout differences, since substitution fonts might have different font metrics. You can implement IWarningCallback to get a notification when font substitution is performed.
Alternatively, the provided font may not contain the glyphs used in your document. In this case Aspose.Words applies its font fallback algorithm and tries to find the glyphs in an alternative font.
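
One way to check the second possibility (the font lacks the glyphs) outside of Aspose.Words is to ask the font file directly. The following is a minimal sketch using java.awt.Font from the Java standard library. The class GlyphCoverage and its method names are hypothetical, and AWT's answer about glyph coverage is only an approximation of what Aspose.Words finds when it reads the same font.

```java
import java.awt.Font;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class GlyphCoverage {

    /**
     * Returns the distinct non-whitespace code points of text (as "U+XXXX")
     * for which canDisplay reports no glyph.
     */
    public static List<String> missing(String text, IntPredicate canDisplay) {
        List<String> result = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .distinct()
            .filter(cp -> !canDisplay.test(cp))
            .forEach(cp -> result.add(String.format("U+%04X", cp)));
        return result;
    }

    /** Loads a TrueType font file and lists the code points of text it cannot display. */
    public static List<String> missingInFontFile(String text, File fontFile) throws Exception {
        Font font = Font.createFont(Font.TRUETYPE_FONT, fontFile);
        return missing(text, font::canDisplay);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java GlyphCoverage <font.ttf> <text>");
            return;
        }
        System.out.println(missingInFontFile(args[1], new File(args[0])));
    }
}
```

Running it against each font in the fonts folder with the text that disappears from the PDF shows which fonts, if any, cover those characters.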

I’m sorry I missed this file. The following is my font folder. In my docx file, both the text that cannot be displayed correctly and the text that can be displayed correctly use the Song typeface (simsun.ttf)
fonts.zip (5.4 MB)

@humanhuman Thank you for the additional information. The SimSun font does not contain the required glyphs. So if you put only SimSun into the fonts folder and use the following simple code, the problem is reproducible:

doc.setFontSettings(new FontSettings());
doc.getFontSettings().setFontsSources(new FontSourceBase[]{new FolderFontSource("C:\\Temp\\fonts", true, 1) });
doc.save("C:\\Temp\\out.pdf");

However, if you also use SystemFontSource:

doc.getFontSettings().setFontsSources(new FontSourceBase[]{new FolderFontSource("C:\\Temp\\fonts", true, 1), new SystemFontSource(2) });

on Windows, Aspose.Words uses the Microsoft YaHei Regular and Microsoft JhengHei Regular fonts to render the missing glyphs. So if you put these fonts into the fonts folder, the document is rendered fine. By the way, MS Word also uses these fonts when you convert the document to PDF.

Could you please provide me with these two fonts? I can’t find them. Thank you

@humanhuman Here are the fonts required to convert your document to PDF:
https://drive.google.com/file/d/1z77nywsvdo5kp6af4tamwkmcpjdzrv5c/view?usp=sharing

I used the fonts and the code you provided, but the converted pdf file is still missing this part of the text. This may be related to the system. I have tried on both Windows and Linux, but the text is still blank

I opened the pdf file you converted in Adobe and saw Microsoft YaHei Regular listed, but that font is not in the pdf file I converted

This is my code. I can confirm that the font path is correct

public static void main(String[] args) throws Exception{
    Document doc = new Document(new FileInputStream("C:\\Users\\Administrator\\Downloads\\docx.docx"));
    doc.setFontSettings(new FontSettings());
    doc.getFontSettings().setFontsSources(new FontSourceBase[]{new FolderFontSource("C:\\Users\\Administrator\\Desktop\\fonts", true, 1), new SystemFontSource(2) });
    doc.save("C:\\Users\\Administrator\\Downloads\\0.pdf");
}

@humanhuman Could you please attach your output PDF document here for our reference?

This is the pdf file I converted

0.pdf (139.4 KB)

@humanhuman That is odd. I cannot reproduce the problem on my side. I have used the following simple code:

Document doc = new Document("C:\\Temp\\in.docx");

doc.setFontSettings(new FontSettings());
doc.getFontSettings().setFontsSources(new FontSourceBase[]{new FolderFontSource("C:\\Temp\\fonts", true, 1) });

doc.save("C:\\Temp\\out.pdf");

Here the C:\Temp\fonts folder contains only the fonts I shared earlier. The output document looks correct: out.pdf (137.3 KB)

I have noticed you are using version 22.8 of Aspose.Words. Could you please try the latest version, 22.12, on your side?

Using 22.12 solves the problem. I should have tried the latest version sooner. I’m sorry that this simple problem took up your time. Thank you.
