Font fallback for Korean picking incorrect font

tahir.manzoor · December 3, 2018, 5:44am

Thanks for your patience. We have tested the scenario and found that what you are seeing is the expected behavior of Aspose.Words.

For korean-only.docx, if the font ‘UnDotum’ does not exist in the fonts folder, ‘AR PL UKai CN’ is used for rendering.

For the Urdu font issue, please use the correct Urdu font to get the desired output.

sbd · December 3, 2018, 6:17pm

@tahir.manzoor I think you misunderstood my post. For korean-only.docx, the font ‘UnDotum’ does exist in the system fonts folder, but ‘AR PL UKai CN’ is still used for rendering.

Running:

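// Load the document; WarningCallback is an IWarningCallback implementation (not shown).
Document doc = new Document(filename);
doc.setWarningCallback(new WarningCallback());

// Build fallback settings from the available fonts and save them for inspection.
FontSettings fontSettings = new FontSettings();
fontSettings.getFallbackSettings().buildAutomatic();
fontSettings.getFallbackSettings().save("fallback.xml");

doc.setFontSettings(fontSettings);

// Render to PDF with default options.
PdfSaveOptions options = new PdfSaveOptions();
doc.save(outputFilename, options);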

This produces the following output:

$ java -jar build/libs/AsposeTest-1.0-all.jar korean-only.docx encoding.pdf
Font sub: Font substitutes: 'Calibri' replaced with 'Liberation Sans'.
Font sub: Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead.

And produces the following PDF: https://forum.aspose.com/uploads/default/21357

tahir.manzoor · December 4, 2018, 4:50am

@sbd

Thanks for sharing the details. We have logged this problem in our issue tracking system as WORDSNET-17852. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

tahir.manzoor · December 7, 2018, 9:56am

@sbd

Thanks for your patience. The Korean characters in the document are in the “U+AC00…U+D7AF Hangul Syllables” Unicode block. The “utkal” font is used for this range in the fallback.xml you provided.

This font probably does not contain the required glyphs, so the fallback fails. FallbackSettings.BuildAutomatic() may not produce optimal fallback settings, as described in its XML comments. You should set “UnDotum” as the fallback font for the “U+AC00…U+D7AF Hangul Syllables” range manually: generate fallback.xml using the BuildAutomatic method, modify it by hand, and load it with the FontFallbackSettings.Load() method.
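For anyone who prefers to script the manual edit described above, here is a minimal sketch using only the JDK's XML APIs. It assumes the saved fallback.xml contains rule elements of the form <Rule Ranges="AC00-D7AF" FallbackFonts="..."/>; that layout, the sample fragment in main, and the class and method names are illustrative assumptions, so check them against the file that save("fallback.xml") actually produced.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Aspose.Words.
final class FallbackXmlEditor {

    // Sets FallbackFonts on every <Rule> whose Ranges attribute lists the given range.
    // Assumption: rules look like <Rule Ranges="AC00-D7AF" FallbackFonts="..."/>.
    static String setFallbackFont(String xml, String range, String font) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList rules = dom.getElementsByTagName("Rule");
            for (int i = 0; i < rules.getLength(); i++) {
                Element rule = (Element) rules.item(i);
                // A rule may list several comma-separated ranges.
                for (String token : rule.getAttribute("Ranges").split(",")) {
                    if (token.trim().equalsIgnoreCase(range)) {
                        rule.setAttribute("FallbackFonts", font);
                    }
                }
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite fallback XML", e);
        }
    }

    public static void main(String[] args) {
        // Simplified, hypothetical fragment standing in for the generated fallback.xml.
        String xml = "<FontFallbackSettings><FallbackTable>"
                + "<Rule Ranges=\"AC00-D7AF\" FallbackFonts=\"utkal\"/>"
                + "<Rule FallbackFonts=\"DejaVu Sans\"/>"
                + "</FallbackTable></FontFallbackSettings>";
        System.out.println(setFallbackFont(xml, "AC00-D7AF", "UnDotum"));
    }
}
```

The rewritten XML would then be written back to disk and loaded through the fallback settings' load method in place of the buildAutomatic() call. Rules that do not list the given range are left untouched.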

sbd · December 7, 2018, 10:04pm

Thanks for sharing this, @tahir.manzoor. Using this method, I was able to make the document render correctly for Korean.

We just have one last problem, which is rendering Urdu. When rendering this file, urdu.docx.zip (8.7 KB), we get the following PDF: urdu.pdf (73.9 KB).

We use the following modified fallback.xml file: fallback.modified.xml.zip (1.3 KB).

This file specifies that the font called Nice should be used for the Urdu ranges:

[U+0600 to U+06FF]
[U+0750 to U+077F]
[U+FB50 to U+FDFF]
[U+FE70 to U+FEFF]

However, if you look at the fonts in the PDF, Nice is not included. Instead, it seems to be using Takao P Gothic.

Can you help us understand why Nice is not used and instead a font not supporting the character set is used?
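One way to rule out a range mismatch before looking at the fonts themselves is to confirm that every character in the text falls inside the configured ranges. The sketch below does that with plain Java; the ranges are the four listed in this post, while the class name, method names, and sample string are hypothetical.

```java
// Hypothetical helper, not part of Aspose.Words.
final class FallbackRangeCheck {

    // Inclusive {start, end} code point ranges, copied from the list in the post.
    static final int[][] URDU_RANGES = {
        {0x0600, 0x06FF},
        {0x0750, 0x077F},
        {0xFB50, 0xFDFF},
        {0xFE70, 0xFEFF},
    };

    // True if the code point lies inside at least one of the ranges.
    static boolean isCovered(int codePoint, int[][] ranges) {
        for (int[] range : ranges) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Returns the code points of text that fall outside every range,
    // formatted as "U+XXXX" tokens separated by single spaces.
    static String uncovered(String text, int[][] ranges) {
        StringBuilder sb = new StringBuilder();
        text.codePoints()
            .filter(cp -> !isCovered(cp, ranges))
            .forEach(cp -> {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("U+%04X", cp));
            });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Four Urdu letters followed by a space and an ASCII digit.
        String sample = "اردو 1";
        System.out.println(uncovered(sample, URDU_RANGES)); // prints "U+0020 U+0031"
    }
}
```

An empty result means the ranges cover the whole text, in which case the remaining question is whether the configured font actually has glyphs for those code points.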

tahir.manzoor · December 8, 2018, 7:11am

@sbd

Thanks for your inquiry. We are investigating this issue and will get back to you soon.

tahir.manzoor · December 10, 2018, 4:51am

@sbd

We have tested the scenario using the latest version of Aspose.Words for Java, 18.12, and are unable to reproduce the issue at our end. Could you please ZIP and attach the ‘Nice’ and ‘Takao P Gothic’ fonts that you are using so we can test with them? Thanks for your cooperation.

sbd · December 10, 2018, 3:01pm

Absolutely, please find the fonts here: https://drive.google.com/open?id=1iulhlMghhwLQjoVBvOxg3f9yJWx9ufgl

tahir.manzoor · December 10, 2018, 5:47pm

@sbd

Unfortunately, we are unable to download the fonts. Please grant us access to download them. Thanks for your cooperation.

tahir.manzoor · December 11, 2018, 5:23am

@sbd

Thanks for sharing the details. We have logged this problem in our issue tracking system as WORDSNET-17904. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

tahir.manzoor · December 26, 2018, 7:39am

@sbd

We have completed the analysis of the Urdu font issue. In your document, the Urdu text contains characters from the “U+FB50…U+FDFF Arabic Presentation Forms-A” and “U+FE70…U+FEFF Arabic Presentation Forms-B” Unicode blocks. According to the provided PDF output, the “Arial” font (which is directly specified for the text) is used for the characters from the “U+FE70…U+FEFF Arabic Presentation Forms-B” block, and the “DejaVu Sans” font (which is specified as the last fallback font in the provided fallback settings XML) is used for the characters from “U+FB50…U+FDFF Arabic Presentation Forms-A”. The “Nice” font is probably not used because it contains glyphs only for the “U+FE70…U+FEFF” block, and those are already resolved by the “Arial” font.

None of the provided fonts contains all the required glyphs from “U+FB50…U+FDFF Arabic Presentation Forms-A”, so you should use some other font for this block. The free “Noto Sans Arabic” font (Noto Home - Google Fonts) is one you could use.
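To see for yourself which Unicode blocks a piece of text draws on, and therefore which ranges need a suitable fallback font, a small tally with the JDK's Character.UnicodeBlock can help. This is only a sketch: the class name, method name, and sample string are made up for illustration, and the text would need to be extracted from the document first.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Aspose.Words.
final class UnicodeBlockTally {

    // Counts the code points of text per Unicode block, keyed by the block's
    // Java identifier (for example "ARABIC_PRESENTATION_FORMS_A").
    static Map<String, Integer> countByBlock(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            // of() returns null for code points that belong to no block.
            String name = (block == null) ? "NO_BLOCK" : block.toString();
            counts.merge(name, 1, Integer::sum);
        });
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample: one code point from Presentation Forms-A (U+FB56),
        // two from Presentation Forms-B (U+FE8D, U+FE8E), one from the
        // base Arabic block (U+0627).
        String sample = new String(Character.toChars(0xFB56))
                + new String(Character.toChars(0xFE8D))
                + new String(Character.toChars(0xFE8E))
                + new String(Character.toChars(0x0627));
        countByBlock(sample).forEach((block, n) -> System.out.println(block + ": " + n));
    }
}
```

If the tally shows ARABIC_PRESENTATION_FORMS_A entries, the fallback rule for U+FB50…U+FDFF needs a font that actually contains those glyphs, which matches the diagnosis above.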