Font fallback for Korean picking incorrect font

sbd · November 20, 2018, 4:04pm

I am running into a problem rendering a Korean Word document to PDF.

The Word document is being rendered on a Linux system that doesn’t have the original font installed. Our understanding is that the font will be substituted based on the rules outlined here: Using TrueType Fonts in Java | Aspose.Words for Java.

To understand the fallback settings we have generated the fallback settings XML like this:

var fontSettings = FontSettings()  
fontSettings.fallbackSettings.buildAutomatic()  
fontSettings.fallbackSettings.save("text.xml")

This shows that for the Korean (Hangul Jamo) range (U+1100-U+11FF) it should use the UnDotum font:

		<Rule Ranges="1100-11FF" FallbackFonts="UnDotum" />
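To check which rule in a saved fallback table applies to a given character, a minimal Java sketch like the one below may help. It is not part of Aspose.Words: the class FallbackRuleLookup and its methods are hypothetical, and it assumes only the Rule / Ranges / FallbackFonts layout visible in the XML quoted in this thread. Handling of comma-separated ranges and single code points is an assumption; rules without a Ranges attribute are simply skipped. Note that U+1100-U+11FF is the Hangul Jamo block, while precomposed Hangul syllables such as 한 (U+D55C) sit in U+AC00-U+D7AF, so they would have to be matched by a different rule.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not an Aspose.Words API.
class FallbackRuleLookup {

    /** Returns the FallbackFonts value of the first rule whose Ranges cover codePoint. */
    static Optional<String> fallbackFor(String xml, int codePoint) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList rules = doc.getElementsByTagName("Rule");
            for (int i = 0; i < rules.getLength(); i++) {
                Element rule = (Element) rules.item(i);
                if (covers(rule.getAttribute("Ranges"), codePoint)) {
                    return Optional.of(rule.getAttribute("FallbackFonts"));
                }
            }
            return Optional.empty();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read fallback XML", e);
        }
    }

    /** Assumption: Ranges is a comma-separated list of hex ranges ("1100-11FF") or single code points. */
    static boolean covers(String ranges, int codePoint) {
        for (String part : ranges.split(",")) {
            String range = part.trim();
            if (range.isEmpty()) {
                continue; // a rule without Ranges is skipped in this sketch
            }
            String[] bounds = range.split("-");
            int start = Integer.parseInt(bounds[0].trim(), 16);
            int end = Integer.parseInt(bounds[bounds.length - 1].trim(), 16);
            if (codePoint >= start && codePoint <= end) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>"
                + "<FontFallbackSettings xmlns=\"Aspose.Words\"><FallbackTable>"
                + "<Rule Ranges=\"1100-11FF\" FallbackFonts=\"UnDotum\" />"
                + "</FallbackTable></FontFallbackSettings>";
        System.out.println(fallbackFor(xml, 0x1112)); // Hangul Jamo: Optional[UnDotum]
        System.out.println(fallbackFor(xml, 0xD55C)); // Hangul syllable: Optional.empty
    }
}
```

With only the single rule quoted above, a precomposed syllable finds no match, so it is worth looking through the full saved table for a rule covering U+AC00-U+D7AF as well.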

The problem, however, is that the Korean text falls back to the font AR PL UKai CN, which does not support Korean characters, rather than UnBatang, UnDotum, or another proper Korean font.

If we print out the font substitution warnings we see:

Font Warning: Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.

Why would Aspose.Words select the AR PL UKai CN font instead of the font UnDotum, as specified in the fallback settings?

Let me know if you need test documents and code.

tahir.manzoor · November 20, 2018, 6:52pm

@sbd

Thanks for your inquiry. To ensure a timely and accurate response, please attach the following resources here for testing:

Your input Word document.
Please share the fonts “Gulim” and “AR PL UKai CN”.
Please create a simple Java application (source code without compilation errors) that helps us reproduce your problem on our end and attach it here for testing.

As soon as you have these pieces of information ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

sbd · November 20, 2018, 8:15pm

@tahir.manzoor Zip uploaded with source replicating the problem.

To build run: ./gradlew shadowJar

To run: java -jar build/libs/AsposeTest-1.0-all.jar encoding.docx encoding.pdf

The outputs are encoding.pdf (the rendered PDF) and fallback.xml (the fallback settings from Aspose.Words).

The program implements an IWarningCallback and will output to the console when a font is being substituted.

The font AR PL UKai CN is provided in the fonts folder (it comes from an Ubuntu package). Gulim is a standard Windows font and should not be needed to reproduce the problem.

Note that you’ll also see a similar problem for Vietnamese, Urdu, and Hindi in the test file encoding.docx.

tahir.manzoor · November 21, 2018, 6:10am

@sbd

Unfortunately, we have not found the attachment with your post. Please attach it again.

Please do not include the JAR file in the ZIP file. You can share the Java code to reproduce this issue at our end. If the documents are large, please ZIP them, upload them to Dropbox or any other file hosting service, and share the download link here so that we can test this scenario.

Thanks for your cooperation.

sbd · November 23, 2018, 4:57pm

@tahir.manzoor my apologies, I didn’t realize the file didn’t finish uploading.

I’ve removed the build directory and it seems to finish uploading now: AsposeTest.zip (10.0 MB).

Let me know if you have any trouble getting to the file and I can share it on Dropbox.

tahir.manzoor · November 24, 2018, 3:37am

@sbd

Thanks for sharing the detail. We are investigating this issue and will get back to you soon.

tahir.manzoor · November 24, 2018, 6:36am

@sbd

Have you tried the latest version of Aspose.Words for Java 18.11?

We got the following warning message at our end. Please check the attached font fallback XML file. fallback.zip (1.3 KB)

Font ‘Gulim’ has not been found. Using ‘Albany WT J’ font instead. Reason: closest match according to font info from the document.

sbd · November 26, 2018, 2:23pm

@tahir.manzoor

Yes, the attached project should be using Aspose.Words for Java 18.11.

The following is in the Gradle file:

dependencies {
    compile 'com.aspose:aspose-words:18.11:jdk16'
}

Can you provide some information about how this font is selected? Looking at your fallback table, the rule for what I understand to be the Korean range says:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<FontFallbackSettings xmlns="Aspose.Words">
  <FallbackTable>
    ....
    <Rule Ranges="1100-11FF" FallbackFonts="Malgun Gothic" />

Shouldn’t it have selected Malgun Gothic then?

tahir.manzoor · November 26, 2018, 3:56pm

@sbd

Thanks for your inquiry. We logged this problem in our issue tracking system as WORDSNET-17813. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

sbd · November 26, 2018, 4:30pm

@tahir.manzoor

Thanks for logging the ticket. Can you help me understand the issue a little better? Am I correct in expecting that the font from the fallback.xml file should have been used? Or is there something else involved in picking the fallback font?

Also, are there any possible workarounds?

tahir.manzoor · November 26, 2018, 4:47pm

@sbd,

Thanks for your inquiry. The font fallback mechanism is well explained in the following article.
Font substitution and Font fallback in Aspose.Words

We logged this issue to check whether the behavior of Aspose.Words in your case is correct or not. Once there is any update available on this issue, we will inform you via this forum thread.

tahir.manzoor · November 28, 2018, 9:46am

@sbd,

What you are seeing is the expected behavior of Aspose.Words. Font substitution and font fallback are two independent mechanisms. Font substitution is performed according to the FontInfo from the document, and the font fallback settings are not considered at this step.

In your case the ‘AR PL UKai CN’ font is selected as the substitute. If the substitute font does not contain specific characters, font fallback is then performed for those characters according to the fallback table.

In your case the ‘UnDotum’ font should be used as the fallback font for the Korean characters, and the document should render to PDF correctly.

As an alternative, you could explicitly set up a font substitute from ‘Gulim’ to ‘UnDotum’ in FontSettings.
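The two mechanisms described above can be pictured with a small toy model. This is only a sketch of the behavior as explained in this thread, not Aspose.Words code: the class TwoStageFontModel, its fields, the glyph coverage ranges, and the second fallback rule (AC00-D7AF) are all hypothetical and chosen purely for illustration. Stage 1 resolves the font named in the document to an installed font (an explicit substitute first, otherwise a stand-in for the "closest match"). Stage 2 runs per character and only when the resolved font lacks the glyph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of substitution followed by per-character fallback. Not Aspose.Words code.
class TwoStageFontModel {

    static final class Rule {
        final int start;
        final int end;
        final String font;

        Rule(int start, int end, String font) {
            this.start = start;
            this.end = end;
            this.font = font;
        }
    }

    // Installed font name -> inclusive code point ranges it is assumed to cover.
    final Map<String, int[][]> installed = new HashMap<>();
    // Explicit substitutes: requested font -> replacement font.
    final Map<String, String> substitutes = new HashMap<>();
    // Fallback table, consulted in order.
    final List<Rule> fallbackRules = new ArrayList<>();
    // Stand-in for "closest match according to font info from the document".
    String closestMatch;

    /** Stage 1: resolve the requested font; the fallback table is not consulted here. */
    String substitute(String requested) {
        if (installed.containsKey(requested)) {
            return requested;
        }
        String explicit = substitutes.get(requested);
        if (explicit != null && installed.containsKey(explicit)) {
            return explicit;
        }
        return closestMatch;
    }

    /** Stage 2: per character, fall back only if the resolved font lacks the glyph. */
    String fontFor(String requested, int codePoint) {
        String base = substitute(requested);
        if (covers(base, codePoint)) {
            return base;
        }
        for (Rule rule : fallbackRules) {
            if (codePoint >= rule.start && codePoint <= rule.end && covers(rule.font, codePoint)) {
                return rule.font;
            }
        }
        return base; // no usable fallback: the character shows up as a missing glyph
    }

    boolean covers(String font, int codePoint) {
        int[][] ranges = installed.get(font);
        if (ranges == null) {
            return false;
        }
        for (int[] range : ranges) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    /** Builds a model with made-up coverage data loosely mirroring the thread. */
    static TwoStageFontModel demo() {
        TwoStageFontModel model = new TwoStageFontModel();
        model.installed.put("AR PL UKai CN", new int[][] {{0x4E00, 0x9FFF}});
        model.installed.put("UnDotum", new int[][] {{0x1100, 0x11FF}, {0xAC00, 0xD7A3}});
        model.closestMatch = "AR PL UKai CN";
        model.fallbackRules.add(new Rule(0x1100, 0x11FF, "UnDotum"));
        model.fallbackRules.add(new Rule(0xAC00, 0xD7AF, "UnDotum")); // hypothetical rule
        return model;
    }

    public static void main(String[] args) {
        TwoStageFontModel model = demo();
        System.out.println(model.substitute("Gulim"));      // AR PL UKai CN
        System.out.println(model.fontFor("Gulim", 0xD55C)); // UnDotum (via fallback)
        model.substitutes.put("Gulim", "UnDotum");
        System.out.println(model.substitute("Gulim"));      // UnDotum (explicit substitute)
    }
}
```

In this model the substitution warning and a correct Korean rendering can coexist: the warning reports stage 1, while the glyphs come from stage 2. With the explicit substitute in place, stage 2 is never needed for Korean text.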

sbd · November 28, 2018, 2:09pm

@tahir.manzoor,

Thanks for the explanation, it is very helpful in understanding the font substitution and fallback process. This is, however, not the behavior we observe when executing our code.

We see the font substitution with the following message:
Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.

But we never see the fallback font UnDotum being used. Instead, the PDF is rendered with a font that doesn’t support Korean, so the Korean text shows up as squares. If you install the package fonts-arphic-ukai on your system, you should be able to replicate the problem using our code.

Attached is the resulting rendering: encoding.pdf (448.1 KB) and the fallback.xml used for that rendering: fallback.xml.zip (1.7 KB).

tahir.manzoor · November 28, 2018, 5:50pm

@sbd

Thanks for your inquiry. We will install the package fonts-arphic-ukai at our end and test this case. We will investigate the issue and share our findings with you soon.

tahir.manzoor · November 30, 2018, 6:06am

@sbd

We have managed to reproduce the same warning message and fallback.xml at our end.

The Korean characters are rendered with the ‘UnDotum’ font. Please check the attached documents. Docs.zip (485.0 KB)

sbd · November 30, 2018, 3:18pm

Thanks for the update, this is very interesting.

I guess the next step is to figure out why this last fallback substitution is not happening on our system. Is there a callback similar to IWarningCallback that I can hook into to try and get some more information?
How can we debug the fallback substitution process?

tahir.manzoor · November 30, 2018, 5:46pm

@sbd

Thanks for your inquiry.

Unfortunately, no callback is available for the font fallback mechanism. However, we have logged this feature request as WORDSNET-17838 in our issue tracking system. You will be notified via this forum thread once this feature is available. We apologize for the inconvenience.

Could you please perform the following steps and share the PDF file and warning messages here for our reference? We will then provide you more information about your query.

Please copy the fonts from a Windows machine to Ubuntu.
Use the FontSettings.setFontsFolder method to set the folder where Aspose.Words looks for TrueType fonts.
Please copy the ‘UnDotum’ and ‘AR PL UKai CN’ fonts into the fonts folder and remove the font ‘Liberation Sans’ if it exists.
Execute your code and generate the PDF.
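Before running the steps above, it can be handy to confirm which of the named fonts are actually present in the fonts folder. The sketch below is one way to do that with the JDK alone. The class FontFolderCheck is hypothetical, and it matches on file names only, which is an approximation: a font file's name does not have to resemble its family name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: looks for font files whose names resemble a family name.
class FontFolderCheck {

    /** Lower-cases a name and drops spaces, underscores and hyphens. */
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[\\s_-]", "");
    }

    /** True if the file name contains the family name once both are normalized. */
    static boolean looksLike(String fileName, String family) {
        return normalize(fileName).contains(normalize(family));
    }

    /** Recursively lists regular files under folder whose names resemble the family name. */
    static List<Path> find(Path folder, String family) {
        try (Stream<Path> files = Files.walk(folder)) {
            return files.filter(Files::isRegularFile)
                    .filter(p -> looksLike(p.getFileName().toString(), family))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to ./fonts; pass another folder as the first argument.
        Path folder = Paths.get(args.length > 0 ? args[0] : "./fonts");
        for (String family : new String[] {"UnDotum", "AR PL UKai CN", "Liberation Sans"}) {
            System.out.println(family + " -> " + find(folder, family));
        }
    }
}
```

An empty list for ‘Liberation Sans’ and non-empty lists for the other two would match the setup requested in the steps.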

sbd · November 30, 2018, 7:29pm

I’ve created a fonts directory and added the following fonts:

$ ls fonts/
Andale_Mono.ttf                   NotoSansLaoUI-Bold.ttf
andalemo.ttf                      NotoSansLaoUI-Regular.ttf
arialbd.ttf                       NotoSans-Regular.ttf
arialbi.ttf                       NotoSansTamil-Bold.ttf
Arial_Black.ttf                   NotoSansTamil-Regular.ttf
Arial_Bold_Italic.ttf             NotoSansTamilUI-Bold.ttf
Arial_Bold.ttf                    NotoSansTamilUI-Regular.ttf
Arial_Italic.ttf                  NotoSansThai-Bold.ttf
ariali.ttf                        NotoSansThai-Regular.ttf
arial.ttf                         NotoSansThaiUI-Bold.ttf
ariblk.ttf                        NotoSansThaiUI-Regular.ttf
AR PL UKai CN, Regular.ttc        NotoSansUI-BoldItalic.ttf
comicbd.ttf                       NotoSansUI-Bold.ttf
Comic_Sans_MS_Bold.ttf            NotoSansUI-Italic.ttf
Comic_Sans_MS.ttf                 NotoSansUI-Regular.ttf
comic.ttf                         NotoSerifArmenian-Bold.ttf
courbd.ttf                        NotoSerifArmenian-Regular.ttf
courbi.ttf                        NotoSerif-BoldItalic.ttf
Courier_New_Bold_Italic.ttf       NotoSerif-Bold.ttf
Courier_New_Bold.ttf              NotoSerifGeorgian-Bold.ttf
Courier_New_Italic.ttf            NotoSerifGeorgian-Regular.ttf
Courier_New.ttf                   NotoSerif-Italic.ttf
couri.ttf                         NotoSerifLao-Bold.ttf
cour.ttf                          NotoSerifLao-Regular.ttf
Georgia_Bold_Italic.ttf           NotoSerif-Regular.ttf
Georgia_Bold.ttf                  NotoSerifThai-Bold.ttf
georgiab.ttf                      NotoSerifThai-Regular.ttf
Georgia_Italic.ttf                Saab.ttf
georgiai.ttf                      timesbd.ttf
georgia.ttf                       timesbi.ttf
georgiaz.ttf                      timesi.ttf
impact.ttf                        Times_New_Roman_Bold_Italic.ttf
NafeesNastaleeq.ttf               Times_New_Roman_Bold.ttf
NafeesWeb.ttf                     Times_New_Roman_Italic.ttf
NotoSansArmenian-Bold.ttf         Times_New_Roman.ttf
NotoSansArmenian-Regular.ttf      times.ttf
NotoSans-BoldItalic.ttf           trebucbd.ttf
NotoSans-Bold.ttf                 trebucbi.ttf
NotoSansDevanagari-Bold.ttf       Trebuchet_MS_Bold_Italic.ttf
NotoSansDevanagari-Regular.ttf    Trebuchet_MS_Bold.ttf
NotoSansDevanagariUI-Bold.ttf     Trebuchet_MS_Italic.ttf
NotoSansDevanagariUI-Regular.ttf  Trebuchet_MS.ttf
NotoSansEthiopic-Bold.ttf         trebucit.ttf
NotoSansEthiopic-Regular.ttf      trebuc.ttf
NotoSansGeorgian-Bold.ttf         UnDotumBold.ttf
NotoSansGeorgian-Regular.ttf      UnDotum.ttf
NotoSansHebrew-Bold.ttf           Verdana_Bold_Italic.ttf
NotoSansHebrew-Regular.ttf        Verdana_Bold.ttf
NotoSans-Italic.ttf               verdanab.ttf
NotoSansKhmer-Bold.ttf            Verdana_Italic.ttf
NotoSansKhmer-Regular.ttf         verdanai.ttf
NotoSansKhmerUI-Bold.ttf          verdana.ttf
NotoSansKhmerUI-Regular.ttf       verdanaz.ttf
NotoSansLao-Bold.ttf              webdings.ttf
NotoSansLao-Regular.ttf

I modified the code so it looks like this:

Document doc = new Document(filename);
doc.setWarningCallback( new WarningCallback() );

FontSettings fontSettings = new FontSettings();
fontSettings.setFontsFolder("./fonts", true);
fontSettings.getFallbackSettings().buildAutomatic();
fontSettings.getFallbackSettings().save("fallback.xml");
doc.setFontSettings(fontSettings);
PdfSaveOptions options = new PdfSaveOptions();
doc.save(outputFilename, options);

The rendering now gives the following output:

$ java -jar build/libs/AsposeTest-1.0-all.jar encoding.docx encoding.pdf
Font sub: Font 'Calibri' has not been found. Using 'Noto Serif' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Cambria' has not been found. Using 'Noto Sans' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'MingLiU' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'MS Gothic' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'MS Mincho' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Raavi' has not been found. Using 'Noto Sans' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Angsana New' has not been found. Using 'Noto Sans Lao UI' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Latha' has not been found. Using 'Arial' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Mangal' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.

But the font fallback now works for all languages except Urdu (it could be that I have the wrong Urdu font).
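When the warning list gets long, grouping it by the font that was substituted in makes it easier to spot one font absorbing several scripts. The sketch below does this for warning lines in the format shown above. The class FontWarningSummary is hypothetical, and the regular expression assumes the exact English wording and straight quotes seen in this log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups "has not been found" warnings by substitute font.
class FontWarningSummary {

    private static final Pattern NOT_FOUND = Pattern.compile(
            "Font '([^']+)' has not been found\\. Using '([^']+)' font instead\\.");

    /** Maps each substitute font to the requested fonts it replaced, in order of first appearance. */
    static Map<String, List<String>> bySubstitute(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = NOT_FOUND.matcher(line);
            if (m.find()) {
                result.computeIfAbsent(m.group(2), k -> new ArrayList<>()).add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String reason = " Reason: closest match according to font info from the document.";
        List<String> log = Arrays.asList(
                "Font sub: Font 'Cambria' has not been found. Using 'Noto Sans' font instead." + reason,
                "Font sub: Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead." + reason,
                "Font sub: Font 'Mangal' has not been found. Using 'AR PL UKai CN' font instead." + reason);
        // Prints: {Noto Sans=[Cambria], AR PL UKai CN=[Gulim, Mangal]}
        System.out.println(bySubstitute(log));
    }
}
```

Lines in other formats, such as the "Font substitutes: ... replaced with ..." message, are ignored by this pattern.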

See output fallback.xml.zip (1.0 KB) and encoding.pdf (438.2 KB).

Does this bring us closer to understanding why it doesn’t work when I use the system font directories?

sbd · November 30, 2018, 8:00pm

I think I may have gotten a little closer to the problem.
I created a document with Korean only text and ran it through the original program:

Document doc = new Document(filename);
doc.setWarningCallback( new WarningCallback() );

FontSettings fontSettings = new FontSettings();
fontSettings.getFallbackSettings().buildAutomatic();
fontSettings.getFallbackSettings().save("fallback.xml");

doc.setFontSettings(fontSettings);

PdfSaveOptions options = new PdfSaveOptions();
doc.save(outputFilename, options);

This produces the following output:

$ java -jar build/libs/AsposeTest-1.0-all.jar korean-only.docx encoding.pdf
Font sub: Font substitutes: 'Calibri' replaced with 'Liberation Sans'.
Font sub: Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.

When I open the PDF, none of the Korean characters are rendered correctly, and if I look at the font settings, the following fonts are used in the PDF:
Screen Shot 2018-11-30 at 2.51.00 PM.png (30.6 KB)

As you can see UKaiCN is not being replaced by UnDotum.

Docx: korean-only.docx.zip (13.7 KB), PDF: encoding.pdf (36.0 KB), Fallback.xml: fallback.xml.zip (1.3 KB).

If I run the same files against the new code where we set:

fontSettings.setFontsFolder("./fonts/", true);

It does, however, work: I see UKaiCN being replaced by UnDotum.

Maybe try installing the Liberation fonts on your system and see if you can reproduce the problem?

apt install fonts-liberation

tahir.manzoor · December 1, 2018, 7:14am

@sbd

Thanks for sharing the details. We will test the shared scenarios and share our findings with you. Please allow us some time for the investigation.