I am running into a problem rendering a Korean Word document to PDF.
The Word document is being rendered on a Linux system that doesn’t have the original font. Our understanding is that the font will be substituted based on the rules outlined here: Using TrueType Fonts in Java|Aspose.Words for Java.
To understand the fallback settings, we generated the fallback settings XML like this:
var fontSettings = FontSettings()
fontSettings.fallbackSettings.buildAutomatic()
fontSettings.fallbackSettings.save("text.xml")
This shows that for the Korean (Hangul) charset range (U+1100-U+11FF) it should use the UnDotum font.
The problem, however, is that the Korean text falls back to the font AR PL UKai CN, which does not support Korean characters, rather than UnBatang, UnDotum or another proper Korean font.
If we print out the font substitution warnings we see:
Font Warning: Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
Why would Aspose.Words select the AR PL UKai CN font instead of the font UnDotum as specified in the fallback settings?
Thanks for your inquiry. To ensure a timely and accurate response, please attach the following resources here for testing:
Your input Word document.
Please share the fonts “Gulim” and “AR PL UKai CN”.
Please create a simple Java application ( source code without compilation errors ) that helps us to reproduce your problem on our end and attach it here for testing.
As soon as you get these pieces of information ready, we will start investigation into your issue and provide you more information. Thanks for your cooperation.
PS: To attach these resources, please zip and upload them.
@tahir.manzoor Zip uploaded with source replicating the problem.
To build run: ./gradlew shadowJar
To run: java -jar build/libs/AsposeTest-1.0-all.jar encoding.docx encoding.pdf
The outputs are encoding.pdf (the rendered PDF) and fallback.xml (the fallback settings from Aspose.Words).
The program implements an IWarningCallback and writes to the console whenever a font is substituted.
The font AR PL UKai CN is provided in the fonts folder (it comes from the Ubuntu package fonts-arphic-ukai). The Gulim font is a standard Windows font and should not be relevant to reproducing the problem.
Note that you’ll also see a similar problem for Vietnamese, Urdu and Hindi in the test file encoding.docx.
Unfortunately, we have not found the attachment with your post. Please attach it again.
Please do not include the JAR file in the ZIP file. You can share the Java code to reproduce this issue at our end. If the documents’ size is bigger, please ZIP and upload them on Dropbox or any other file hosting service and share the download link here for us to test this scenario.
Thanks for your inquiry. We logged this problem in our issue tracking system as WORDSNET-17813. You will be notified via this forum thread once this issue is resolved.
Thanks for logging the ticket. Can you help me understand the issue a little better? Am I correct in expecting that the font from the fallback.xml file should have been used? Or is there something else involved in picking the fallback font?
We logged this issue to check whether the behavior of Aspose.Words in your case is correct or not. Once there is any update available on this issue, we will inform you via this forum thread.
You are seeing the expected behavior of Aspose.Words. Font substitution and font fallback are two independent mechanisms. Font substitution is performed according to the FontInfo from the document, and the font fallback settings are not considered at this step.
In your case the ‘AR PL UKai CN’ font is selected as the substitute. If the substitute font does not contain specific characters, then font fallback is performed for those characters according to the fallback table.
In your case the ‘UnDotum’ font should be used as the fallback font for Korean characters, and the document should render to PDF correctly.
As an alternative, you could explicitly set up a font substitute from ‘Gulim’ to ‘UnDotum’ in FontSettings.
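The two stages described above can be sketched as a toy model in plain Java. This does not use Aspose.Words and is not how the library is implemented: the class name `FontResolutionSketch`, the glyph coverage ranges, the fallback table entries, and the hard-coded "closest match" result are all assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FontResolutionSketch {
    // Hypothetical glyph coverage of the installed fonts (inclusive code point ranges).
    static final Map<String, int[][]> INSTALLED = new LinkedHashMap<>();
    // Hypothetical fallback table: {rangeStart, rangeEnd} -> font name, checked in order.
    static final Map<int[], String> FALLBACK = new LinkedHashMap<>();

    static {
        INSTALLED.put("AR PL UKai CN", new int[][] {{0x0020, 0x007E}, {0x4E00, 0x9FFF}});
        INSTALLED.put("UnDotum", new int[][] {{0x0020, 0x007E}, {0x1100, 0x11FF}, {0xAC00, 0xD7A3}});
        FALLBACK.put(new int[] {0x1100, 0x11FF}, "UnDotum");
        FALLBACK.put(new int[] {0xAC00, 0xD7AF}, "UnDotum");
    }

    // True if the named font is installed and has a glyph for the code point.
    static boolean covers(String font, int codePoint) {
        int[][] ranges = INSTALLED.get(font);
        if (ranges == null) {
            return false;
        }
        for (int[] r : ranges) {
            if (codePoint >= r[0] && codePoint <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // Stage 1: whole-font substitution. The fallback table is not consulted here.
    static String substitute(String requested) {
        if (INSTALLED.containsKey(requested)) {
            return requested;
        }
        return "AR PL UKai CN"; // assumed stand-in for the "closest match" choice
    }

    // Stage 2: per-character fallback, only when the stage 1 font lacks the glyph.
    static String fontForChar(String requested, int codePoint) {
        String base = substitute(requested);
        if (covers(base, codePoint)) {
            return base;
        }
        for (Map.Entry<int[], String> rule : FALLBACK.entrySet()) {
            int[] r = rule.getKey();
            if (codePoint >= r[0] && codePoint <= r[1] && covers(rule.getValue(), codePoint)) {
                return rule.getValue();
            }
        }
        return base; // no fallback rule applies; the glyph would be missing
    }

    public static void main(String[] args) {
        System.out.println(substitute("Gulim"));          // stage 1 result for the missing font
        System.out.println(fontForChar("Gulim", 0xD55C)); // a Hangul syllable
        System.out.println(fontForChar("Gulim", 0x4E2D)); // a CJK ideograph
    }
}
```

In this model the warning about ‘Gulim’ being replaced by ‘AR PL UKai CN’ corresponds to stage 1 only. A Korean character should still end up in ‘UnDotum’ through stage 2, provided a fallback rule covers its code point and the fallback font is actually available.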
Thanks for the explanation, which is very helpful in understanding the font substitution and fallback process. However, this is not the behavior we observe when executing our code.
We see the font substitution with the following message: Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
But we never see the fallback font UnDotum being used. Instead, the PDF is rendered with a font that doesn’t support Korean, so the Korean text shows up as squares. If you install the package fonts-arphic-ukai on your system you should be able to replicate the problem using our code.
Attached is the resulting rendering: encoding.pdf (448.1 KB) and the fallback.xml used for that rendering: fallback.xml.zip (1.7 KB).
Thanks for your inquiry. We will install the package fonts-arphic-ukai at our end and test this case. We will investigate the issue and share our findings with you soon.
I guess the next step is to figure out why this last fallback substitution is not happening on our system. Is there a callback similar to IWarningCallback that I can hook into to try and get some more information?
How can we debug the fallback substitution process?
Unfortunately, no callback is available for font fallback mechanism. However, we have logged this feature request as WORDSNET-17838 in our issue tracking system. You will be notified via this forum thread once this feature is available. We apologize for your inconvenience.
Could you please perform the following steps and share the PDF file and warning messages here for our reference? We will then provide you more information about your query.
Document doc = new Document(filename);
doc.setWarningCallback( new WarningCallback() );
FontSettings fontSettings = new FontSettings();
fontSettings.setFontsFolder("./fonts", true);
fontSettings.getFallbackSettings().buildAutomatic();
fontSettings.getFallbackSettings().save("fallback.xml");
doc.setFontSettings(fontSettings);
PdfSaveOptions options = new PdfSaveOptions();
doc.save(outputFilename, options);
The rendering now gives the following output:
$ java -jar build/libs/AsposeTest-1.0-all.jar encoding.docx encoding.pdf
Font sub: Font 'Calibri' has not been found. Using 'Noto Serif' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Cambria' has not been found. Using 'Noto Sans' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'MingLiU' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'MS Gothic' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'MS Mincho' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Raavi' has not been found. Using 'Noto Sans' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Angsana New' has not been found. Using 'Noto Sans Lao UI' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Latha' has not been found. Using 'Arial' font instead. Reason: closest match according to font info from the document.
Font sub: Font 'Mangal' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
But the font fallback now works for all languages except Urdu (it could be that I have the wrong Urdu font).
I think I may have gotten a little closer to the problem.
I created a document with Korean only text and ran it through the original program:
Document doc = new Document(filename);
doc.setWarningCallback( new WarningCallback() );
FontSettings fontSettings = new FontSettings();
fontSettings.getFallbackSettings().buildAutomatic();
fontSettings.getFallbackSettings().save("fallback.xml");
doc.setFontSettings(fontSettings);
PdfSaveOptions options = new PdfSaveOptions();
doc.save(outputFilename, options);
This produces the following output:
$ java -jar build/libs/AsposeTest-1.0-all.jar korean-only.docx encoding.pdf
Font sub: Font substitutes: 'Calibri' replaced with 'Liberation Sans'.
Font sub: Font 'Gulim' has not been found. Using 'AR PL UKai CN' font instead. Reason: closest match according to font info from the document.
When I open the PDF, none of the Korean characters are rendered correctly, and if I look at the font settings the following fonts are used in the PDF: Screen Shot 2018-11-30 at 2.51.00 PM.png (30.6 KB)
As you can see, UKaiCN is not being replaced by UnDotum.
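One check that needs no library support is to list which Unicode blocks the document text actually falls into and compare them with the ranges in fallback.xml. Modern Korean text is normally encoded as precomposed Hangul Syllables (U+AC00-U+D7A3), which lie outside the Hangul Jamo range U+1100-U+11FF mentioned earlier. Whether that matters for this document is not known from the thread. The sketch below uses only the JDK; the class name `UnicodeBlockReport` and the sample string are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UnicodeBlockReport {
    // Counts how many code points of the text fall into each Unicode block.
    static Map<String, Integer> countBlocks(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            String name = (block == null) ? "UNASSIGNED" : block.toString();
            counts.merge(name, 1, Integer::sum);
        });
        return counts;
    }

    // True if the code point lies in the inclusive range [start, end].
    static boolean inRange(int codePoint, int start, int end) {
        return codePoint >= start && codePoint <= end;
    }

    public static void main(String[] args) {
        // Sample Korean text written with escapes so the source file encoding does not matter.
        String sample = args.length > 0 ? args[0] : "\uD55C\uAD6D\uC5B4";
        countBlocks(sample).forEach((block, n) -> System.out.println(block + ": " + n));
        sample.codePoints().forEach(cp -> System.out.printf(
                "U+%04X in U+1100-U+11FF: %b%n", cp, inRange(cp, 0x1100, 0x11FF)));
    }
}
```

Running it on text copied from korean-only.docx would show whether the characters are covered by any range listed in fallback.xml.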