Support Cambodian language

Hi Aspose team,

I’m using Aspose.Words for Java, version 22.9.
It has worked like a charm for Word generation and Word-to-PDF conversion, even though our documents are quite big and complex. Using Aspose saves our development team a lot of time.

At some point our documents need to be generated for non-Latin languages. I’ve run some tests and it works for Chinese and Russian, for example, which is really good.

But some particular documents contain parts of the text in Cambodian. When the Word document is generated, those characters are replaced by box-shaped placeholder characters, as in this topic: Support Korean characters
If I manually change the font of those boxes in Word (to almost any font), the original Cambodian text appears. This is quite strange; somehow Word manages to adapt.

Could you please point me in the right direction? Is there some configuration I need to add to the Java code?

Here is an example of Cambodian text: ម្រេចកំពត. Maybe you can run some tests on your side as well.

Thanks

@brebDev Could you please attach your test document with Cambodian content here, so that we work with the same documents? We will check the issue and provide you with more information.

Hi Alexey,

Yes, of course. To keep things simple, I’ve used a dummy template that generates a much simpler document.
Attached you’ll find the template and the generated document. In the generated document you will see a string that contains English, Russian and Chinese characters, which are rendered correctly, and also Cambodian characters, which are replaced by those boxes. generatedDocument.docx (99.8 KB)
template.docx (99.2 KB)

Thanks

@brebDev Most likely the problem on your side occurs because fonts that contain Cambodian characters are not available in the environment where the conversion is performed. On my side the document is converted properly: out.pdf (30.7 KB)
I have used the following code:

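import com.aspose.words.Document;
import com.aspose.words.shaping.harfbuzz.HarfBuzzTextShaperFactory;
Document doc = new Document("C:\\Temp\\in.docx");
// Enable OpenType features by using the HarfBuzz text shaper for layout.
doc.getLayoutOptions().setTextShaperFactory(HarfBuzzTextShaperFactory.getInstance());
doc.save("C:\\Temp\\out.pdf");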

I also use HarfBuzzTextShaperFactory to enable OpenType features, which improves the rendering of Cambodian characters.
https://docs.aspose.com/words/net/enable-opentype-features/
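A plain-Java illustration (no Aspose API involved) of why shaping matters for this particular text: listing the code points of the sample ម្រេចកំពត shows a KHMER SIGN COENG, which turns the following consonant into a subscript form, and a KHMER VOWEL SIGN E, which is stored after its consonant cluster but drawn to its left. Stacking and reordering of this kind is what an OpenType shaper such as HarfBuzz takes care of. The class name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: describes each code point of a string with its Unicode name.
final class KhmerCodePoints {

    // The sample text "ម្រេចកំពត", written with escapes so the source file encoding does not matter.
    static final String SAMPLE = "\u1798\u17D2\u179A\u17C1\u1785\u1780\u17C6\u1796\u178F";

    // Returns one "U+XXXX NAME" entry per code point, in logical (stored) order.
    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp ->
                out.add(String.format("U+%04X %s", cp, Character.getName(cp))));
        return out;
    }

    public static void main(String[] args) {
        describe(SAMPLE).forEach(System.out::println);
    }
}
```

The logical order printed here is what Word and Aspose.Words receive; the visual order on the page is produced by the shaper.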

In your document, the DaunPenh font is used for the Cambodian text. Make sure this font is available in the environment where the conversion is performed. If the font is missing, Aspose.Words substitutes it, and the substitution font might not contain the required glyphs. In that case Aspose.Words applies its font fallback mechanism, which tries to find the required glyphs in the available fonts according to the fallback rules.
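As a rough pre-flight check, plain Java can report which required font families the JVM itself can see. This is only an approximation: Aspose.Words scans its own configured font sources, which need not match what AWT reports, so it is not a substitute for Aspose.Words' own diagnostics. The class and method names are hypothetical.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: finds required font families that are not installed (case-insensitive match).
final class FontAvailability {

    static Set<String> missingFamilies(Collection<String> required, String[] installed) {
        Set<String> known = new HashSet<>();
        for (String family : installed) {
            known.add(family.toLowerCase(Locale.ROOT));
        }
        Set<String> missing = new LinkedHashSet<>(); // keeps the order of 'required'
        for (String family : required) {
            if (!known.contains(family.toLowerCase(Locale.ROOT))) {
                missing.add(family);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Font families visible to the JVM's AWT layer (an approximation of what Aspose.Words finds).
        String[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        System.out.println("Missing: " + missingFamilies(Arrays.asList("DaunPenh", "Arial"), installed));
    }
}
```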

When I manually change the style in generatedDocument.docx, the Cambodian text is displayed, but Word automatically switches its font from Arial to Leelawadee UI, while the rest of the text stays in Arial.

I’ve seen that Cambodian (Khmer) is no longer supported by default in Windows 10. I installed it, and the DaunPenh font was then added to Microsoft Word. Now generatedDocument.docx looks good, and those characters are rendered using the DaunPenh font. By the way, I’m using Windows 10 Enterprise.

Is there a way to detect whether a string contains Cambodian characters and install the required font programmatically using Aspose? Or should we tell our clients to install the Khmer language on their Windows machines?
Actually, I’m now wondering whether there are other languages or fonts that Windows machines do not support by default and that our documents might contain in the future.

What is your recommendation? Can these cases be handled from the code (with the missing fonts somehow being installed), or should they be handled manually by each client?
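The detection part of the question can be answered in plain Java, independently of Aspose.Words: `Character.UnicodeScript` classifies each code point by script, so a string can be checked for Khmer (or any other script) before generating a document. This is a sketch with hypothetical names; it says nothing about which fonts are installed.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper: reports which Unicode scripts occur in a string.
final class ScriptDetector {

    static boolean containsScript(String text, Character.UnicodeScript script) {
        return text.codePoints().anyMatch(cp -> Character.UnicodeScript.of(cp) == script);
    }

    static boolean containsKhmer(String text) {
        return containsScript(text, Character.UnicodeScript.KHMER);
    }

    // Distinct scripts used, ignoring COMMON (spaces, digits, punctuation) and INHERITED marks.
    static Set<Character.UnicodeScript> scriptsUsed(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript s = Character.UnicodeScript.of(cp);
            if (s != Character.UnicodeScript.COMMON && s != Character.UnicodeScript.INHERITED) {
                scripts.add(s);
            }
        });
        return scripts;
    }
}
```

A set such as {LATIN, CYRILLIC, KHMER, HAN} could then be mapped to the fonts your application needs to ship or embed; that mapping is application-specific and not shown here.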

@brebDev Using Aspose.Words you can detect the fonts that are used in the document but are missing from your environment. You can use IWarningCallback to achieve this.
Unfortunately, there is no way to install the required fonts using Aspose.Words. The required fonts have to be installed manually on the client’s side. Alternatively, you can ship the required fonts with your application and configure Aspose.Words to use both the system fonts and the fonts provided with the application.

OK, installing the missing fonts on the clients’ machines makes sense.

I’ve also tried the second suggestion, configuring Aspose.Words to use both the fonts provided by the application and the system fonts. This is how the code looks:

// Configure fonts

FontSettings fs = FontSettings.getDefaultInstance();
fs.setFontsSources(new FontSourceBase[] {
        loadFont("/aspose/arial.ttf"),
        loadFont("/aspose/ariali.ttf"),
        loadFont("/aspose/daunpenh.ttf"),
        new SystemFontSource()
});
LoadOptions lo = new LoadOptions();
lo.setFontSettings(fs);

Document doc = new Document(is, lo);
doc.setWarningCallback(warningInfo-> {
    if (warningInfo.getWarningType() == WarningType.FONT_SUBSTITUTION)
    {
        log.info("{}", warningInfo.getDescription());
    }
});

ReportingEngine re = new ReportingEngine();

Sender sender = new Sender();
sender.setName("TEST FR AO  контрольная работа ម្រេចកំពត  测验,考");

re.buildReport(doc, sender, "s");
doc.save(out, SaveOptions.createSaveOptions(SaveFormat.DOCX));
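The `loadFont` helper is not shown in the post. A hypothetical version could read the bundled font file from the classpath into memory and wrap the bytes in Aspose.Words' `MemoryFontSource` (assumed here to be used as `new MemoryFontSource(bytes)`). Only the plain-Java reading part is sketched below, so it compiles without Aspose.Words on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: loads a font file bundled with the application, e.g. "/aspose/daunpenh.ttf".
final class FontResources {

    static byte[] readFontResource(String resourcePath) {
        try (InputStream in = FontResources.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                // Fail loudly: a silently missing font would only surface later as box characters.
                throw new IllegalArgumentException("Font resource not found: " + resourcePath);
            }
            return in.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // With Aspose.Words available, loadFont(path) could then be:
    //   return new MemoryFontSource(readFontResource(path));
}
```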

Unfortunately, it is still not working. The only difference I’ve seen in generatedDocument.docx is that the font of the Cambodian characters is indeed DaunPenh (in Word), but DaunPenh does not appear in Word’s list of fonts. Did I miss something?

What I don’t understand from your last comment is this: if Aspose is configured to load the document with a font from the application resources that is not installed on the system, can Word pick up the font that was configured only in Aspose? Or is it mandatory to have the DaunPenh font installed on the system?

@brebDev In its font list, MS Word shows the fonts that are available on your system. If DaunPenh is not installed on your system, this font will not be shown in the list. However, Aspose.Words uses this font as the default font for Cambodian characters. For example, if your template contains the following:

<<[value]>>

and use this code to fill the template with data:

Document doc = new Document("C:\\Temp\\in.docx");

ReportingEngine engine = new ReportingEngine();
engine.buildReport(doc, "TEST FR AO  контрольная работа ម្រេចកំពត  测验,考", "value");

doc.save("C:\\Temp\\out.docx");

In the output document you can see that the Cambodian text is put into a separate Run and its font is DaunPenh:

<w:r>
	<w:t xml:space="preserve">TEST FR AO  контрольная работа </w:t>
</w:r>
<w:r>
	<w:rPr>
		<w:rFonts w:ascii="DaunPenh" w:eastAsia="DaunPenh" w:hAnsi="DaunPenh" w:cs="DaunPenh" />
	</w:rPr>
	<w:t>ម្រេចកំពត</w:t>
</w:r>
<w:r>
	<w:t xml:space="preserve">  测验,考</w:t>
</w:r>
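The run structure above can be mimicked with a simple rule: start a new run wherever the text enters or leaves the Khmer script. This is only a sketch of that assumed rule, not Aspose.Words' actual implementation; for the sample string it happens to reproduce the three runs shown (Latin and Cyrillic stay together, Khmer is isolated). The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text into maximal runs that are either all Khmer or all non-Khmer.
final class KhmerRunSplitter {

    static List<String> split(String text) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Boolean currentIsKhmer = null; // null until the first code point is seen
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            boolean khmer = Character.UnicodeScript.of(cp) == Character.UnicodeScript.KHMER;
            if (currentIsKhmer != null && khmer != currentIsKhmer) {
                runs.add(current.toString()); // script boundary: close the current run
                current.setLength(0);
            }
            current.appendCodePoint(cp);
            currentIsKhmer = khmer;
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }
}
```

Each Khmer run could then be given a Khmer-capable font (DaunPenh in this thread) while the other runs keep the template font.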

The font specified in the document may not be available in the environment where the document is consumed. In this case the consumer application (MS Word, for example) uses the same approach as Aspose.Words does when rendering the document, i.e. it substitutes the missing fonts.

However, if you provide the fonts via the document’s font settings and want the document to look the same in any environment, you can embed the fonts into the document. You can use the FontInfoCollection.EmbedTrueTypeFonts property to achieve this.

Document doc = new Document("C:\\Temp\\in.docx");

ReportingEngine engine = new ReportingEngine();
engine.buildReport(doc, "TEST FR AO  контрольная работа ម្រេចកំពត  测验,考", "value");

doc.getFontInfos().setEmbedTrueTypeFonts(true);
doc.getFontInfos().setEmbedSystemFonts(true);
// Set this to false if the document is supposed to be edited.
// Otherwise only the glyphs used in the document are embedded.
doc.getFontInfos().setSaveSubsetFonts(false);

doc.save("C:\\Temp\\out.docx");

While waiting for your reply, I managed to generate the document correctly. The only change I made was to select the option to embed fonts in the template. Now I can see that this can also be done from the code.

Whether the option is set manually or from the code (with embedTrueTypeFonts set to true, as you described), the generated document displays the Cambodian characters. Nice!

So the Cambodian characters are displayed now, but they are really small compared to the text in the other fonts. I’ve attached the generated document here: generatedDoc_WithAsposeConfig.docx (104.0 KB)

I’ve also tried to convert this Word document to PDF; the PDF is attached as well: document.pdf (29.2 KB)

In both Word and PDF, the Cambodian text looks a bit different from the original text passed from the application: ម្រេចកំពត. Is this another encoding issue?
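One way to rule out an encoding problem is to check that the text survives encoding and decoding unchanged. A lossless charset such as UTF-8 (the encoding normally used for the XML parts of a .docx) keeps every Khmer code point, while a lossy one such as ISO-8859-1 visibly destroys them. If the code points are intact, any visual difference comes from the font and text shaping rather than from encoding. The class name is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper: checks whether a string survives an encode/decode round trip.
final class EncodingCheck {

    static boolean roundTrips(String text, Charset charset) {
        // String.getBytes(Charset) replaces unmappable characters (typically with '?'),
        // so a lossy charset makes the decoded string differ from the original.
        return new String(text.getBytes(charset), charset).equals(text);
    }
}
```

Comparing the input string with the run text read back from the generated document (for example via Aspose.Words' `Node.getText()`) would be a more direct version of the same check.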

@brebDev The appearance of the characters depends on the font used. Also, in this case OpenType features must be enabled to render Cambodian glyphs properly:

Document doc = new Document("C:\\Temp\\in.docx");
doc.getLayoutOptions().setTextShaperFactory(com.aspose.words.shaping.harfbuzz.HarfBuzzTextShaperFactory.getInstance());
doc.save("C:\\Temp\\out.pdf");

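Here is how Cambodian glyphs are rendered without OpenType features enabled:
[Screenshot: Cambodian text rendered without OpenType features]

and here is how they are rendered with OpenType features enabled:
[Screenshot: Cambodian text rendered with OpenType features]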

Hi @alexey.noskov. OK, it finally worked. I think the size of the rendered characters is just a matter of how the DaunPenh font is designed. Thanks a lot for your responses!

Since this POC is working, we’re thinking of migrating old documents that were generated with Freemarker and another library to Aspose. From your documentation I’ve seen that Aspose works with LINQ-based templates. Do you know whether it is possible to generate documents with Aspose using the Freemarker template engine?

If you prefer, I can create another topic, since this question is outside the scope of the initial topic.

@brebDev It is great that you managed to make it work on your side.

I have moved your second question regarding the Freemarker template into a separate thread:
https://forum.aspose.com/t/generate-report-with-freemarker-template/252165
