Write Unicode Text of any Language (Punjabi Gujarati) & UTF-8 Characters in DOCX & Convert to PDF using Java API

@srinivasc,

You need to specify a suitable font name before writing text in each language. For example:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.write("Punjabi (India): ");

builder.getFont().setName("Verdana"); // You need to specify Punjabi Font name here
builder.write("ੂਪਾ ੌਹਗਮਕ ਵੀਦੈਲ ਿਦੰ ਰਹਸਜਾ੍ ਦਨਾੀ ੂਪਾ ਤੋਬ ੍ਦੁ");

builder.getFont().clearFormatting();
builder.writeln();
builder.write("Gujarati (India):");

builder.getFont().setName("Arial"); // You need to specify suitable Gujarati Font name here
builder.write(" ૂપા ૌહગમક વીદૈલ િદં રહસજા્ દનાી ૂપા તોબ ્દુ");

builder.getFont().clearFormatting();
builder.writeln();

doc.save("D:\\temp\\awjava-18.9.docx");

Hope this helps.

Hi @awais.hafeez,

Thanks for your quick reply.

I have already seen this solution, but it won’t work in my case: I fetch a batch of data from the database that contains text in different languages, and sadly it carries no font information.

So I would have to identify the right font at each position in the data and set the fonts accordingly. Unfortunately, I don’t see any way to find out the required font in Java.

Example Data:

the quick brown fox jumped over the lazy dog ૂપા ૌહગમક વીદૈલ િદં રહસજા્ દનાી ૂપા તોબ ્દુ 快速的棕色狐狸跳過懶惰的狗 тхе љуицк броњн фоџ јумпед овер тхе лаѕз дог فاث ضعهؤن لاقخصى بخء تعةحثي خرثق فاث مشئغ يخل տհէ խըիգկ բրուն ֆոց ճըմպէդ ովէր տհէ լազե դոք ੂਪਾ ੌਹਗਮਕ ਵੀਦੈਲ ਿਦੰ ਰਹਸਜਾ੍ ਦਨਾੀ ੂਪਾ ਤੋਬ ੍ਦੁ тхе љуицк броњн фоџ јумпед овер тхе лаѕз дог otğ frnvm çıhgz ahö krspğe hcğı otğ lujd ehü:)Ended here last…:joy::joy::joy::joy::joy::joy: 絵文字"

Thanks,
Srinivas

@srinivasc,

I think you can use Google’s translation APIs to detect the language of a given string and, based on the returned language, specify the correct font name in Aspose.Words:
https://cloud.google.com/translate/docs/detecting-language
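
A local alternative, offered only as a sketch: for choosing a font, the script of the text usually matters more than its language, and the JDK can report the script of every code point through Character.UnicodeScript. The class below splits a string into runs of one script and looks up a font per run. The class name ScriptRuns, its methods and the font table are assumptions made for illustration; the font names must match fonts that are actually installed.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: splits text into single-script runs and suggests a font per run. */
final class ScriptRuns {

    /** A run of consecutive characters that share one script. */
    static final class Run {
        final UnicodeScript script;
        final String text;

        Run(UnicodeScript script, String text) {
            this.script = script;
            this.text = text;
        }
    }

    // Illustrative script-to-font table (an assumption); adjust it to the installed fonts.
    private static final Map<UnicodeScript, String> FONTS = new EnumMap<>(UnicodeScript.class);
    static {
        FONTS.put(UnicodeScript.GURMUKHI, "Raavi");          // Punjabi
        FONTS.put(UnicodeScript.GUJARATI, "Shruti");
        FONTS.put(UnicodeScript.HAN, "Microsoft JhengHei");
    }

    /** Returns the font for a script, or a broad-coverage default when none is listed. */
    static String fontFor(UnicodeScript script) {
        return FONTS.getOrDefault(script, "Arial Unicode MS");
    }

    /** Splits text wherever the script changes. */
    static List<Run> split(String text) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        UnicodeScript currentScript = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            // Spaces, digits and punctuation (COMMON) and generic combining marks
            // (INHERITED) do not start a new run; they stay with the current one.
            boolean neutral = script == UnicodeScript.COMMON || script == UnicodeScript.INHERITED;
            if (!neutral && currentScript != null && script != currentScript) {
                runs.add(new Run(currentScript, current.toString()));
                current.setLength(0);
            }
            if (!neutral) {
                currentScript = script;
            }
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            // Text made only of neutral characters is reported as COMMON.
            runs.add(new Run(currentScript == null ? UnicodeScript.COMMON : currentScript,
                    current.toString()));
        }
        return runs;
    }
}
```

With Aspose.Words, each run could then be written by calling builder.getFont().setName(ScriptRuns.fontFor(run.script)) followed by builder.write(run.text), as in the snippet earlier in this thread.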

@awais.hafeez,

Thanks for your suggestion.

Is there any way to make this work without specifying the font every time the language changes?
In PDF this already works fine with the following setup.

ArrayList<FontSourceBase> fontSources = new ArrayList<>(Arrays.asList(FontSettings.getDefaultInstance().getFontsSources()));
FolderFontSource folderFontSource = new FolderFontSource("C:/APA/docs/Aspose/Fonts", true); // Location where all the fonts are available
fontSources.add(folderFontSource);
// Convert the ArrayList of sources back into an array of FontSourceBase objects.
FontSourceBase[] updatedFontSources = fontSources.toArray(new FontSourceBase[fontSources.size()]);
// Apply the new set of font sources to use.
FontSettings.getDefaultInstance().setFontsSources(updatedFontSources);
doc.save("C:/APA/docs/Aspose/Unicode/Test.pdf",SaveFormat.PDF);

Thanks,
Srinivas

@srinivasc,

Please also provide the Aspose.Words generated DOCX file containing the square boxes and the corresponding PDF file showing the desired output here for further testing. We will investigate the scenario further on our end and provide you with more information.

@awais.hafeez,

Here I have attached the sample code I used to generate the Word and PDF documents, along with the documents generated by this code. unicode.zip (125.7 KB)

I could not attach the Fonts folder that the code refers to because it is around 500 MB. You can use the fonts from your OS (C:\Windows\Fonts). If a font for the specific language is available, the text resolves correctly in PDF.

Please let me know if any other details are required.

Thanks,
Srinivas

@srinivasc,

I am afraid there is no simple way to detect the language of a string (it is actually out of the scope of Aspose.Words). You can try using the “Arial Unicode” font, which contains glyphs for most languages. We have installed this “Arial Unicode” font on our end. We do not see any square boxes in your shared “unicode.docx” document when it is opened with MS Word 2016. Please check this screenshot.

@awais.hafeez,

Thanks for all your support. Now I am also able to render text in all languages in the PDF document with “Arial Unicode”, except for emojis.
For the emojis I tried the “Segoe UI Emoji” font with aspose-words-16.1.0-java, and this didn’t work.

Please let me know which font and which Aspose version I have to use to support emojis.

Thanks,
Srinivas

@srinivasc,

We suggest you upgrade to the latest version of Aspose.Words for Java, i.e. 18.9, and see how it goes on your end.

In case the problem still remains, please ZIP and upload your input Word document (the one you are getting this problem with) and the Aspose.Words generated PDF file showing the undesired behavior here for testing. Please also provide a comparison screenshot highlighting the problematic emojis in the Aspose.Words generated output with respect to your expected output, and attach it here for our reference. We will then investigate the issue further on our end and provide you with more information.

With Aspose.Words version 18.7, it worked for me. Thanks for all your support.

@srinivasc,

It is great that you were able to resolve this issue on your end. Please let us know any time you have any further queries.

I have a problem with the fonts below in PDF.
I downloaded these fonts and refer to them as TrueType fonts when writing the PDF document, but all of these characters are displayed as square boxes.
Please confirm which font I have to use to render these characters in a PDF document.

Segoe UI Symbol ----- ⏴⏵⏶⏷:pause_button::stop_button::record_button:
Malgun Gothic ----- ᇹᇺᇻᇼᇽᇾᇿ
Sylfaen ----- ⴀ ⴁ ⴂ ⴃ ⴄ ⴅ ⴆ ⴇ ⴈ ⴉ ⴊ
Microsoft JhengHei ----- ㇐ ㇑ ㇒ ㇓
Microsoft JhengHei ----- ㄭ
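
When characters show up as square boxes, it can help to check which Unicode blocks they belong to before looking for a font that covers them. The sketch below uses only the JDK's Character.UnicodeBlock; the class name BlockReport and its method are hypothetical, and the sample string holds one character from each line of the list above, written as escapes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: reports which Unicode blocks a piece of text draws on. */
final class BlockReport {

    /** Returns the distinct Unicode blocks used by the text, in order of first appearance. */
    static Set<Character.UnicodeBlock> blocksOf(String text) {
        Set<Character.UnicodeBlock> blocks = new LinkedHashSet<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> {
                // of() returns null for code points outside every block known to this JDK.
                Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
                if (block != null) {
                    blocks.add(block);
                }
            });
        return blocks;
    }

    public static void main(String[] args) {
        // U+23F4, U+11F9, U+2D00, U+31D0 and U+312D: one character per line of the list.
        String sample = "\u23F4 \u11F9 \u2D00 \u31D0 \u312D";
        for (Character.UnicodeBlock block : blocksOf(sample)) {
            System.out.println(block);
        }
    }
}
```

The block names only narrow down the search; whether a given font file really contains glyphs for a block still has to be checked against the font itself.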

Thanks,
Srinivas

@srinivasc,

Please ZIP and upload your input Word document (you are getting this problem with) and Aspose.Words generated PDF file showing the undesired behavior here for testing. We will investigate the issue on our end and provide you more information.

Please find attached the code I used to generate the PDF document and a sample PDF generated from it. I have the fonts below in the fonts folder. buildPDF.zip (38.0 KB)

ARIALUNI, msjh, msjhbd, msjhl, SEGOEUISL, seguiemj, seguisym and sylfaen.

Please let me know if any other details are required.

@srinivasc,

Please simply install the following font files:

  • Malgun Gothic
  • Segoe UI Symbol
  • Microsoft JhengHei
  • Sylfaen

Hope, this helps.

I have already installed these fonts and refer to them while generating the PDF document, but it still does not work. The code and the generated PDF document are attached to my previous comment.

Please let me know if any other details are required.

@srinivasc,

Please try using the following code:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.getFont().setName("Malgun Gothic");
builder.write("Malgun Gothic: ᇹᇺᇻᇼᇽᇾᇿ");
builder.writeln();
builder.getFont().setName("Segoe UI Symbol");
builder.write("Segoe UI Symbol : ⏴⏵⏶⏷");
builder.writeln();
builder.getFont().setName("Microsoft JhengHei");
builder.write("Microsoft JhengHei : ㇐ ㇑ ㇒ ㇓");
builder.writeln();
builder.write("Microsoft JhengHei : ㄭ");
builder.writeln();
builder.getFont().setName("Sylfaen");
builder.write("Sylfaen  : ⴀ ⴁ ⴂ ⴃ ⴄ ⴅ ⴆ ⴇ ⴈ ⴉ ⴊ");
doc.save("D:\\temp\\awjava-18.10.docx");
doc.save("D:\\temp\\awjava-18.10.pdf");

Hi @awais.hafeez,

Thanks for providing the example code. With it, these fonts are rendered in the PDF.

Unfortunately, this won’t help with our needs. Our data gives no clue about which language will appear where, so we have to make it work with TrueType fonts alone.
FontSettings.getDefaultInstance().setFontsFolder("C:/tmp/Fonts", true);

We have the “Arial Unicode MS” TrueType font available in the Fonts folder specified above, and it supports most scripts, such as Chinese, Gujarati, Punjabi and Hindi.

However, we have issues with Malgun Gothic, Segoe UI Symbol, Microsoft JhengHei and Sylfaen. We downloaded the corresponding font files and placed them in the Fonts folder, but it is not working.

Please confirm whether this is achievable using TrueType fonts.
We already have an Aspose licence.

@srinivasc,

We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-17590. We will further look into the details of this requirement and will keep you updated on the status of the linked issue.

@srinivasc,

Regarding WORDSNET-17590, our understanding is that you take some Unicode text (which may be in any language), insert it into the Aspose.Words DOM and then save to DOCX/PDF. If you insert all text with a single default font, this case is handled by the font fallback mechanism. The text is stored in the DOM and saved to DOCX (and other flow formats) as is; font fallback is then performed by the application that opens the file (MS Word or some other). When rendering to PDF (and other fixed-page formats), Aspose.Words performs font fallback itself. So, generally, you should not need to perform any additional actions.
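
For readers unfamiliar with the idea, font fallback can be pictured roughly as follows. This is a simplified model written for illustration, not Aspose.Words' implementation or its actual rule table: the class FallbackModel, its methods and the code point ranges are assumptions, and the font names are simply the pairings mentioned earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Simplified, hypothetical model of font fallback over code point ranges. */
final class FallbackModel {

    /** One rule: code points in [first, last] may be drawn with the given font. */
    private static final class Rule {
        final int first;
        final int last;
        final String font;

        Rule(int first, int last, String font) {
            this.first = first;
            this.last = last;
            this.font = font;
        }
    }

    private final String baseFont;
    private final List<Rule> rules = new ArrayList<>();

    FallbackModel(String baseFont) {
        this.baseFont = baseFont;
    }

    /** Adds a fallback rule; rules are tried in the order they were added. */
    FallbackModel rule(int first, int last, String font) {
        rules.add(new Rule(first, last, font));
        return this;
    }

    /**
     * Picks the font for one code point: the base font if it has a glyph,
     * otherwise the first matching rule. If nothing matches, the base font is
     * returned and the character would appear as a missing-glyph box.
     */
    String fontFor(int codePoint, IntPredicate baseFontCovers) {
        if (baseFontCovers.test(codePoint)) {
            return baseFont;
        }
        for (Rule r : rules) {
            if (r.first <= codePoint && codePoint <= r.last) {
                return r.font;
            }
        }
        return baseFont;
    }
}
```

In this model, a base font that covers only Latin, combined with a rule such as rule(0x1100, 0x11FF, "Malgun Gothic"), would send Hangul Jamo characters to Malgun Gothic while leaving Latin text in the base font.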

You also seem to report that the generated DOCX file is not displayed properly. We assume that it is opened in a non-MS Word application, because MS Word 2016 handles all your documents well. If you cannot rely on the application that opens the DOCX, then an alternative (as suggested here) is to use a third-party library to detect the language of the Unicode text and set the font in the DOM accordingly. As another alternative, we could try to introduce a new feature that changes the fonts in the DOM according to our font fallback rules.

As for this specific issue with saving to PDF, we have updated our default fallback settings to match MS Word’s behavior. All text is rendered fine except the text for “Segoe UI Symbol”, which MS Word 2016 does not render either.

We will keep you posted on further updates and let you know when this issue is resolved.