Render Indic Character (Hindi, Tamil or Malayalam) Text in Word Document to PDF using C# or Java | Advanced Typography HarfBuzz


I am evaluating Aspose.Words for converting DOC files to PDF.

My DOC file has Indic characters (Hindi/Malayalam, i.e. Indian language characters). The DOC file is attached.

The DOC file was created using OpenOffice, and I am able to convert it to PDF using OpenOffice.

However, if I convert the DOC file to PDF using Aspose.Words, the Indic characters are not shown properly.

Does Aspose.Words support Indian languages?

I am also attaching the PDF converted using Aspose.Words.

Thanks in advance for any help.

Hi Kishore,


Thanks for your inquiry.

Please check: How Aspose.Words Uses True Type Fonts

Please zip and attach the following two font files here for testing:

  • Lohit Marathi
  • Liberation Serif

Best regards,

Hi,

I am able to get the converted PDF to display Malayalam (Indian language) characters.

However, ligature substitution (Indic rendering) is not happening, so the characters are simply displayed in the order in which they occur.

Do you support ligature substitution for Unicode fonts?

Thanks,
Kishore

Hi Kishore,


Thanks for your inquiry. Could you please attach 1) your input Word document, 2) the output PDF file showing the undesired behavior and 3) the related font files here for testing? We will investigate the issue on our end and provide you with more information.

Best regards,

Hi,
I have added the following files:

1. The original DOC file - mal.doc
(you should be able to view this file using the ArialUnicodeMS font)

2. The PDF created using OpenOffice

3. The PDF created using Aspose.Words

4. The font file I used to convert using aspose -

If you compare the OpenOffice and Aspose.Words PDFs, you can see that ligature substitution is not happening.

Kishore

Hi Kishore,


Your document “mal.doc” uses the following two fonts, so please zip and attach these two font files here for testing:

  • Lohit Marathi
  • Liberation Serif

PS: You had attached “AnjaliOldLipi.ttf” which is incorrect.

Best regards,

Hi,

The language I used is Malayalam.

So, I am attaching the following fonts: Lohit Marathi, Lohit Malayalam and Liberation Serif.

Thanks,
Kishore

Hi Kishore,


Thanks for your inquiry. After an initial test with Aspose.Words for Java 16.3.0, I was unable to reproduce this issue on my side (please see the attached PDF). I suggest upgrading to the latest version of Aspose.Words. You can download it from the following link. I hope this helps.

Best regards,

Hi,

Ligature substitution is still not happening here. From your PDF file:

ക് ക -> ക്ക (here the 3 characters are substituted with this single character automatically)

വേക is shown incorrectly as വ േക


Please look at the original Word document; there is a difference.
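
To make the two examples above concrete, here is a small sketch in plain Java (it is not from the original posts and does not use Aspose.Words; the class and method names are made up for illustration). It lists the Unicode code points of each sequence in the order they are stored. A shaping engine is expected to turn the first sequence into one conjunct glyph and to draw the vowel sign of the second sequence before its consonant; a renderer that skips shaping just draws one glyph per code point in this stored order.

```java
import java.util.ArrayList;
import java.util.List;

public class MalayalamCodePoints {

    // Returns one "U+XXXX NAME" entry per Unicode code point, in logical (stored) order.
    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp ->
                out.add(String.format("U+%04X %s", cp, Character.getName(cp))));
        return out;
    }

    public static void main(String[] args) {
        // KA + VIRAMA + KA: three code points that shaping replaces with one conjunct glyph.
        describe("\u0D15\u0D4D\u0D15").forEach(System.out::println);
        System.out.println();
        // VA + VOWEL SIGN EE + KA: the vowel sign is stored after VA but drawn before it.
        describe("\u0D35\u0D47\u0D15").forEach(System.out::println);
    }
}
```

Running `main` prints three entries for each sequence, which matches the "3 characters" mentioned above.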


Thanks,

Kishore

Hi Kishore,


Thanks for your inquiry. We have logged this problem in our issue tracking system as WORDSNET-13503. Our product team will look into the details of this problem, and we will keep you updated on the status of the fix. We apologize for the inconvenience.

Best regards,

@kishorekollam,

Regarding WORDSNET-13503, the fix for this issue will be included in version 20.8 (the next version) of Aspose.Words. We will inform you via this thread as soon as that version is released at the start of next month.

After that you need to run the following code to get the desired output:

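// Load the source document.
Document doc = new Document("E:\\Temp\\mal.doc");
// Enable HarfBuzz text shaping so that ligature substitution and reordering are applied during layout.
doc.getLayoutOptions().setTextShaperFactory(com.aspose.words.shaping.harfbuzz.HarfBuzzTextShaperFactory.getInstance());
// Save as PDF; the format is inferred from the file extension.
doc.save("E:\\Temp\\mal.AW.20.7.HarfBuzz.pdf");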

Please also refer to the following page:

The PDF file generated using the above code (after applying the fix on our end) is also attached here for your reference:

The issues you have found earlier (filed as WORDSNET-13503) have been fixed in this Aspose.Words for .NET 20.8 update and this Aspose.Words for Java 20.8 update.