Aspose.Words support for 'language tags'

Hello team,

We use Aspose.Words for Java to generate Word documents and convert them to PDF. For our PDF documents to be accessible, we have been asked to set the language tags correctly. To achieve that, we set the proofing language to the respective language in the Word document (Review > Language > Set Proofing Language) and then generated the PDF version. For some languages, even though the proofing language is set, Aspose.Words does not generate tags with the respective language.

Question 1: Is there currently any limitation in Aspose.Words on the number of languages it supports, even if we tag them correctly in the Word document using the proofing language?

Question 2: Do you have a list of languages that Aspose.Words currently supports for accessible PDF output, assuming Word supports them as proofing languages? For example, I have uploaded a document whose proofing language is set to Hindi, but the output PDF does not show the language tag in Object Properties.

Question 3: The Hindi words (plain text taken from Google Translate) changed slightly between what was uploaded in the Word document and what came through in the PDF.

Language tag issue.docx (1.5 MB)
Input Language.docx (40.1 KB)
Pdf output.pdf (235.5 KB)

@cvsformulary Question 1, Question 2: Aspose.Words does not have a limitation on language support. It supports all languages that can be used in Microsoft Word. The absence of the Hindi language tag in the PDF output is a bug in Aspose.Words. I have logged your issue as WORDSNET-23493 in our defect tracking system. We will keep you informed and let you know once it is resolved. For other languages such as Amharic, however, I do not see the language in the MS Word GUI; English is displayed instead.

@cvsformulary Question 3: For proper Hindi text rendering, you should use HarfBuzzTextShaperFactory. Please check the Enable OpenType Features article.
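
As a rough illustration of the suggestion above, here is a minimal sketch of enabling HarfBuzz text shaping before saving to PDF. It assumes the separate HarfBuzz shaping plugin for Aspose.Words for Java is on the classpath alongside the main library; the class name `HindiToPdf` and the file names are placeholders, so adjust them to your project.

```java
import com.aspose.words.Document;
import com.aspose.words.shaping.harfbuzz.HarfBuzzTextShaperFactory;

public class HindiToPdf {
    public static void main(String[] args) throws Exception {
        // Placeholder paths (assumption): point these at your own files.
        Document doc = new Document("Input Language.docx");

        // Ask the layout engine to use OpenType shaping via HarfBuzz,
        // which complex scripts such as Devanagari rely on.
        doc.getLayoutOptions().setTextShaperFactory(HarfBuzzTextShaperFactory.getInstance());

        // The output format is inferred from the file extension.
        doc.save("Pdf output.pdf");
    }
}
```

This only addresses how the glyphs are rendered (Question 3); it is not expected to change how language tags are exported.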

Do you have a rough timeframe for when this bug will be fixed?
For all the other languages, we did not tag them with the right proofing language in our input document. Your statement is that if the text is tagged with the right language, Aspose.Words should handle it. Is that understanding correct? We will test it, and if it does not work, we will report the issue to you again so you can fix the bug.

@cvsformulary Unfortunately, there is no estimate for WORDSNET-23493 at the moment.
Yes, you understand correctly. Aspose.Words should handle other language tags as well. Please feel free to report any other issues with language export to PDF.
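
When testing further languages, it may help to write down the language tag you expect to see for each one before inspecting the PDF. The sketch below uses only the JDK's `java.util.Locale` to normalise BCP 47 tags and print their English language names. The class name `ExpectedLanguageTags` and the chosen tags are illustrative assumptions, not part of Aspose.Words, and the exact tag value written to the PDF may differ.

```java
import java.util.Locale;

public class ExpectedLanguageTags {
    // Normalises the casing of a BCP 47 tag, e.g. "hi-in" becomes "hi-IN".
    public static String normalize(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    // Returns the English name of the tag's language, e.g. "Thai" for "th-TH".
    public static String languageName(String tag) {
        return Locale.forLanguageTag(tag).getDisplayLanguage(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // Illustrative tags for the languages mentioned in this thread (assumption).
        String[] tags = {"hi-in", "gu-in", "pa-in", "th-th", "km-kh"};
        for (String tag : tags) {
            System.out.println(normalize(tag) + "  " + languageName(tag));
        }
    }
}
```

The printed list can then be compared by hand against the language shown in Object Properties for each block of text in the PDF.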

I have attached a couple of documents. We already discussed that the Hindi case is a bug, as Aspose.Words could not tag it. We tested more languages today and found that Gujarati, Punjabi and Thai are not supported either. I am looking for a definitive language list. Can you please provide it?

Language compilation for Aspose test.docx (72.5 KB)
Language compilation for Aspose test.pdf (482.0 KB)

Secondly, rendering the language characters is another issue. How do we ensure that Aspose.Words converts the text exactly as it appears in the Word document? Since no individual can be an expert in all languages, this is another reason why we ask Aspose to provide the definitive list of languages that you support.

@cvsformulary I’ve updated WORDSNET-23493 with the cases from your document with Khmer, Gujarati, Punjabi and Thai text. Unfortunately, we cannot provide a definitive list of the languages affected by the described problem with PDF language tag export.

Can you provide an update on bug WORDSNET-23493? It has been more than a year since it was opened.

@cvsformulary Unfortunately, implementation of this feature has been postponed and is not yet scheduled for development. Please accept our apologies for your inconvenience.

Can you provide an update on the BUG WORDSNET-23493 that was opened?

@cvsformulary Unfortunately, there is no news regarding the issue yet. The implementation of this feature is still postponed and is not yet scheduled for development. Please accept our apologies for your inconvenience.

Apart from the languages mentioned earlier in this thread (Hindi, Khmer, Gujarati, Punjabi, Thai) that are NOT tagged correctly, we have also noticed a few more languages that are not tagged (such as Bassa, Cambodian, Haitian Creole, Hmong, Tagalog). Please update the ticket. How do we get WORDSNET-23493 escalated so that it is fixed? CVS Health uses a licensed product, and we need our documents to be 508 compliant. @Konstantin.Kornilov @alexey.noskov

@cvsformulary Thank you for the additional information. I have updated the defect. Unfortunately, the issue has been postponed and is not yet scheduled for development. We will keep you informed and let you know once it is resolved or we have more information for you.