Detect language

When loading a Word document into Aspose, is there a way to tell what language it is? I am familiar with the LocaleId, LocaleIdFarEast, and LocaleIdBi properties of the font, but how do I know which property applies to my document if it could be in any language?

Hi Carl,

Thanks for your inquiry. Please note that there is no setting in a Microsoft Word document that controls the language of the document as a whole. Instead, you can get the language of any text Run in the document using the Font.LocaleId property. So, if you want to discover the language of the document, you should probably iterate through all Runs in the document and choose the LocaleId that occurs most often. I hope this helps.
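
The "most frequent LocaleId" idea can be sketched roughly as follows. This is a minimal Python illustration, not Aspose.Words code: it assumes the runs have already been extracted into plain (text, locale_id) pairs, and the function name dominant_locale is hypothetical. It weights each run by its non-whitespace character count, which is a design choice on top of simply counting runs.

```python
from collections import Counter


def dominant_locale(runs):
    """Return the locale ID that covers the most non-whitespace characters.

    `runs` is an iterable of (text, locale_id) pairs. This shape is an
    assumption for illustration; it is not an Aspose.Words type.
    Returns None when no run contains any visible text.
    """
    totals = Counter()
    for text, locale_id in runs:
        # Weight by visible characters so that many tiny runs do not
        # outvote one long paragraph.
        weight = sum(1 for ch in text if not ch.isspace())
        if weight:
            totals[locale_id] += weight
    if not totals:
        return None
    return totals.most_common(1)[0][0]


# Example data: 1033 is the LCID for English (United States),
# 1031 is the LCID for German (Germany).
sample_runs = [
    ("Hello world. ", 1033),
    ("Guten Tag", 1031),
    ("More English text here.", 1033),
]
print(dominant_locale(sample_runs))
```

In a real document the pairs would come from each Run's text and its Font.LocaleId value.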

Best regards,

My question wasn’t how to get the language of the whole document, but how to tell which of the three LocaleId properties (FarEast, Bi, or the default) is being used for a given font.

There actually is, by the way, a way to get the language for the whole document body in Word - document.Content.LanguageID - though I’m not sure what that is set to if there are different languages in a document.

Hi Carl,

Thanks for your inquiry. The Font.Bidi property specifies whether the contents of a Run have right-to-left characteristics; therefore, for RTL text (when Font.Bidi returns true), you can read the language identifier from the Font.LocaleIdBi property. Secondly, the Font.NameFarEast property gets or sets the East Asian font name for a text and, depending on its value, you can read the language identifier of the formatted Asian characters from the Font.LocaleIdFarEast property. I hope this helps.
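
The right-to-left part of this rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: RunFont is a hypothetical record whose fields mirror Font.Bidi, Font.LocaleId and Font.LocaleIdBi, and it does not call the Aspose.Words API. It also does not attempt the Far East case.

```python
from dataclasses import dataclass


@dataclass
class RunFont:
    """Hypothetical stand-in for the font settings of a single Run."""
    bidi: bool          # mirrors Font.Bidi
    locale_id: int      # mirrors Font.LocaleId
    locale_id_bi: int   # mirrors Font.LocaleIdBi


def locale_for_run(font):
    """Pick the RTL locale when the run is flagged as bidi, else the default."""
    return font.locale_id_bi if font.bidi else font.locale_id


# 1025 is the LCID for Arabic (Saudi Arabia), 1033 for English (United States).
rtl_run = RunFont(bidi=True, locale_id=1033, locale_id_bi=1025)
ltr_run = RunFont(bidi=False, locale_id=1033, locale_id_bi=1025)
print(locale_for_run(rtl_run), locale_for_run(ltr_run))
```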

Best regards,

Hi,

Thank you for the tip about the Bidi property. I was reading it from Styles.Font, where it is always false, instead of from Run.Font, so that part is working now. However, I’m still unclear as to how I’m supposed to detect whether a language is a FarEast language based only on the font name. Could you clarify that?

Thanks

Hi Carl,

Thanks for your inquiry. As I mentioned, LocaleIdFarEast works for East Asian languages, such as Chinese, Korean and Japanese. There are three properties for setting the locale ID: LocaleId, LocaleIdBi and LocaleIdFarEast. So, depending on the language, you need to use one of them. For European languages, such as English or German, use LocaleId. For right-to-left languages, such as Arabic, use LocaleIdBi. For East Asian languages, use LocaleIdFarEast. If we can help you with anything else, please feel free to ask.

Best regards,

That doesn’t actually address my question. I understand the three properties. However, I am trying to detect the language of a document, not set it, so I don’t know which one to use. That’s the point. You mentioned before that, given the value of NameFarEast, I could determine whether LocaleIdFarEast is actually used or is just set to some sort of default. What I need to know is how to differentiate between LocaleIdFarEast values that truly reflect the language of the document and LocaleIdFarEast values that are unused or left at a default.

Hi Carl,

Thanks for the additional information and sorry for the confusion. I am in communication with our development team and I will update you as soon as I have this information.

Best regards,

Hi Carl,

Thanks for your patience.

Unfortunately, we do not provide explicit run properties to identify FarEast languages and the appropriate locale. Put simply, the specifications (e.g. ISO 29500) omit the details and leave applications to use their own methods to identify languages and their locales.

PS: Aspose.Words identifies FarEast languages by analyzing the relevant Unicode ranges (with some simplifications and assumptions). Unfortunately, this functionality is part of our rendering engine and is not available in the public model.
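
For readers who want to approximate this themselves, here is a rough Python sketch of classifying text by Unicode range. It is not the Aspose.Words algorithm, which is not public; the range lists are a deliberately small subset of the relevant Unicode blocks, and the names classify_text and LOCALE_PROPERTY are assumptions made for this example.

```python
# A small, incomplete selection of Unicode blocks (assumption: enough
# for common Chinese, Japanese, Korean, Arabic and Hebrew text).
FAR_EAST_RANGES = [
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]
BIDI_RANGES = [
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
]

# Which Font property to read for each class of text.
LOCALE_PROPERTY = {
    "far_east": "LocaleIdFarEast",
    "bidi": "LocaleIdBi",
    "other": "LocaleId",
}


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def classify_text(text):
    """Return 'far_east', 'bidi' or 'other' by majority of letters.

    Returns None when the text contains no letters at all.
    """
    counts = {"far_east": 0, "bidi": 0, "other": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, punctuation and whitespace
        if _in_ranges(ch, FAR_EAST_RANGES):
            counts["far_east"] += 1
        elif _in_ranges(ch, BIDI_RANGES):
            counts["bidi"] += 1
        else:
            counts["other"] += 1
    if not any(counts.values()):
        return None
    return max(counts, key=counts.get)


kind = classify_text("こんにちは世界")
print(kind, LOCALE_PROPERTY[kind])
```

Applied to the text of each Run, the result suggests which of the three locale properties is likely to be meaningful for that Run; values in the other two can then be treated as defaults.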

We apologize for any inconvenience.

Best regards,