Foreign Documents/Character Sets

Hi,

Is there any support for documents written in languages or character sets other than English, such as:

Albanian
Arabic
Dari
Kurdish
Macedonian
Pashto
Russian
Serbo-Croat

I would like to be able to convert a document to plain text and then use the ChilkatSoft.com converter to convert it to Unicode, so that I can display it on a web page.

thanks,

Wizbit.

Hi Wizbit,

We have tested that Aspose.Word can read, process and write Word documents containing any Unicode characters. As long as MS Word supports the language, Aspose.Word will process the document correctly.

Writing text files is a different question. At the moment Aspose.Word writes text files using the default UTF8Encoding of the StreamWriter class. The .NET documentation for StreamWriter says: “UTF-8 handles all Unicode characters correctly and gives consistent results on localized versions of the operating system.”

So it is really up to you to decide whether text in UTF-8 encoding is good for your purposes.

Currently there is no way to instruct Aspose.Word to use a different encoding (for example Unicode) when producing text files, but please come back and we will add this feature if you need it.
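
Until such an option exists, one possible workaround is to re-encode the UTF-8 output yourself with the standard .NET Encoding class. The following is only a minimal sketch, not part of Aspose.Word: the module name, the Utf8ToUtf16 helper and the sample string are made up for illustration, and it assumes that "Unicode" means UTF-16 little-endian, which is what .NET calls Encoding.Unicode.

```vbnet
Imports System
Imports System.Text

Module ReencodeSketch
    ' Hypothetical helper: re-encodes UTF-8 bytes (for example the contents of a
    ' saved text file) as UTF-16 LE, which .NET exposes as Encoding.Unicode.
    Function Utf8ToUtf16(utf8Bytes As Byte()) As Byte()
        Return Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes)
    End Function

    Sub Main()
        ' Stand-in for text exported from a document (assumption for the demo).
        Dim sample As String = "Привет"
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(sample)
        Dim utf16Bytes As Byte() = Utf8ToUtf16(utf8Bytes)

        ' Decoding with the matching encoding gives back the original string.
        Console.WriteLine(Encoding.Unicode.GetString(utf16Bytes) = sample)
    End Sub
End Module
```

Encoding.Convert works on byte arrays only, so it does not care where the UTF-8 bytes came from. The same call accepts any other target encoding that .NET knows about.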

Hi Roman,

I am still having problems opening documents written in non-English character sets (e.g. Russian) and then using:

mergeDoc.save(strm, Aspose.Word.SaveFormat.FormatText)


to save the file as Text, and then:

Dim sb As String = System.Text.Encoding.UTF8.GetString(strm.ToArray)

to decode the UTF-8 bytes into a string.

Does the original document already have to be in UTF-8 format? All I get back is unreadable characters.

thanks,

Wizbit.

I tested your code and it returns all the correct characters. Please post all your code or email it along with the document to word@aspose.com.
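
To narrow down where the characters get lost, it may help to run a round trip that does not involve Aspose.Word at all. The sketch below is only an illustration under assumptions: the module name, the WriteAsUtf8 helper and the sample text are invented here, and a StreamWriter with an explicit UTF8Encoding stands in for the text export. It shows that decoding with UTF-8 returns the original text, while decoding the same bytes with a different encoding (ISO-8859-1 in this case) produces unreadable characters.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module RoundTripCheck
    ' Hypothetical helper: writes text through a StreamWriter using UTF-8
    ' (without a byte order mark) and returns the raw bytes.
    Function WriteAsUtf8(text As String) As Byte()
        Using strm As New MemoryStream()
            Using writer As New StreamWriter(strm, New UTF8Encoding(False))
                writer.Write(text)
            End Using ' disposing the writer flushes it into the stream
            ' MemoryStream.ToArray still works after the stream is closed.
            Return strm.ToArray()
        End Using
    End Function

    Sub Main()
        Dim original As String = "Русский текст"
        Dim bytes As Byte() = WriteAsUtf8(original)

        ' Matching encoding: the decoded string equals the original.
        Dim decoded As String = Encoding.UTF8.GetString(bytes)
        Console.WriteLine(decoded = original)

        ' Mismatched encoding: every byte becomes one Latin-1 character,
        ' so the result is garbled and no longer equals the original.
        Dim garbled As String = Encoding.GetEncoding("iso-8859-1").GetString(bytes)
        Console.WriteLine(garbled = original)
    End Sub
End Module
```

If a check like this passes but the web page still shows garbage, the decoding step is probably fine and the problem may be further along, for example in the charset the page declares to the browser. That is a guess, which is why the full code and document are still needed.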