MultiByte Documents


I am trying to use Aspose.Word to convert HTML files containing multi-byte languages (such as Japanese or Chinese) into .doc files. I’m obviously getting the encoding wrong somewhere, because all I get is a string of ??? characters. Is there sample code somewhere for doing this right?

We have not done much testing with Chinese and Japanese documents yet, so if you attach some documents to the post, it will help us sort things out quickly.

OK. Here’s a file a user sent me. I believe it’s Japanese. I can get you Chinese, too.

Yes, post a Chinese doc too. So are you trying to convert these files from DOC to HTML or from HTML to DOC?

Sorry, I was being a little confusing. We’re actually trying to do both: first convert from DOC to HTML, and then later back to DOC. Neither direction seems to work.

I’ll get you some Chinese HTML files as well as Chinese DOC files. But even converting this one to HTML doesn’t work, and it should.

Here’s a Chinese file. I’ll get you Chinese HTML (for the other direction) shortly.
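
For anyone hitting the same ??? symptom: it usually means that text was pushed through an encoding that cannot represent the characters, with each unmappable character replaced by a question mark. The following is a minimal Python sketch of that general mechanism only. It does not use Aspose.Word, and whether this is what happens inside the converter in this thread is an assumption. The sample string and the names `lossy_round_trip` and `utf8_round_trip` are made up for illustration.

```python
# Illustration only: plain Python, not Aspose.Word code.
SAMPLE = "日本語のテキスト"  # hypothetical sample: 8 Japanese characters


def lossy_round_trip(text: str) -> str:
    """Push text through a single-byte code page that lacks CJK characters.

    With errors="replace", every character the codec cannot encode
    is written out as "?".
    """
    return text.encode("cp1252", errors="replace").decode("cp1252")


def utf8_round_trip(text: str) -> str:
    """Write the text into an HTML string as UTF-8 bytes and read it back.

    The same encoding is used in both directions, so nothing is lost.
    """
    html = f'<html><head><meta charset="utf-8"></head><body>{text}</body></html>'
    data = html.encode("utf-8")      # bytes as they would be stored on disk
    restored = data.decode("utf-8")  # decode with the same encoding
    start = restored.index("<body>") + len("<body>")
    end = restored.index("</body>")
    return restored[start:end]


if __name__ == "__main__":
    print(lossy_round_trip(SAMPLE))
    print(utf8_round_trip(SAMPLE))
```

If the output of a conversion looks like the first result, the thing to check is which encoding is applied when the HTML is written and which is assumed when it is read back, and whether the two match.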