I have a question concerning character encoding. I have a Word document that contains special characters, including the "smiley face". My ultimate goal is to post this document to my website and have chunks of it converted to HTML for the user to see. Everything works except that certain characters, like the smiley face, don't come through correctly. They show up as different characters.

To convert the word document to html I’ve coded the following:

ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
SaveOptions saveOptions = document.getSaveOptions();
document.joinRunsWithSameFormatting();
document.save(byteArrayOutputStream, SaveFormat.HTML);
String html = byteArrayOutputStream.toString("UTF-8");

In the head portion of the html page showing the data, I specify the character encoding:


Are there some UTF-8 characters, like the smiley face, that can be displayed in a Word document but not in HTML in a browser? Many other special characters come through just fine. Or am I doing something wrong?

Thanks for your inquiry. Could you please attach the input and output documents that show the problem so we can test them? I will investigate the issue and provide you with more information.
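
In the meantime, it may help to check which code points actually end up in the HTML string. The sketch below is only a diagnostic aid, not part of Aspose.Words: the class name CodePointCheck and its methods are hypothetical, and it uses only the standard Java library. It rests on an assumption I cannot confirm without the document: Word often stores its auto-corrected smiley as a symbol-font character (for example Wingdings), which can surface as a Private Use Area code point such as U+F04A rather than a standard Unicode smiley like U+263A. If so, UTF-8 itself is not at fault, since UTF-8 can encode every Unicode code point; the browser simply has no matching glyph unless the symbol font is applied.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting the characters in a converted HTML string.
class CodePointCheck {

    // Returns a "U+XXXX" label for every non-ASCII code point, in order of appearance.
    public static List<String> nonAsciiCodePoints(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints()
            .filter(cp -> cp > 0x7F)
            .forEach(cp -> found.add(String.format("U+%04X", cp)));
        return found;
    }

    // True for Private Use Area code points, which have no standard glyph
    // and depend entirely on the font that is applied to them.
    public static boolean isPrivateUse(int codePoint) {
        return Character.getType(codePoint) == Character.PRIVATE_USE;
    }

    // Encodes to UTF-8 bytes and decodes again, to show the round trip is lossless.
    public static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample text only: a standard smiley (U+263A) and a symbol-font style
        // Private Use Area character (U+F04A, assumed for illustration).
        String sample = "Hi \u263A and \uF04A";
        System.out.println("Round trip intact: " + sample.equals(roundTripUtf8(sample)));
        for (String label : nonAsciiCodePoints(sample)) {
            int cp = Integer.parseInt(label.substring(2), 16);
            System.out.println(label + (isPrivateUse(cp) ? " (private use, font dependent)" : ""));
        }
    }
}
```

Passing your html string to nonAsciiCodePoints should show whether the smiley arrives as a standard code point or as a private-use one. If it is private use, the character is most likely tied to a symbol font in the original document rather than being lost during encoding.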
