Hi,
I have a question about character encoding. I have a Word document that contains special characters, including the “smiley face”. My ultimate goal is to post this document to my website and have chunks of it converted to HTML for the user to see. Everything works except that certain characters, like the smiley face, don’t come through correctly. They show up as different characters.
To convert the Word document to HTML, I’ve coded the following:
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
SaveOptions saveOptions = document.getSaveOptions();
saveOptions.setHtmlExportEncoding(java.nio.charset.Charset.forName("UTF-8"));
document.setSaveOptions(saveOptions);
document.joinRunsWithSameFormatting();
document.save(byteArrayOutputStream, SaveFormat.HTML);
String html = byteArrayOutputStream.toString("UTF-8");
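To see what actually ends up in the html string, a small check like the following could list every non-ASCII code point it contains. This is only a sketch: the class name, the helper method, and the sample string are made up for illustration and are not part of the conversion code above.

```java
import java.util.ArrayList;
import java.util.List;

public class NonAsciiDump {
    // Returns one entry per non-ASCII code point found in the text,
    // formatted as "U+XXXX at index N" (N is the char index in the string).
    static List<String> nonAsciiCodePoints(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                found.add(String.format("U+%04X at index %d", cp, i));
            }
            // Advance by 1 or 2 chars so surrogate pairs are handled correctly.
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the html string produced by document.save(...).
        String html = "<p>Hi \u263A</p>";
        for (String line : nonAsciiCodePoints(html)) {
            System.out.println(line);
        }
    }
}
```

Running this against the real html string would show whether the smiley arrives as a standard code point or as something unexpected.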
In the head of the HTML page that shows the data, I specify the character encoding:
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=UTF-8">
Are there some UTF-8 characters, like the smiley face, that can be displayed in a Word document but not in HTML in a browser? Many other special characters come through just fine. Or am I doing something wrong?
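One thing I have not ruled out is that the smiley is not a standard Unicode character at all. My understanding, which I have not verified for this document, is that Word can store symbol-font glyphs (for example from Wingdings) as private-use code points. Those would survive UTF-8 encoding but display as something else in a browser that lacks the font. The sketch below shows the distinction I mean; the class and method names are hypothetical, and U+F04A is only an assumed example of such a code point.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SmileyRoundTrip {
    // True if the code point belongs to a Unicode Private Use Area.
    static boolean isPrivateUse(int codePoint) {
        return Character.getType(codePoint) == Character.PRIVATE_USE;
    }

    // Encodes the text as UTF-8 into a byte stream and decodes it again,
    // mirroring the ByteArrayOutputStream-to-String step in the code above.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bytes, 0, bytes.length);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String standardSmiley = "\u263A"; // WHITE SMILING FACE, a regular Unicode character
        String symbolFontChar = "\uF04A"; // assumed example of a private-use code point

        // Both strings survive the UTF-8 round trip unchanged.
        System.out.println(roundTripUtf8(standardSmiley).equals(standardSmiley));
        System.out.println(roundTripUtf8(symbolFontChar).equals(symbolFontChar));

        // Only the second one is private-use, so its appearance depends on the font.
        System.out.println(isPrivateUse(standardSmiley.codePointAt(0)));
        System.out.println(isPrivateUse(symbolFontChar.codePointAt(0)));
    }
}
```

If the smiley in my document turns out to be a private-use code point, the encoding itself would be fine and the issue would be the font, but I would appreciate confirmation either way.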
Any insight very much appreciated!!
Jason