Escape or Decode Special Character Codes in HTML String & Set Word Document Variables | Java

Hi,

Our users can enter special characters such as ‘ಠ’ on HTML pages, and these are converted to character codes such as “&#3232;” in Java. The problem is that when Aspose generates a Word document, it displays “&#3232;” instead of “ಠ”. Is there anything we can do in Aspose Java to handle this? Thanks.

@ligongzhu,

Please compress your input file and the Word document generated by Aspose.Words into ZIP format and attach the .zip file here for testing. We will then investigate the issue on our end and provide you with more information.

Aspose.zip (27.5 KB)

Hi,

Please see my sample files attached for the issue. “Dis1Test.docx” is the source doc file containing some Docvariables.

You can run “TestAsposeSpecialChar.java” to generate “SpeiclaChar.docx” and see the issue.

You need to change the file locations hardcoded in the java file to match yours before you run it.

Thanks.

Our current Aspose java library is “aspose-words-15.5.0-jdk16.jar”.

@ligongzhu,

Instead of character codes, you can pass the actual symbols like this:

Document newDoc = new Document("C:\\temp\\233988\\Dis1Test.docx");

newDoc.getVariables().add("HOCompleteAddress", "000⅛aaa");
newDoc.getVariables().add("OBOFullName", "111ಠbbb");
newDoc.getVariables().add("ClaimantFullName", "222ಠccc");
newDoc.getVariables().add("OBOCompleteAddress", "333 ddd");
newDoc.getVariables().add("ClaimantFullName", "444‑eee");

newDoc.updateFields();

newDoc.save("C:\\temp\\233988\\awjava-21.8.docx");

The output document produced by Aspose.Words for Java 21.8 is attached.
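
If the encoding of the Java source file is a concern when typing symbols directly, the same values can be written with Unicode escapes, which the compiler turns into identical strings. This is only a minimal sketch using the standard library; the class name UnicodeEscapeDemo and the method escapedValue are hypothetical and not part of Aspose.Words:

```java
public class UnicodeEscapeDemo {
    // Builds the variable value with a Unicode escape instead of a literal symbol.
    static String escapedValue() {
        return "111\u0CA0bbb";
    }

    public static void main(String[] args) {
        String value = escapedValue();
        // The character at index 3 is U+0CA0, which is 3232 in decimal.
        System.out.println(value.codePointAt(3));
        System.out.println(value.length());
    }
}
```

A string built this way can be passed to getVariables().add(...) exactly like one containing the literal symbol.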

Hi,

I took a look at the document you generated. Only the first special char displays correctly?

I tried your Java code on my machine and the result is the same.

Also, it is difficult for us to switch to using special characters directly instead of their character codes in Java. Is there any way Aspose can display them correctly in the generated document?

Thanks.

@ligongzhu,

We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-22654. We will further look into the details of this requirement and will keep you updated here on the status of the linked ticket.

@ligongzhu,

Regarding WORDSNET-22654, we have completed the work on your issue and concluded that we will not be able to implement a fix for it in the Aspose.Words API. Your issue (WORDSNET-22654) has now been closed with a ‘Won’t Fix’ resolution. The reason is that MS Word does not decode document variables, so Aspose.Words should not either. You need to provide the expected values yourself. Here is some sample code:

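import com.aspose.words.Document;
import org.apache.commons.lang3.StringEscapeUtils;

public class TestAsposeSpecialChar {
    public static void main(String[] args) throws Exception {
        Document document = new Document("C:\\Temp\\233988\\Dis1Test.docx");

        // Each value is passed through addVariable, which decodes HTML character codes first.
        addVariable(document, "HOCompleteAddress", "000⅛aaa");
        addVariable(document, "OBOFullName", "111ಠbbb");
        addVariable(document, "ClaimantFullName", "222ಠccc");
        addVariable(document, "OBOCompleteAddress", "333 ddd");
        addVariable(document, "ClaimantFullName", "444‑eee");

        // Update the fields so they pick up the new variable values.
        document.updateFields();

        document.save("C:\\temp\\233988\\awjava-21.8.docx");
    }

    // Decodes HTML character codes in the value, then stores it as a document variable.
    private static void addVariable(Document document, String key, String value) {
        document.getVariables().add(key, StringEscapeUtils.unescapeHtml4(value));
    }
}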
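
If adding Apache Commons Lang to the project is not an option, numeric character codes can also be decoded with the standard library alone. The following is only a rough sketch and not a replacement for unescapeHtml4: it handles decimal and hexadecimal codes such as &#3232; or &#x2011; and leaves named entities such as &nbsp; untouched. The class NumericEntityDecoder and its decode method are hypothetical names, not part of Aspose.Words or Commons Lang:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDecoder {
    // Matches &#1234; (decimal) or &#x1F; (hexadecimal) character codes.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            // Codes outside the Unicode range are left exactly as they were.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("111&#3232;bbb"));
    }
}
```

The decoded string can then be passed to document.getVariables().add(key, value) in place of the unescapeHtml4 result in addVariable.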