Single quote is changed after conversion from DOCX to HTML using Java

HPSoftwareALM · January 29, 2012, 3:27am

I am trying to convert a DOCX file to HTML using Aspose.Words for Java version 10.6.0.
The following characters are corrupted by the conversion:

’
“

For example:
I have a Word document that contains the following line:
It is built from the characters ‘a-z’, ‘A-Z’, ‘0-9’, ‘.’ and ‘-‘

After the conversion I get HTML containing the following line:

It is built from the characters ⵜa-z䮢, ⵜA-Z䮢, ⵜ0-9䮢, ⵜ.䮢 and ⵜ-ⵜ

This is the code:

Document doc = new Document(inputStr);
HtmlSaveOptions saveOptions = new HtmlSaveOptions(SaveFormat.HTML);
saveOptions.setEncoding(java.nio.charset.Charset.forName("UTF-8"));
File out = new File(htmlFileName);
doc.save(out.getAbsolutePath(), saveOptions);
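
One possible explanation, not confirmed anywhere in this thread, is that the file is written as UTF-8 but later read or displayed with a different charset. The sketch below uses only the JDK to show that mechanism on the sample text. The class and method names (`QuoteEncodingCheck`, `decode`, `hasCurlyQuotes`) are hypothetical, and ISO-8859-1 is just an example of a wrong charset; the exact garbled characters shown above may come from a different mismatch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Aspose.Words: illustrates how UTF-8 bytes
// decoded with the wrong charset lose their typographic quotes.
class QuoteEncodingCheck {

    // Decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // True if the text contains U+2018, U+2019, U+201C or U+201D.
    static boolean hasCurlyQuotes(String text) {
        for (char c : text.toCharArray()) {
            if (c == '\u2018' || c == '\u2019' || c == '\u201C' || c == '\u201D') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String original = "\u2018a-z\u2019";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset that was used for writing restores the text.
        String right = decode(utf8, StandardCharsets.UTF_8);
        // Decoding with another charset (an assumed example) turns each quote
        // into several unrelated characters.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip intact: " + right.equals(original));
        System.out.println("Wrong charset keeps quotes: " + hasCurlyQuotes(wrong));
    }
}
```

To apply the same check to the generated file, the bytes could be read with `java.nio.file.Files.readAllBytes` and passed to `decode` with `StandardCharsets.UTF_8`. If the curly quotes are present after that, the saved HTML is fine and the problem is in how it is viewed or read back.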

awais.hafeez · January 29, 2012, 4:37am

Hi Daniel,

Thanks for your inquiry.

Using the latest version of Aspose.Words, 10.8.0, I was unable to reproduce this issue on my side. I have attached the DOCX and HTML files generated on my side for your reference. I would also suggest downloading and using the latest version of Aspose.Words from the following link:

I hope this helps.

Best Regards,