Free Support Forum - aspose.com

Encode

Hi again,

I have a question about encoding in Aspose. I’m reading a Word document with Aspose and then creating XML files from its data. When I read the text of the Word document, I don’t know whether the getText() method returns the data in a particular encoding (UTF-8, …) or whether I can specify the encoding myself.

Thanks in advance, Irene.

Hi Irene,


Thanks for your inquiry. Yes, by default the encoding is UTF-8. You can also specify the encoding to use when exporting to plain text format with the TxtSaveOptions.setEncoding method:
http://www.aspose.com/docs/display/wordsjava/TxtSaveOptions

Best Regards,
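
As a small illustration of the point above, here is a minimal sketch that uses only the standard library and assumes getText() hands back an ordinary java.lang.String (as the later code in this thread suggests). A String carries no byte encoding of its own; the charset only comes into play when the text is turned into bytes. The class name, helper names and sample text are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class TextEncodingSketch {
    // Encode the text as UTF-8; the charset is chosen here, at write time.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Encode the same text as ISO-8859-1; characters outside that charset
    // cannot be represented and are replaced with '?'.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the String returned by doc.getText().
        String text = "Nota 14 \u2026";
        System.out.println("chars        = " + text.length());
        System.out.println("UTF-8 bytes  = " + toUtf8(text).length);
        System.out.println("Latin-1 bytes= " + toLatin1(text).length);
    }
}
```

If the bytes are later decoded with the same charset that produced them, the original text comes back unchanged; a mismatch between the two is a common source of unreadable characters.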

Hi,

Thanks for your answer, but I have two problems with the format. I write the XML file with UTF-8 encoding, but characters like --> … are not recognised.

On the other hand, I have images in WMF format that I want to save as PNG, but when I open the saved image, its format is not recognised.

Thanks in advance, Irene.
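
One thing that may be worth ruling out for the XML part is a mismatch between the encoding declared in the XML file and the bytes actually written. The following is only a sketch using the JDK's own XML parser, not the code from this thread; the element name "note", the class name and the helper names are hypothetical. It writes text as UTF-8 with a matching declaration and parses it back to check that nothing was lost.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class XmlUtf8Sketch {
    // Escape the characters that are reserved in XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a tiny XML document whose declared encoding matches the bytes produced.
    static byte[] toXmlBytes(String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<note>" + escape(text) + "</note>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Parse the bytes back and return the text content of the root element.
    static String readBack(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String text = "Nota 14 --> \u2026"; // made-up sample text
        System.out.println(text.equals(readBack(toXmlBytes(text))));
    }
}
```

If a round trip like this succeeds but a character still shows up as an empty box in a viewer, the cause is more likely the character code itself or the font than the file encoding.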

Hi Irene,


Thanks for your inquiry. Could you please attach here, for testing, the input Word documents you’re getting this problem with? Also, please share the code so that I can reproduce the same problems on my side. I will investigate the issue further and provide you with more information.

Best Regards,

Hi!

Sorry for the delay. I have attached the doc with an example.

Inside the document, next to “Nota 14”, there is the char.

Thanks in advance!


Hi Irene,


Thanks for your inquiry. I have tested the scenario and have managed to reproduce the same problem. For the sake of correction, I have logged this problem as WORDSNET-7029 in our issue tracking system. We will look further into the details of this problem and will keep you updated on the status of the correction. We apologize for the inconvenience.

Best Regards,

Hi Irene,

Thanks for your patience. After further investigation, the issue logged as WORDSNET-7029 was not recognized as a bug; instead, it reflects the expected behaviour. To confirm, please rename 'charText.docx' to 'charText.zip' and extract the archive into a folder. In the document.xml file, you'll find that the character code for that symbol is 'F02D'. The following code returns the same character code:
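import com.aspose.words.Document;

// Load the document and read its text as a Java String
Document doc = new Document("C:\\Temp\\charText.docx");

String testString = doc.getText();

// Print each character together with its code in hexadecimal
for (char c : testString.toCharArray()) {
    System.out.println("Symbol=" + c + " Code=" + Integer.toHexString(c));
}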

If we can help you with anything else, please feel free to ask.

Best Regards,
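
For context on the 'F02D' code mentioned above: U+F02D lies in the Unicode Private Use Area (U+E000 to U+F8FF), so it has no standard meaning and how it is displayed depends on the font. The sketch below, using only the standard library, shows one way to spot such characters in extracted text. The mapping of U+F000..U+F0FF to a glyph index in the low byte is a convention often seen with symbol fonts and is an assumption here, not something confirmed in this thread; the class and method names are hypothetical.

```java
class PrivateUseSketch {
    // True when the char lies in the Unicode Private Use Area (U+E000..U+F8FF).
    static boolean isPrivateUse(char c) {
        return Character.getType(c) == Character.PRIVATE_USE;
    }

    // Assumption: chars in U+F000..U+F0FF carry a symbol-font glyph index
    // in their low byte. Returns -1 for any other char.
    static int symbolFontIndex(char c) {
        return (c >= 0xF000 && c <= 0xF0FF) ? (c & 0xFF) : -1;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the String returned by doc.getText().
        String text = "Nota 14 \uF02D";
        for (char c : text.toCharArray()) {
            if (isPrivateUse(c)) {
                System.out.println("Private-use char U+"
                        + Integer.toHexString(c).toUpperCase()
                        + ", low byte 0x" + Integer.toHexString(symbolFontIndex(c)));
            }
        }
    }
}
```

Which glyph the low byte stands for depends on the font applied to that run in the document, so any replacement character would have to be chosen per font.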

The issues you have found earlier (filed as WORDSNET-7029) have been fixed in this .NET update and this Java update.

