Free Support Forum - aspose.com

What is the default encoding for text returned from Range.getText()?

Hi,

As per the subject: what charset encoding does the text string returned by Range.getText() use when parsing a Word document?

Is it the system default Java charset, or is it UTF-8?
Are there any other methods that can be used to extract the text from a Word document in a specific encoding?
(Something similar to the extractText(java.lang.String encoding) method of the pdfextractor class.)


Hi

Thanks for your inquiry. Please follow the link below to learn how you can extract text from a Word document:

http://www.aspose.com/documentation/java-components/aspose.words-for-java/howto-extract-text-only.html

You can set encoding using code like the following:

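import com.aspose.words.Document;
import java.nio.charset.Charset;

// Load the source Word document.
Document doc = new Document("C:\\Temp\\test.doc");
// Set the encoding used for plain-text export (here, the JVM default charset).
doc.getSaveOptions().setTxtExportEncoding(Charset.defaultCharset());
// Save the document as plain text.
doc.save("C:\\Temp\\out.txt");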
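On the charset part of the question: assuming Range.getText() returns an ordinary java.lang.String, the string itself carries no charset (Java strings are sequences of UTF-16 code units), and an encoding only comes into play when the text is converted to bytes. A minimal sketch of that conversion follows; the class name TextEncodingSketch, the helper encode, and the sample text are hypothetical and not part of Aspose.Words.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TextEncodingSketch {
    // Hypothetical helper: encodes already-extracted text into bytes
    // using the charset chosen by the caller.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by Range.getText() (assumption).
        String text = "caf\u00e9";

        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);

        // The same string yields different byte sequences per charset.
        System.out.println("UTF-8 bytes: " + utf8.length);        // 5
        System.out.println("ISO-8859-1 bytes: " + latin1.length); // 4
    }
}
```

The same idea applies when writing the text out yourself: pass the desired Charset when creating the writer or when calling getBytes, rather than relying on the platform default.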

Hope this helps.

Best regards,