What is the default encoding for text returned from Range.getText()?

KeithTan · September 2, 2010, 1:11am

Hi,

As per the subject, what is the charset encoding of the text string returned by Range.getText() when parsing a Word document?

Is it the system default Java charset, or is it UTF-8?
Are there any other methods that can be used to extract the text from a Word document in a specific encoding?
(Something similar to the extractText(java.lang.String encoding) method of the pdfextractor class.)

alexey.noskov · September 2, 2010, 4:50am

Hi

Thanks for your inquiry. Please follow this link to learn how you can extract text from a Word document:
https://docs.aspose.com/words/java/extract-selected-content-between-nodes/
You can set the encoding of the exported text using code like the following:

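import com.aspose.words.Document;
import java.nio.charset.Charset;

// Load the document, choose the charset for plain-text export, then save as .txt
Document doc = new Document("C:\\Temp\\test.doc");
doc.getSaveOptions().setTxtExportEncoding(Charset.defaultCharset());
doc.save("C:\\Temp\\out.txt");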
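
On the original question: Range.getText() returns a java.lang.String, and a Java String holds UTF-16 characters in memory rather than bytes in the system default charset or UTF-8. A charset only comes into play when the string is converted to bytes. The following is a minimal sketch of that conversion in plain Java, without Aspose; the class name, the encode helper, and the sample text are hypothetical and only stand in for the value returned by Range.getText().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TextEncodingSketch {

    // Hypothetical helper: encode an already-extracted string with the requested charset.
    public static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by Range.getText() (assumed sample text).
        String text = "Caf\u00e9";

        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);

        // The same string yields different byte sequences depending on the charset.
        System.out.println(utf8.length);   // prints 5 (the accented e takes two bytes in UTF-8)
        System.out.println(latin1.length); // prints 4 (one byte per character in ISO-8859-1)

        // Decoding with the same charset restores the original string.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // prints true
    }
}
```

If the goal is only to get the document text as bytes in a particular encoding, converting the extracted string this way may be enough, and saving to a .txt file is not required.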

Hope this helps.
Best regards,