Hello,
If you save with com.aspose.words.SaveFormat.TEXT, then the output file is in UTF-8, but includes a Unicode BOM.
It shouldn't, as UTF-8 only has one byte order.
I've had to write code to remove this, but it would be great if this could be fixed, or made optional, in a future version.
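For anyone needing the same workaround in the meantime, here is a minimal sketch of stripping a leading UTF-8 BOM using only the Java standard library. The class and method names (Utf8Bom, hasBom, strip) are made up for illustration and are not part of Aspose.Words. It assumes you already have the saved text as a byte array, for example by reading the output file back in.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Aspose.Words: removes a leading UTF-8 BOM.
final class Utf8Bom {
    // U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    private Utf8Bom() {
    }

    // True if the data starts with the three BOM bytes.
    static boolean hasBom(byte[] data) {
        return data.length >= BOM.length
                && data[0] == BOM[0]
                && data[1] == BOM[1]
                && data[2] == BOM[2];
    }

    // Returns a copy without the BOM, or the same array if no BOM is present.
    static byte[] strip(byte[] data) {
        return hasBom(data) ? Arrays.copyOfRange(data, BOM.length, data.length) : data;
    }
}
```

One way to use it would be to read the saved file with java.nio.file.Files.readAllBytes, pass the result through Utf8Bom.strip, and write it back with Files.write.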
Thanks,
Ben
Hi Ben,
Thanks for your inquiry. Could you please attach 1) your input Word document, 2) the output text file and 3) the source code you're using to generate this text file, so that we can test it? We will investigate the issue on our end and provide you with more information.
Best regards,
Please find the document attached. My code is:
Document doc = new Document(this.inputPathname);
TxtSaveOptions options = new TxtSaveOptions();
options.setSaveFormat(com.aspose.words.SaveFormat.TEXT);
options.setEncoding(java.nio.charset.Charset.forName("UTF-8"));
options.setExportHeadersFooters(false);
options.setParagraphBreak("\n\n");
options.setPreserveTableLayout(false);
options.setPrettyFormat(true);
doc.save(output, options);
Thanks for looking into this.
Ben
Your exported file demonstrates the problem!
Here's a hex dump of the first 16 bytes of out-awjava-14.11.0.txt
0000: EF BB BF 58 30 59 20 58 31 59 20 58 32 59 0A 0A ...X0Y X1Y X2Y..
The file starts with 0xEF, 0xBB, 0xBF, which is the UTF-8 encoding of the Unicode BOM (U+FEFF).
http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
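To check your own output the same way, a rough sketch that prints the leading bytes of a byte array as hex. The class and method names (HexHead, firstBytesHex) are invented for this example, and it assumes the file contents have already been read into memory.

```java
// Hypothetical helper for inspecting the start of a file's bytes.
final class HexHead {
    private HexHead() {
    }

    // Formats up to 'count' leading bytes as space-separated upper-case hex pairs.
    static String firstBytesHex(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(count, data.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative Java bytes print as two hex digits.
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }
}
```

If the result begins with EF BB BF, the file carries a UTF-8 BOM.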
UTF-8 files shouldn't include BOMs: UTF-8 has no byte-order ambiguity, so a BOM serves no purpose there, and it just confuses consuming software that doesn't expect it.
Ben
Hi Ben,
Thanks for the additional information. I have logged this problem in our issue tracking system as WORDSNET-11155. We will look further into the details of this problem and keep you updated on the status of the fix. We apologize for the inconvenience.
Best regards,
Hi Andy,
Thanks for your inquiry. It is great you were able to find what you were looking for. Please let us know any time you have any further queries.
Best regards,