Saving as text format includes a UTF-8 BOM


Hello,

If you save with com.aspose.words.SaveFormat.TEXT, the output file is encoded in UTF-8 but begins with a Unicode byte order mark (BOM).

It shouldn't, as UTF-8 only has one byte order.
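To illustrate the byte-order point, here is a small Java sketch using only `java.nio.charset.StandardCharsets` (the class name `ByteOrderDemo` is made up for illustration). Encoding the same character in UTF-8 gives one fixed byte sequence, while Java's generic `UTF-16` charset prepends a BOM because UTF-16 does come in two byte orders.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: compares the bytes Java produces for "X" in a few encodings. */
final class ByteOrderDemo {
    // Formats bytes as space-separated two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "X";
        // UTF-8 code units are single bytes, so there is no byte order to signal and no BOM.
        System.out.println("UTF-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));    // 58
        // UTF-16 uses 16-bit units, so Java's generic UTF-16 encoder writes a big-endian BOM.
        System.out.println("UTF-16:   " + hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 58
        // With the byte order fixed by the charset name, no BOM is written.
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE))); // 58 00
    }
}
```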

I've had to write code to remove this, but it would be great if this could be fixed, or made optional, in a future version.
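Ben doesn't show his workaround. One possible approach, sketched here under the assumption that the text is written to an `OutputStream`, is a hypothetical `BomStrippingOutputStream` (not an Aspose.Words class) that drops a leading EF BB BF and passes every other byte through unchanged.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical helper (not part of Aspose.Words): forwards bytes to the wrapped
 * stream, but discards a leading UTF-8 BOM (EF BB BF) if one is written.
 */
final class BomStrippingOutputStream extends FilterOutputStream {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    private final byte[] head = new byte[UTF8_BOM.length]; // bytes held back while undecided
    private int headLen = 0;
    private boolean decided = false; // true once we know whether a BOM was present

    BomStrippingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        if (decided) {
            out.write(b);
            return;
        }
        head[headLen] = (byte) b;
        if (head[headLen] != UTF8_BOM[headLen]) {
            // Not a BOM after all: release the held-back bytes unchanged.
            out.write(head, 0, headLen + 1);
            decided = true;
        } else if (++headLen == UTF8_BOM.length) {
            // Full BOM seen: drop it.
            decided = true;
        }
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Route the first few bytes through the BOM check, then write the rest in bulk.
        while (!decided && len > 0) {
            write(b[off++]);
            len--;
        }
        if (len > 0) {
            out.write(b, off, len);
        }
    }

    @Override
    public void close() throws IOException {
        // Output shorter than a full BOM that matched its prefix is written unchanged.
        if (!decided) {
            out.write(head, 0, headLen);
            decided = true;
        }
        super.close();
    }
}
```

If the target is an `OutputStream`, it might be used as `doc.save(new BomStrippingOutputStream(output), options);`. Up to three leading bytes are held back until they either complete or break the BOM pattern, so the wrapper should be closed; as with any `FilterOutputStream`, closing it also closes the wrapped stream.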

Thanks,

Ben



Hi Ben,

Thanks for your inquiry. Could you please attach the following here for testing: 1) your input Word document, 2) the output text file, and 3) the source code you're using to generate the text file? We will investigate the issue on our end and provide you with more information.

Best regards,

Please find the document attached. My code is:

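import com.aspose.words.Document;
import com.aspose.words.SaveFormat;
import com.aspose.words.TxtSaveOptions;
import java.nio.charset.Charset;

// Load the input Word document.
Document doc = new Document(this.inputPathname);

// Plain-text export settings.
TxtSaveOptions options = new TxtSaveOptions();
options.setSaveFormat(SaveFormat.TEXT);
options.setEncoding(Charset.forName("UTF-8"));
options.setExportHeadersFooters(false);
options.setParagraphBreak("\n\n"); // blank line between paragraphs
options.setPreserveTableLayout(false);
options.setPrettyFormat(true);
doc.save(output, options);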

Thanks for looking into this.

Ben

Hi Ben,


Thanks for your inquiry. After an initial test with Aspose.Words for Java 14.11.0, I was unable to reproduce this issue on my side (please see the attached out-awjava-14.11.0.txt). I suggest you upgrade to the latest version of Aspose.Words. You can download it from the following link. I hope this helps.

Best regards,

Your exported file demonstrates the problem!

Here's a hex dump of the first 16 bytes of out-awjava-14.11.0.txt:

0000: EF BB BF 58 30 59 20 58 31 59 20 58 32 59 0A 0A ...X0Y X1Y X2Y..

The file starts 0xEF,0xBB,0xBF, which is a UTF-8 encoded Unicode BOM.
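The same check can be done programmatically. The Java sketch below (the helper name `Utf8BomCheck` is hypothetical) reads at most the first three bytes of a file and compares them with 0xEF, 0xBB, 0xBF, the bytes shown at the start of the dump.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: reports whether data or a file begins with a UTF-8 BOM. */
final class Utf8BomCheck {
    // True if the first three bytes are EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Reads at most the first three bytes of the file, then applies the check above.
    static boolean startsWithUtf8Bom(Path file) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break; // file is shorter than three bytes
                }
                n += r;
            }
        }
        return startsWithUtf8Bom(Arrays.copyOf(head, n));
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean bom = startsWithUtf8Bom(Paths.get(name));
            System.out.println(name + ": " + (bom ? "UTF-8 BOM" : "no BOM"));
        }
    }
}
```

Given the dump above, running it on out-awjava-14.11.0.txt should report a UTF-8 BOM.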

http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

UTF-8 files shouldn't include BOMs: they serve no purpose and just confuse consuming software.
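As one concrete example of that confusion, Java's own UTF-8 decoder does not strip a BOM. Decoding the 16 bytes from the hex dump above leaves U+FEFF at the start of the text, so the first token is no longer equal to "X0Y". The class name `BomConfusionDemo` is illustrative only.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: shows a BOM surviving Java's UTF-8 decoding. */
final class BomConfusionDemo {
    // The first 16 bytes of out-awjava-14.11.0.txt, copied from the hex dump above.
    static final byte[] DUMP = {
        (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x58, 0x30, 0x59, 0x20, 0x58,
        0x31, 0x59, 0x20, 0x58, 0x32, 0x59, 0x0A, 0x0A
    };

    // Decodes the bytes as UTF-8 and returns the text before the first space.
    static String firstToken(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.split(" ")[0];
    }

    public static void main(String[] args) {
        String token = firstToken(DUMP);
        System.out.println(token.length());            // 4: U+FEFF plus "X0Y"
        System.out.println(token.charAt(0) == 0xFEFF); // true
        System.out.println(token.equals("X0Y"));       // false
    }
}
```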

Ben

Hi Ben,


Thanks for the additional information. I have logged this problem in our issue tracking system as WORDSNET-11155. We will look further into the details of this problem and keep you updated on the status of the fix. We apologize for the inconvenience.

Best regards,

I think the thread "Problems with Word Docx ContentType" on the Aspose free support forum answers my problem.

Hi Andy,


Thanks for your inquiry. It is great that you were able to find what you were looking for. Please let us know any time you have further queries.

Best regards,