Regarding WORDSNET-21853, we have completed the analysis of this issue and decided to close it with “Not a Bug” status. Please see the analysis details below:
The “junk characters” are in fact stored in the source Word document, in invisible paragraphs. MS Word does not render those paragraphs because their line spacing is set to zero. In HTML, however, line spacing (“line-height”) cannot be zero, so the paragraphs become visible. The same effect is observed in HTML documents generated by MS Word. We are going to close this issue as “Not a Bug” not only because Aspose.Words matches MS Word’s behavior in this case, but also because zero line spacing is an uncommon corner case. For example, MS Word’s user interface does not allow setting line spacing to zero.
As a workaround, you can remove paragraphs with zero line spacing before saving the Word DOCX document to HTML:
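Document doc = new Document("C:\\Temp\\Error File\\Error File.docx");
// Iterate over an array snapshot so that removing nodes does not disturb the loop.
for (Paragraph paragraph : doc.getFirstSection().getBody().getParagraphs().toArray())
    // Zero line spacing is what keeps these paragraphs hidden in MS Word.
    if (paragraph.getParagraphFormat().getLineSpacing() == 0)
        paragraph.remove();
doc.save("C:\\Temp\\Error File\\awjava-21.2 workaround.html");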