Inconsistent conversion of Word to HTML format

The converted output has overlapping text, although the text in the original document does not overlap. What is the reason for this, and can the problem be fixed?
This is how the source Word file looks:

This is how the converted HTML looks:

@qianshang Could you please attach your source and output documents here for testing? We will check the issue and provide you with more information.

Also, if you need to preserve the original document layout in HTML, you can consider using the HtmlFixed format:

Document doc = new Document("C:\\Temp\\in.docx");
// HTML_FIXED reproduces the paginated layout instead of producing flow HTML
doc.save("C:\\Temp\\out.html", SaveFormat.HTML_FIXED);

But please note that HtmlFixed is designed to preserve the MS Word document layout for viewing purposes only. The format is not designed for document round-tripping, and its structure may not load correctly back into the Aspose.Words DOM.

Thank you for your reply.
This is the Word source file: 转换重叠文档.docx ("converted overlapping document", 34.5 KB)
This is the output file I obtained from the conversion: 转换重叠文档.zip (4.8 KB)

When I convert the file with the method you suggested, the resulting HTML looks even worse.
This is the output produced by the method you suggested: 1.zip (12.6 KB)

@qianshang As far as I can see, the MS Word HTML output looks the same as the Aspose.Words output:

The following code produced good output on my side:

Document doc = new Document("C:\\Temp\\in.doc");
        
HtmlFixedSaveOptions opt = new HtmlFixedSaveOptions();
// Embed SVG, images, fonts and CSS so the result is a single self-contained HTML file
opt.setExportEmbeddedSvg(true);
opt.setExportEmbeddedImages(true);
opt.setExportEmbeddedFonts(true);
opt.setExportEmbeddedCss(true);
        
doc.save("C:\\Temp\\out.html", opt);

out_fixed.zip (79.2 KB)

Please note that to produce accurate HtmlFixed output, the fonts used in the document must be available in the environment where the document is converted.
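One rough way to confirm the fonts are present is to compare the font names the document uses with the font families the JVM reports. This is a sketch under stated assumptions, not part of Aspose.Words: `FontCheck` and `missingFonts` are hypothetical names, the font list is a placeholder, and Aspose.Words resolves fonts through its own `FontSettings` sources rather than through AWT, so a font visible here may still be unavailable to Aspose.Words (and vice versa). AWT may also report localized family names (for example 宋体 instead of SimSun) depending on the JVM locale.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FontCheck {

    /** Returns the required font names that are absent from the available ones (case-insensitive). */
    static List<String> missingFonts(Collection<String> required, Collection<String> available) {
        Set<String> normalized = new HashSet<>();
        for (String name : available) {
            normalized.add(name.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!normalized.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Font families visible to the JVM, localized for the default locale.
        // Assumption: used only as a rough proxy; Aspose.Words uses its own font sources.
        List<String> installed = Arrays.asList(
                GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames());

        // Hypothetical placeholder: replace with the fonts your document actually uses.
        List<String> used = Arrays.asList("SimSun", "Times New Roman");

        System.out.println("Missing fonts: " + missingFonts(used, installed));
    }
}
```

An empty list only means the JVM sees those families; if Aspose.Words still substitutes fonts, its own font folder configuration is the next thing to check.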

Thank you for your prompt reply. I tried converting with the method you suggested, and this is the result I got. The fonts are installed locally, and I don't know where the problem lies. My Aspose.Words version is 23.4.
1.zip (3.8 KB)

Document doc = new Document("C:\\Temp\\in.doc");
        
HtmlFixedSaveOptions opt = new HtmlFixedSaveOptions();
opt.setExportEmbeddedSvg(true);
opt.setExportEmbeddedImages(true);
opt.setExportEmbeddedFonts(true);
opt.setExportEmbeddedCss(true);
        
doc.save("C:\\Temp\\out.html");

@qianshang In your code, you did not pass HtmlFixedSaveOptions into the Document.save method. The code should be the following:

doc.save("C:\\Temp\\out.html", opt);

I used HtmlFixedSaveOptions for the conversion and the result was very good. However, every element in the output becomes a div; even tables are rendered as divs, without table, col, or row elements. That is why I can only use HtmlSaveOptions in my project. If I keep using HtmlSaveOptions for the conversion, are there any parameters I can adjust?

@qianshang You can try resetting line spacing rule to work the problem around:

Document doc = new Document("C:\\Temp\\in.doc");
        
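Iterable<Paragraph> paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
// Let each line grow to fit its content instead of keeping a fixed height
for(Paragraph p : paragraphs)
    p.getParagraphFormat().setLineSpacingRule(LineSpacingRule.AT_LEAST);
        
// No save options: the format (flow HTML) is determined from the .html extension
doc.save("C:\\Temp\\out.html");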
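The effect of this workaround can be illustrated with a simplified model, assuming the overlapping paragraphs use a fixed ("Exactly") line spacing that is smaller than the text they contain. In flow HTML such a value presumably ends up as a fixed line height, so taller content spills into the neighbouring line, whereas with "At least" the line grows to fit. The `LineSpacingModel` class, its `Rule` enum and the point values are hypothetical illustrations, not Aspose.Words code.

```java
public class LineSpacingModel {

    // Hypothetical mirror of two Word line spacing rules (not the Aspose.Words enum).
    enum Rule { EXACTLY, AT_LEAST }

    /**
     * Vertical space (in points) reserved for one line.
     * spacing: the rule's value in points; contentHeight: natural height of the line's text.
     */
    static double lineHeight(Rule rule, double spacing, double contentHeight) {
        if (rule == Rule.EXACTLY) {
            // Fixed height: content taller than this overflows into the next line.
            return spacing;
        }
        // AT_LEAST: grows to fit the content, never shrinks below the minimum.
        return Math.max(spacing, contentHeight);
    }

    /** Points by which the text overflows its line box (0 if it fits). */
    static double overlap(Rule rule, double spacing, double contentHeight) {
        return Math.max(0, contentHeight - lineHeight(rule, spacing, contentHeight));
    }

    public static void main(String[] args) {
        double spacing = 12.0; // assumed "Exactly 12 pt" in the source paragraph
        double content = 16.0; // assumed height of a line containing larger text
        System.out.println("EXACTLY  overlap: " + overlap(Rule.EXACTLY, spacing, content));
        System.out.println("AT_LEAST overlap: " + overlap(Rule.AT_LEAST, spacing, content));
    }
}
```

Running it prints an overlap of 4.0 points for the fixed rule and 0.0 for the minimum rule. The trade-off is that "At least" can make lines taller than in the original document, so vertical spacing may differ slightly from Word.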

Thank you, your suggestion worked. I also have another conversion issue. Attached are my source .docx file and the converted HTML file. The table at the end of the document is not converted correctly; this conversion also uses HtmlSaveOptions. How can I adjust it?
产品购销合同 (1).zip ("product purchase and sales contract", 21.3 KB)

@qianshang There is no table at the end of the document. There is a two-column section and two floating shapes that imitate borders. I am afraid it is not possible to preserve an accurate layout when converting such a document to HTML.

Okay, thank you
