Converting Word to HTML and back to Word produces a different style from the source document

Hello, I am using Aspose.Words for Java version 21.4 to convert Word to HTML, but the Word styles are lost. Converting the HTML back to Word then produces a style that differs from the source document. Is there any way to handle this? Please illustrate with code.

@Mikeykiss Please note, Aspose.Words is designed to work with MS Word documents. The HTML and MS Word document object models are quite different, and it is not always possible to achieve 100% fidelity when converting from one format to the other. In most cases, Aspose.Words mimics MS Word behavior when working with HTML.

If the output HTML is for viewing purposes only, i.e. it is not supposed to be edited or processed, you can consider using the HtmlFixed format. In this case the output should look exactly as it does in MS Word:

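// Load the source Word document.
Document doc = new Document("C:\\temp\\in.docx");
// Embed CSS, fonts, images and SVG so the output is a single self-contained HTML file.
HtmlFixedSaveOptions opt = new HtmlFixedSaveOptions();
opt.setExportEmbeddedCss(true);
opt.setExportEmbeddedFonts(true);
opt.setExportEmbeddedImages(true);
opt.setExportEmbeddedSvg(true);
// Save as fixed-layout HTML (for viewing only).
doc.save("C:\\Temp\\out.html", opt);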

The HtmlFixed format is designed to preserve the original document layout for viewing purposes. So if your goal is to display the HTML on a page, this format can be considered as an alternative. Unfortunately, it does not support round-tripping back to DOCX at all.

@alexey.noskov Thank you. This code does preserve the formatting of the source document on conversion, but is there another solution that also allows converting the HTML back to Word? For Word to HTML, preserving about 90% of the style is sufficient.

@Mikeykiss As I have mentioned, the HTML and MS Word document object models are quite different, and it is not always possible to achieve 100% fidelity when converting from one format to the other. The fidelity depends on the complexity of your document. If possible, could you please attach your problematic input document here for our reference? We will check it and provide you with more information.

@alexey.noskov Hello, may I ask if the style and font of the HTML to Word directory remain unchanged? These differences are not significant, so please use the code as an example. Thank you

@Mikeykiss Unfortunately, your question is not clear enough. Could you please elaborate in more detail?

@alexey.noskov Hello, may I ask how to adjust the width of each column in all HTML tables to be the same? Please provide a code example. Thank you

@Mikeykiss Could you please attach your input and expected output documents? We will check them and provide you more information.

@alexey.noskov Hello, the first column in the Word table is not very wide, but in the exported HTML it becomes very wide. The code is as follows:

// Load the source Word document
Document doc = new Document("D:\\测试\\测试\\测试文档.doc");
HtmlSaveOptions opt = new HtmlSaveOptions();
opt.setPrettyFormat(true);
opt.setExportFontResources(true);
opt.setCssStyleSheetType(CssStyleSheetType.EMBEDDED);
opt.setExportImagesAsBase64(true);
opt.setExportDocumentProperties(true);
opt.setExportFontsAsBase64(true);
opt.setExportCidUrlsForMhtmlResources(true);
opt.setExportXhtmlTransitional(true);
//opt.setExportPageMargins(true);
opt.setExportPageSetup(true);
opt.setExportTocPageNumbers(true);
opt.setExportRelativeFontSize(true);
opt.setExportOriginalUrlForLinkedImages(true);
//setDocWidth(doc);
doc.save("D:\\Temp\\out.html", opt);

sourceFile.zip (63.9 KB)

Hello, the output HTML needs to be editable. Which solution is most suitable?

@Mikeykiss Please note, Aspose.Words is designed to work with MS Word documents. The HTML and MS Word document object models are quite different, and it is not always possible to achieve 100% fidelity when converting from one format to the other. In most cases, Aspose.Words mimics MS Word behavior when working with HTML. If you convert your document to HTML using MS Word, you will see exactly the same result.
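
For the earlier question about making every column in the exported HTML tables the same width, one possible workaround is to post-process the flow-layout HTML after `doc.save(...)` rather than change the conversion itself. The sketch below is only an illustration and uses no Aspose.Words API: the class `HtmlTableEqualizer`, the method `equalizeColumns` and the constant `EQUAL_WIDTH_CSS` are hypothetical names introduced here. It assumes the exported file is UTF-8 and that an injected stylesheet is acceptable in your page. It relies on standard CSS behavior: with `table-layout: fixed` and automatic cell widths, the columns share the table width equally.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words: injects CSS into exported HTML
// so that browsers lay out every table with equal-width columns.
class HtmlTableEqualizer {

    // Fixed layout plus "auto" widths makes columns share the table width equally.
    // "!important" lets the rule win over inline width styles on cells and columns.
    // The table's own width is only a fallback, so an inline table width still applies.
    static final String EQUAL_WIDTH_CSS =
            "<style type=\"text/css\">"
            + "table { table-layout: fixed; width: 100%; } "
            + "col, td, th { width: auto !important; }"
            + "</style>";

    private static final Pattern HEAD_END =
            Pattern.compile("</head\\s*>", Pattern.CASE_INSENSITIVE);

    // Inserts the style block just before </head>; if the markup has no head
    // element, the style block is prepended to the input instead.
    static String equalizeColumns(String html) {
        Matcher m = HEAD_END.matcher(html);
        if (!m.find()) {
            return EQUAL_WIDTH_CSS + html;
        }
        return html.substring(0, m.start()) + EQUAL_WIDTH_CSS + html.substring(m.start());
    }

    // Usage: java HtmlTableEqualizer.java <input.html> <output.html>
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: HtmlTableEqualizer <input.html> <output.html>");
            return;
        }
        // Assumption: the exported HTML is encoded as UTF-8.
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), equalizeColumns(html).getBytes(StandardCharsets.UTF_8));
    }
}
```

This only changes how the HTML is displayed. The column widths stored in the Word document are untouched, and the injected rule is not guaranteed to survive a conversion of the HTML back to DOCX.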