DOCX>HTML>DOCX conversion issue with indentation and spacing using C#

Hello,

I have a docx form that I need to convert to fixed HTML (to use in an RTF editor) and then back to docx. When I do that, the formatting gets all funky.

Code used (simplified):

        var wordDoc = new Document(originalDocxFile);
        var htmlStream = new MemoryStream();

        var htmlFixedSaveOptions = new HtmlFixedSaveOptions
        {
            ExportEmbeddedCss = true,
            ExportEmbeddedFonts = true,
            ExportEmbeddedImages = true,
            ExportEmbeddedSvg = true
        };
        wordDoc.Save(htmlStream, htmlFixedSaveOptions);

        // Rewind the stream so it is read back from the beginning.
        htmlStream.Position = 0;
        var htmlDoc = new Document(htmlStream);
        htmlDoc.Save(docxFile, SaveFormat.Docx);

Files (the original docx file, the fixed HTML file and the docx generated by Aspose): wordFormTransform.zip (117.5 KB)

Note: I get the same result when I use an intermediate file on disk instead of a MemoryStream.

Kind regards,

Maroš

@MarosXceptor

Please read the following article:
Supported Document Formats

Please note that Aspose.Words does not load the HtmlFixed file format into its DOM. You can save your document to the HTML file format instead of HtmlFixed and load that back. Hope this helps you.
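As a rough sketch of that suggestion, a round trip through flow HTML (SaveFormat.Html) instead of HtmlFixed might look like the code below. The class and method names (`HtmlRoundTrip`, `Convert`) are hypothetical and only for illustration, and the chosen option values are assumptions: they keep the HTML in a single stream and ask Aspose.Words to write round-trip hints, but they are not guaranteed to preserve indentation or spacing exactly.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

// Hypothetical helper for illustration; not part of Aspose.Words.
public static class HtmlRoundTrip
{
    public static void Convert(string inputDocx, string outputDocx)
    {
        var doc = new Document(inputDocx);

        // Flow HTML can be loaded back into the DOM; HtmlFixed cannot.
        var options = new HtmlSaveOptions(SaveFormat.Html)
        {
            // Assumption: embed images so the HTML fits in one stream.
            ExportImagesAsBase64 = true,
            // Write extra information intended for loading the HTML back.
            ExportRoundtripInformation = true
        };

        using (var htmlStream = new MemoryStream())
        {
            doc.Save(htmlStream, options);

            // Rewind before reading the HTML back.
            htmlStream.Position = 0;

            var reloaded = new Document(htmlStream);
            reloaded.Save(outputDocx, SaveFormat.Docx);
        }
    }
}
```

It could be called as `HtmlRoundTrip.Convert("original.docx", "roundtrip.docx");`, where both file names are placeholders.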

@tahir.manzoor Thank you for your answer!

However, when I save my document as HTML and then convert it back to a Word document, not all formatting is preserved (indentation and spacing are off, and some characters move around). Is there a way to convert docx to HTML and back without any loss of formatting (or at least with minimal loss)?

Kind regards,

Maroš

@MarosXceptor

Please note that HTML and Word file formats are quite different. So, sometimes it is hard to achieve 100% fidelity.

We converted the document to HTML and back to DOCX using the latest version of Aspose.Words for .NET (20.1) with the following code example. We did not find the issue in the output document.

Please check the attached output documents and share screenshots of the problematic sections in them. We will then provide you with more information.
Docs.zip (11.6 KB)

// Convert the DOCX to flow HTML.
Document doc = new Document(MyDir + "originalAppendix.docx");
doc.Save(MyDir + "20.1.html");

// Load the HTML back and save it as DOCX.
Document doc2 = new Document(MyDir + "20.1.html");
doc2.Save(MyDir + "out/20.1.docx");

@tahir.manzoor

Yeah, I am aware that many Word features cannot be transferred to HTML, but the main issues I am experiencing are indentation and spacing, which don’t seem impossible to fix.

formattingIssues.zip (218.0 KB)

Please see the archive above, which highlights the formatting issues in the documents you sent me.

Kind regards,

Maroš

@MarosXceptor

We have logged this problem in our issue tracking system as WORDSNET-19909. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.

@tahir.manzoor Thank you!

The issues you found earlier (filed as WORDSNET-19909) have been fixed in the Aspose.Words for .NET 20.3 update and the Aspose.Words for Java 20.3 update.