DOC to HTML5 and Back to DOC Format

Hello Telerik,

We own and use your product, and I am looking for some help with the following:

  1. The original DOC file uses MERGEFORMAT fields (DOCPROPERTY fields with the \* MERGEFORMAT switch) and custom document properties.
  2. Convert it to HTML5 and edit it.
  3. Convert it back to DOC format, retaining the MERGEFORMAT fields and custom document properties.
  4. Keep the original margins.
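For anyone reproducing step 1, the following is a minimal sketch of an input document of this kind: a custom document property plus a DOCPROPERTY field that references it with the \* MERGEFORMAT switch. It assumes the standard Aspose.Words `CustomDocumentProperties.Add` and `DocumentBuilder.InsertField` calls; the class name `SampleInput`, the property name `ClientName`, and its value are hypothetical, for illustration only.

```csharp
using Aspose.Words;

public static class SampleInput
{
    // Builds a small DOC file containing a custom document property and a
    // DOCPROPERTY field (with \* MERGEFORMAT) that displays that property.
    public static Document Create(string path)
    {
        var doc = new Document();

        // Hypothetical property name and value
        doc.CustomDocumentProperties.Add("ClientName", "Contoso");

        var builder = new DocumentBuilder(doc);
        builder.Write("Client: ");
        // Verbatim string so \* reaches the field code unchanged
        builder.InsertField(@"DOCPROPERTY ClientName \* MERGEFORMAT");

        doc.Save(path, SaveFormat.Doc);
        return doc;
    }
}
```

Opening the saved file in Word and pressing Alt+F9 should show the field code `{ DOCPROPERTY ClientName \* MERGEFORMAT }`.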

Here is my code:

To convert to HTML5:
using Aspose.Words;
using Aspose.Words.Saving;

private static string ConvertToHtml5(string filepath)
{
    // Verbatim string (@"...") so the backslashes are not treated as escape sequences
    string outputPath = @"c:\Files\output.html";

    Document doc = new Document(filepath);
    HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.Html)
    {
        HtmlVersion = HtmlVersion.Html5,
        ExportHeadersFootersMode = ExportHeadersFootersMode.FirstSectionHeaderLastSectionFooter,
        ExportImagesAsBase64 = true,
        ExportPageMargins = true,
        ExportDocumentProperties = true,
        ExportPageSetup = true,
        PrettyFormat = true
    };

    doc.Save(outputPath, opts);
    return outputPath;
}

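Here is my code to convert to DOC format:
using Aspose.Words;
using Aspose.Words.Loading;   // namespace of HtmlLoadOptions in current Aspose.Words releases
using Aspose.Words.Saving;

private static void ConvertToWord(string filePath, string sourcePath)
{
    // sourcePath (the original DOC file) is not used in this method
    string savedDocumentPath = @"c:\Files\WordOutput.doc";

    // Load the edited HTML5 file
    HtmlLoadOptions loadOptions = new HtmlLoadOptions { LoadFormat = LoadFormat.Html };
    Document doc = new Document(filePath, loadOptions);
    DocumentBuilder builder = new DocumentBuilder(doc);

    // Use the same header/footer on every page instead of a separate first-page one
    builder.MoveToHeaderFooter(HeaderFooterType.FooterFirst);
    builder.PageSetup.DifferentFirstPageHeaderFooter = false;

    SaveOptions options = new DocSaveOptions(SaveFormat.Doc) { PrettyFormat = true };
    doc.Save(savedDocumentPath, options);
}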

Note: I am using the builder for the header/footer to ensure the same header/footer appears on every page. Otherwise, my only options are none, first page only, or all pages except the first. I have this working, though, so it is no big deal.

Also, I can work around the custom document properties by copying them from the original file to the converted file, but the MERGEFORMAT fields are still lost.
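The property-copying workaround could look roughly like this. It is a sketch assuming the Aspose.Words `Aspose.Words.Properties` API (`DocumentProperty.Type` plus the typed `Add` overloads on `CustomDocumentProperties`); `PropertyCopier` and `CopyCustomProperties` are hypothetical names. It restores property values only, not the DOCPROPERTY/MERGEFORMAT fields themselves, and any property type without a typed overload here falls back to its string form.

```csharp
using Aspose.Words;
using Aspose.Words.Properties;

public static class PropertyCopier
{
    // Copies every custom document property from source to target,
    // keeping Boolean, DateTime, Double and Number types; other types are copied as strings.
    public static void CopyCustomProperties(Document source, Document target)
    {
        CustomDocumentProperties targetProps = target.CustomDocumentProperties;

        foreach (DocumentProperty prop in source.CustomDocumentProperties)
        {
            // Remove a same-named property first so the copied value replaces it
            if (targetProps.Contains(prop.Name))
                targetProps.Remove(prop.Name);

            switch (prop.Type)
            {
                case PropertyType.Boolean:
                    targetProps.Add(prop.Name, prop.ToBool());
                    break;
                case PropertyType.DateTime:
                    targetProps.Add(prop.Name, prop.ToDateTime());
                    break;
                case PropertyType.Double:
                    targetProps.Add(prop.Name, prop.ToDouble());
                    break;
                case PropertyType.Number:
                    targetProps.Add(prop.Name, prop.ToInt());
                    break;
                default:
                    targetProps.Add(prop.Name, prop.ToString());
                    break;
            }
        }
    }
}
```

For example, it could be called just before saving in ConvertToWord, e.g. `PropertyCopier.CopyCustomProperties(new Document(sourcePath), doc);`, which would give the otherwise unused `sourcePath` parameter a purpose.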

Your help is greatly appreciated!
Thanks,
Chris

@Christopher44,

To ensure a timely and accurate response, please ZIP and attach the following resources here for testing:

  • Your input Word document
  • Aspose.Words generated output HTML5 file
  • Aspose.Words generated final output Word document showing the undesired behavior
  • Your expected document showing the correct output. Please create this document using Microsoft Word.

As soon as you have these ready, we will start investigating your issue and provide more information. Thanks for your cooperation.

Here is a simple example. Just open the solution and run it. The input document is in the INPUT folder, and the converted HTML and .DOC files are written to the OUTPUT folder. As you can see, the converted .DOC file loses the field code and the custom document property. Note: I uploaded the ZIP file, but I do not see it, so I am not sure whether you received it.
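To see exactly what is lost, a small check along these lines could compare the input and the round-tripped output. This is a sketch assuming the Aspose.Words `Range.Fields`, `Field.Type`, and `Field.GetFieldCode()` members; `RoundTripCheck`, `CountDocPropertyFields`, and `Compare` are hypothetical names.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Fields;

public static class RoundTripCheck
{
    // Counts DOCPROPERTY fields in a document and prints each field code.
    public static int CountDocPropertyFields(Document doc)
    {
        int count = 0;
        foreach (Field field in doc.Range.Fields)
        {
            if (field.Type == FieldType.FieldDocProperty)
            {
                Console.WriteLine(field.GetFieldCode());
                count++;
            }
        }
        return count;
    }

    // Prints DOCPROPERTY field and custom property counts for the original and the round-tripped file.
    public static void Compare(string originalPath, string roundTripPath)
    {
        var original = new Document(originalPath);
        var roundTrip = new Document(roundTripPath);

        Console.WriteLine($"DOCPROPERTY fields: {CountDocPropertyFields(original)} -> {CountDocPropertyFields(roundTrip)}");
        Console.WriteLine($"Custom properties: {original.CustomDocumentProperties.Count} -> {roundTrip.CustomDocumentProperties.Count}");
    }
}
```

A call such as `RoundTripCheck.Compare(@"INPUT\original.doc", @"c:\Files\WordOutput.doc");` would show the drop in both counts; the input path here is a placeholder.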

@Christopher44,

I am afraid we do not see any uploaded files in this thread. Please ZIP those resources, upload the ZIP file to Dropbox, and share the download link here for testing. Thanks for your cooperation.

See if this works: Aspose.DocumentConvert.Console.zip (Dropbox)

@Christopher44,

Using the latest version of Aspose.Words (18.2), we managed to reproduce this issue on our end. We have logged an issue in our bug tracking system to preserve custom DOCPROPERTY fields with the MERGEFORMAT switch during a Word-HTML5-Word round trip. The ID of this issue is WORDSNET-16526. Your thread has been linked to this issue, and you will be notified as soon as it is resolved. Sorry for the inconvenience.