Merging multiple DOCX and HTML files into one DOCX document

Hi,

I am experiencing difficulty merging html and docx documents into one docx document.

I’ve attached the html and docx files, along with the source code. See also the resulting docx file (“Output-HtmlDocx.docx”), which was produced by creating a docx from an html file using DocumentBuilder, appending the original docx, and then appending another html file. The first html looks fine, but the html that was appended after the docx is rendered incorrectly; it should match the first.

It appears to be a problem only when a docx is added; merging 20 html documents produces a correct result. I suspect it has something to do with the headers / footers, but I cannot get to the bottom of it.

Any help appreciated.

Thanks

Hi there,

Thanks for your inquiry. Please note that Aspose.Words mimics the behavior of MS Word. If you open your docx and html files in MS Word and insert the contents of the html at the end of the docx, you will get the same output.
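
If it helps to narrow this down, a small diagnostic along the following lines can list what each section of the merged file carries. In Word, a section that has no header or footer of its own continues the one from the previous section, so a section reporting zero header/footer nodes right after the docx content is a likely suspect. This is only a sketch: the class name and the relative file path are placeholders, and it assumes the output file sits in the working directory.

```csharp
using System;
using Aspose.Words;

public static class SectionReport
{
    public static void Main()
    {
        // Placeholder path: point this at the merged output file.
        Document doc = new Document("Output-HtmlDocx.docx");

        for (int i = 0; i < doc.Sections.Count; i++)
        {
            Section section = doc.Sections[i];

            // Number of header/footer nodes stored directly in this section.
            int ownHeadersFooters = section.HeadersFooters.Count;

            Console.WriteLine(
                "Section " + i +
                ": headers/footers = " + ownHeadersFooters +
                ", header distance = " + section.PageSetup.HeaderDistance +
                ", footer distance = " + section.PageSetup.FooterDistance);
        }
    }
}
```

Comparing the lines printed for the first html section with those for the html section that follows the docx should show whether the two differ in header/footer content or in page setup.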

In your case, we suggest using the following workaround whenever you append html to a docx. Hope this helps you.

// Calling code. docs, Output and MyDir are members defined in the attached source code.
docs.Clear();
docs.Add(new Document(MyDir + @"html.html", new LoadOptions { LoadFormat = LoadFormat.Html }));
docs.Add(new Document(MyDir + @"20.docx"));
docs.Add(new Document(MyDir + @"html.html", new LoadOptions { LoadFormat = LoadFormat.Html }));
Output = MyDir + @"Output-HtmlDocx.docx";
MergeDocuments();

private static void MergeDocuments()
{
    if (!docs.Any())
    {
        return;
    }
    // Use the first document in the list as the primary document.
    Document primaryDoc = docs[0];
    // Append all successive documents to the primary document.
    foreach (var doc in docs.Skip(1))
    {
        if (doc.OriginalLoadFormat != LoadFormat.Docx)
        {
            foreach (Section srcSection in doc)
            {
                var newSection =
                (Section)primaryDoc.ImportNode(srcSection, true, ImportFormatMode.KeepSourceFormatting);
                newSection.HeadersFooters.LinkToPrevious(false);
                // This branch only runs for non-docx sources, so the headers and footers
                // can be removed unconditionally. Clear() avoids removing nodes from the
                // collection while it is being enumerated.
                newSection.HeadersFooters.Clear();
                primaryDoc.Sections.Add(newSection);
            }
        }
        else
        {
            doc.FirstSection.HeadersFooters.LinkToPrevious(false);
            primaryDoc.AppendDocument(doc, ImportFormatMode.KeepSourceFormatting);
        }
    }
    // Guard before touching primaryDoc; the original check came after it was already used.
    if (primaryDoc == null)
    {
        return;
    }
    // Give the last section its own empty primary header and footer, with zero distance,
    // so that it does not continue the header and footer of the previous section.
    primaryDoc.LastSection.PageSetup.FooterDistance = 0;
    primaryDoc.LastSection.PageSetup.HeaderDistance = 0;
    HeaderFooter header = new HeaderFooter(primaryDoc, HeaderFooterType.HeaderPrimary);
    primaryDoc.LastSection.HeadersFooters.Add(header);
    header.AppendChild(new Paragraph(primaryDoc));
    HeaderFooter footer = new HeaderFooter(primaryDoc, HeaderFooterType.FooterPrimary);
    primaryDoc.LastSection.HeadersFooters.Add(footer);
    footer.AppendChild(new Paragraph(primaryDoc));
    primaryDoc.Save(Output, SaveFormat.Docx);
}