Merging Word documents loses headers

I am trying to merge multiple Word documents together and add a table of contents at the beginning. It all works, except that one customer makes extensive use of headers, and the headers are lost after the documents are merged.

I am using Aspose.Words.NET v23.1.0

Here is an example document:

811_970.docx (36.3 KB)

Here is my code:

var mergedDoc = new Document();
var mergedDocBuilder = new DocumentBuilder(mergedDoc);

mergedDocBuilder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Heading1;
mergedDocBuilder.Writeln(title);
mergedDocBuilder.InsertBreak(BreakType.PageBreak);

// https://support.microsoft.com/en-us/office/field-codes-toc-table-of-contents-field-1f538bc4-60e6-4854-9f64-67754d78d05c
mergedDocBuilder.InsertTableOfContents(@"\h \z \f p");
mergedDocBuilder.InsertBreak(BreakType.PageBreak);

var policiesWithDocuments = policies
    .Where(p => p.Document != null)
    .ToList();

var importFormatOptions = new ImportFormatOptions()
{
    IgnoreHeaderFooter = false
};

foreach (var policy in policiesWithDocuments)
{
    using var vs = await _documentService.DownloadPolicyDocumentAsStreamAsync(policy.Document!.Id, false);

    var policyDoc = new Document(vs.Stream);

    var builder = new DocumentBuilder(policyDoc);
    builder.InsertField($@"TC ""{policy.Name}"" \f p");

    mergedDocBuilder.InsertDocument(policyDoc, ImportFormatMode.KeepDifferentStyles, importFormatOptions);

    if (policy != policiesWithDocuments.Last())
    {
        mergedDocBuilder.InsertBreak(BreakType.PageBreak);
    }
}

mergedDoc.UpdateFields();
mergedDoc.UpdatePageLayout();

var ms = new MemoryStream();
mergedDoc.Save(ms, SaveFormat.Pdf);

@jmunro,

There seems to be an issue with the InsertDocument method, as it is not copying the headers. I will investigate this a bit more before raising it with the development team.

In the meantime, here is some workaround code that should let you do what you want.

private void Logic()
{
    string title = "Some Title to display in the first page.";
    var target = new Document();
    var docBuilder = new DocumentBuilder(target);

    docBuilder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Heading1;
    docBuilder.Writeln(title);
    docBuilder.InsertBreak(BreakType.PageBreak);

    docBuilder.InsertTableOfContents(@"\h \z \f p");            

    var documentList = new List<Tuple<string,string>>()
    {
       new Tuple<string,string>("Document With Header", $@"{prefixPath}\Merges\MergedDocumentLooseHeader1_input.docx"),
       new Tuple<string,string>("Document without Header", $@"{prefixPath}\Merges\MergedDocumentLooseHeader2_input.docx"),
       new Tuple<string,string>("Another Document With Header", $@"{prefixPath}\Merges\MergedDocumentLooseHeader3_input.docx"),
    };

    foreach (var tuple in documentList)
    {
        var source = new Document(tuple.Item2);
        source.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;
        source.FirstSection.HeadersFooters.LinkToPrevious(false);

        var builder = new DocumentBuilder(source);
        builder.InsertField($@"TC ""{tuple.Item1}"" \f p");                
        
        foreach (Section srcSection in source.Sections)
        {
            Node dstSection = target.ImportNode(srcSection, true, ImportFormatMode.KeepSourceFormatting);

            target.AppendChild(dstSection);
        }
    }

    target.UpdateFields();
    target.UpdatePageLayout();           

    target.Save($@"{prefixPath}\Merges\MergedDocumentLooseHeaderOne_output.docx");
    target.Save($@"{prefixPath}\Merges\MergedDocumentLooseHeaderOne_output.pdf", SaveFormat.Pdf);
}

The 3 random input files I used:
MergedDocumentLooseHeader1_input.docx (36.3 KB)
MergedDocumentLooseHeader2_input.docx (12.4 KB)
MergedDocumentLooseHeader3_input.docx (24.0 KB)

That’s great, thank you!

@jmunro If you always insert the documents at the end of the merged document, I would suggest using the Document.AppendDocument method instead of DocumentBuilder.InsertDocument. In an MS Word document, page setup and headers/footers are defined per section. When you use Document.AppendDocument, whole sections from the source document are copied into the destination document. For example, see the following simplified code:

Document mergedDoc = new Document();
DocumentBuilder mergedDocBuilder = new DocumentBuilder(mergedDoc);

mergedDocBuilder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Heading1;
mergedDocBuilder.Writeln("Some title");
mergedDocBuilder.InsertBreak(BreakType.PageBreak);

// https://support.microsoft.com/en-us/office/field-codes-toc-table-of-contents-field-1f538bc4-60e6-4854-9f64-67754d78d05c
mergedDocBuilder.InsertTableOfContents(@"\h \z \f p");

// Append the document instead of inserting it.
Document policyDoc = new Document(@"C:\Temp\in.docx");
DocumentBuilder policyDocBuilder = new DocumentBuilder(policyDoc);
// Make the appended document always start on a new page.
policyDoc.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;

policyDocBuilder.InsertField($@"TC ""Test document"" \f p");

mergedDoc.AppendDocument(policyDoc, ImportFormatMode.KeepDifferentStyles);


mergedDoc.UpdateFields();
mergedDoc.UpdatePageLayout();

mergedDoc.Save(@"C:\Temp\out.docx");
mergedDoc.Save(@"C:\Temp\out.pdf");

When using AppendDocument I have a different header problem - now if I merge a document that has headers with a document that doesn’t, the headers are applied to the second document.

@jmunro If a section does not have its own headers/footers, they are inherited from the previous section. You can use the HeaderFooterCollection.LinkToPrevious method to disable this:

Document policyDoc = new Document(@"C:\Temp\in.docx");
DocumentBuilder policyDocBuilder = new DocumentBuilder(policyDoc);
// Make the appended document always start on a new page.
policyDoc.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;
// Unlink headers/footers from previous section.
policyDoc.FirstSection.HeadersFooters.LinkToPrevious(false);

Thank you!

Not sure if I should open a separate topic for this, but I have an issue with the page numbers in the merged document - the table of contents shows all page numbers being 1.

I’ve tried ImportFormatOptions.KeepSourceNumbering both set to true and false but it doesn’t seem to make a difference.

@jmunro The ImportFormatOptions.KeepSourceNumbering option is for list numbering, not for page numbering.
Could you please attach your sample output document here for our reference? We will check the issue and provide you more information.

download (2).docx (21.6 KB)

@jmunro This occurs because the PageSetup.RestartPageNumbering flag is set in a section of the source document. You can reset it before appending the documents so that the numbering continues.
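
As a minimal sketch of that idea, the flag can be cleared on every section of the source document just before it is appended. The class and method names below are hypothetical helpers introduced only for illustration; they are not part of Aspose.Words.

```csharp
using Aspose.Words;

public static class PageNumberingHelper
{
    // Hypothetical helper (not an Aspose.Words API): clears the
    // "restart page numbering" flag on each section of the source
    // document, then appends the source to the target.
    public static void AppendWithContinuousNumbering(Document target, Document source)
    {
        foreach (Section section in source.Sections)
        {
            // Without this, the PAGE value restarts in this section
            // once the section is part of the merged document.
            section.PageSetup.RestartPageNumbering = false;
        }

        target.AppendDocument(source, ImportFormatMode.KeepDifferentStyles);
    }
}
```

Note that the flag is changed on the source document (the one being appended), not on the merged document.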

I tried setting

mergedDoc.LastSection.PageSetup.RestartPageNumbering = false;

after inserting the ToC and before appending the documents but it didn’t work, did I misunderstand what to do?

@jmunro You were close, but it is the other way around:

mergedDoc.FirstSection.PageSetup.RestartPageNumbering = false;

This way, when the document is inserted, its first section won't restart the page numbering.

Hmm, I still get all of them showing as page 1

@jmunro,

This is the code I am running and the output:

private void Logic()
{
    string title = "Some Title to display in the first page.";
    var target = new Document();
    target.Styles[StyleIdentifier.Toc1].Font.Size = 20;
    var docBuilder = new DocumentBuilder(target);

    docBuilder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Heading1;
    docBuilder.Writeln(title);
    docBuilder.InsertBreak(BreakType.PageBreak);

    docBuilder.InsertTableOfContents(@"\h \z \f p");            

    var documentList = new List<Tuple<string,string>>()
    {
       new Tuple<string,string>("Document With Header", $@"{prefixPath}\Merges\MergedDocumentLooseHeader1_input.docx"),
       new Tuple<string,string>("Document without Header", $@"{prefixPath}\Merges\MergedDocumentLooseHeader2_input.docx"),
       new Tuple<string,string>("Another Document With Header", $@"{prefixPath}\Merges\MergedDocumentLooseHeader3_input.docx"),
       new Tuple<string,string>("Document with format in the Header", $@"{prefixPath}\Merges\MergedDocumentLooseHeader4_input.docx"),
    };

    foreach (var tuple in documentList)
    {
        var source = new Document(tuple.Item2);                
        source.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;
        source.FirstSection.HeadersFooters.LinkToPrevious(false);
        source.FirstSection.PageSetup.RestartPageNumbering = false;

        var builder = new DocumentBuilder(source);
        builder.Font.ClearFormatting();
        builder.InsertField($@"TC ""{tuple.Item1}"" \f p");

        target.AppendDocument(source, ImportFormatMode.KeepDifferentStyles);
    }

    target.UpdateFields();
    target.UpdatePageLayout();           

    target.Save($@"{prefixPath}\Merges\MergedDocumentWithAppend_output.docx");
    target.Save($@"{prefixPath}\Merges\MergedDocumentWithAppend_output.pdf", SaveFormat.Pdf);            
}

Use it as a reference only, as @alexey.noskov suggested a better way to handle the character style in the other post.

The output: MergedDocumentWithAppend_output.docx (28.8 KB)

Thanks, I was setting it on the wrong builder.

Now the problem is that the page number fields in the merged documents are updated to be correct for the merged document, but the customer wants them to show the original page numbers.

I’ve tried calling Update only on the ToC instead of UpdateFields on the whole document, but that didn’t fix it.

@jmunro I am afraid this is the expected behavior of the TOC field. It shows the value of PAGE on the page it refers to. If you restart numbering in a section, the value of the PAGE field is also restarted. You can easily see this using the following code:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.InsertTableOfContents(@"\h \z \f p");

builder.InsertBreak(BreakType.PageBreak);
builder.InsertField(@"TC ""Test"" \f p");
builder.InsertField(@"PAGE");
builder.InsertBreak(BreakType.SectionBreakNewPage);
builder.PageSetup.RestartPageNumbering = true;
builder.InsertField(@"TC ""Test1"" \f p");
builder.InsertField(@"PAGE");
builder.InsertBreak(BreakType.SectionBreakNewPage);
builder.PageSetup.RestartPageNumbering = true;
builder.InsertField(@"TC ""Test2"" \f p");
builder.InsertField(@"PAGE");
builder.InsertBreak(BreakType.SectionBreakNewPage);
builder.PageSetup.RestartPageNumbering = true;
builder.InsertField(@"TC ""Test3"" \f p");
builder.InsertField(@"PAGE");

doc.UpdateFields();
doc.UpdatePageLayout();

doc.Save(@"C:\Temp\out.docx", SaveFormat.Docx);
doc.Save(@"C:\Temp\out.pdf", SaveFormat.Pdf);

In your case, you can either omit page numbers from the TOC altogether or build the TOC manually. In the first case, you can simply add the \n switch to the TC fields. The second case is more complex: you can use bookmarks as reference points and use LayoutCollector to determine the absolute page index of each bookmark. Here is simplified code that shows the main idea of the approach:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Insert placeholder where the manual TOC will be built.
BookmarkStart manualTocPlaceholderStart = builder.StartBookmark("manualTocPlaceholder");
builder.EndBookmark(manualTocPlaceholderStart.Name);

// Put some dummy sections where page numbers are restarted.
List<string> bookmarks = new List<string>();
for (int i = 0; i < 10; i++)
{
    builder.InsertBreak(BreakType.SectionBreakNewPage);
    builder.PageSetup.RestartPageNumbering = true;

    // Insert bookmark.
    string bkName = string.Format("document_{0}", i);
    builder.StartBookmark(bkName);
    builder.EndBookmark(bkName);
    builder.Write(bkName + " PAGE field value is ");
    builder.InsertField("PAGE");

    bookmarks.Add(bkName);
}

// Now build the manual TOC using the inserted bookmarks.
LayoutCollector collector = new LayoutCollector(doc);
builder.MoveToBookmark(manualTocPlaceholderStart.Name);
// Configure a tab stop to show page numbers at the right, like in a real TOC.
builder.ParagraphFormat.TabStops.Clear();
double tabPosition = builder.PageSetup.PageWidth - builder.PageSetup.LeftMargin - builder.PageSetup.RightMargin;
builder.ParagraphFormat.TabStops.Add(tabPosition, TabAlignment.Right, TabLeader.Dots);

foreach (string bkName in bookmarks)
{
    // Determine the absolute page number where the bookmark starts.
    int pageIndex = collector.GetStartPageIndex(doc.Range.Bookmarks[bkName].BookmarkStart);
    // Insert a hyperlink to the bookmark.
    builder.InsertHyperlink(string.Format("{0}\t{1}", bkName, pageIndex), bkName, true);
    builder.Writeln();
}

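// LayoutCollector built the page layout before the TOC entries were inserted.
// Rebuild it so that fixed-page output such as PDF reflects the changes.
doc.UpdatePageLayout();
doc.Save(@"C:\Temp\out.docx", SaveFormat.Docx);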
doc.Save(@"C:\Temp\out.pdf", SaveFormat.Pdf);
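
For the first option (omitting page numbers), a minimal sketch might look like the following. It assumes the same "p" entry identifier used elsewhere in this thread; the class name, method name and entry text are hypothetical and only there for illustration.

```csharp
using Aspose.Words;

public static class TocWithoutPageNumbers
{
    // Hypothetical helper for illustration; not part of Aspose.Words.
    public static Document Build()
    {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // TOC built from TC fields marked with the identifier "p".
        builder.InsertTableOfContents(@"\h \z \f p");
        builder.InsertBreak(BreakType.PageBreak);

        // The \n switch on the TC field omits the page number for this entry.
        builder.InsertField(@"TC ""Test document"" \f p \n");
        builder.Writeln("Body of the appended document.");

        doc.UpdateFields();
        return doc;
    }
}
```

The result can then be saved as usual, for example with TocWithoutPageNumbers.Build().Save(@"C:\Temp\out.docx").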

Could I replace the page number fields on the documents with the appropriate text before I append them? Would that be easier?

I can’t get the manual ToC to work - if I save it as a DOCX then the ToC hyperlinks point to blank pages before the appended documents and if I save it as PDF then the ToC is missing completely.

download.docx (21.3 KB)
download.pdf (139.3 KB)

Here’s my code:

var mergedDoc = new Document();
var mergedDocBuilder = new DocumentBuilder(mergedDoc);

mergedDocBuilder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Title;
mergedDocBuilder.Writeln(title);
mergedDocBuilder.InsertBreak(BreakType.PageBreak);

var tocBookmark = mergedDocBuilder.StartBookmark("ComplyVision_ToC");
mergedDocBuilder.EndBookmark(tocBookmark.Name);

var policiesByBookmark = new Dictionary<string, string>();

var policiesWithDocuments = policies
	.Where(p => p.Document != null)
	.ToList();

var importFormatOptions = new ImportFormatOptions() {
	IgnoreHeaderFooter = true,
	SmartStyleBehavior = true
};

foreach (var policy in policiesWithDocuments) {
	mergedDocBuilder.InsertBreak(BreakType.SectionBreakNewPage);
	mergedDocBuilder.PageSetup.RestartPageNumbering = true;

	var bookmarkName = $"ComplyVision__Policy_{policy.Id}";
	mergedDocBuilder.StartBookmark(bookmarkName);
	mergedDocBuilder.EndBookmark(bookmarkName);
	policiesByBookmark.Add(bookmarkName, policy.Name);

	using var vs = await _documentService.DownloadPolicyDocumentAsStreamAsync(policy.Document!.Id, false);

	var policyDoc = new Document(vs.Stream);
	policyDoc.FirstSection.HeadersFooters.LinkToPrevious(false);

	mergedDoc.AppendDocument(policyDoc, ImportFormatMode.KeepDifferentStyles, importFormatOptions);
}

// Build the manual TOC using the inserted bookmarks.
var collector = new LayoutCollector(mergedDoc);
mergedDocBuilder.MoveToBookmark(tocBookmark.Name);
// Configure tab stop to show page numbers at the right like in real TOC.
mergedDocBuilder.ParagraphFormat.TabStops.Clear();
var tabPosition = mergedDocBuilder.PageSetup.PageWidth - mergedDocBuilder.PageSetup.LeftMargin - mergedDocBuilder.PageSetup.RightMargin;
mergedDocBuilder.ParagraphFormat.TabStops.Add(tabPosition, TabAlignment.Right, TabLeader.Dots);

foreach (var (bookmarkName, policyName) in policiesByBookmark) {
	// Determine the absolute page number where the bookmark starts.
	var pageIndex = collector.GetStartPageIndex(mergedDoc.Range.Bookmarks[bookmarkName].BookmarkStart);

	// insert hyperlink to the bookmark.
	mergedDocBuilder.InsertHyperlink($"{policyName}\t{pageIndex}", bookmarkName, true);
	mergedDocBuilder.Writeln();
}

// save as PDF
var ms = new MemoryStream();
mergedDoc.Save(ms, SaveFormat.Pdf);

@jmunro In your code, you insert a bookmark into an empty section, which is displayed as an empty page.

Instead, you should insert the target bookmark at the beginning of the appended document. Please modify your code as follows:

foreach (var policy in policiesWithDocuments)
{
    var bookmarkName = $"ComplyVision__Policy_{policy.Id}";
    policiesByBookmark.Add(bookmarkName, policy.Name);

    using var vs = await _documentService.DownloadPolicyDocumentAsStreamAsync(policy.Document!.Id, false);

    Document policyDoc = new Document(vs.Stream);
    policyDoc.FirstSection.HeadersFooters.LinkToPrevious(false);
    policyDoc.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;

    DocumentBuilder policyBuilder = new DocumentBuilder(policyDoc);
    policyBuilder.StartBookmark(bookmarkName);
    policyBuilder.EndBookmark(bookmarkName);

    mergedDoc.AppendDocument(policyDoc, ImportFormatMode.KeepDifferentStyles, importFormatOptions);
}

As for replacing the page number fields with text before appending: this will not work, because a { PAGE } field in a document’s header or footer has a different value on each page. If you replace the { PAGE } field with plain text, the same value will be shown on every page where that header or footer is displayed.

Thanks, that fixed the bookmarks!

I still have the problem that when I save it as a PDF the whole table of contents is missing. I attached an example PDF to my previous message.

I tried debugging it by adding text before and after the hyperlinks, but although the “before” text appears the “after” text doesn’t - I have no idea what’s going on.

// Build the manual TOC using the inserted bookmarks.
var collector = new LayoutCollector(mergedDoc);
mergedDocBuilder.MoveToBookmark(tocBookmark.Name);
mergedDocBuilder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Toc1;
mergedDocBuilder.Write("before");

// Configure tab stop to show page numbers at the right like in real TOC.
mergedDocBuilder.ParagraphFormat.TabStops.Clear();
var tabPosition = mergedDocBuilder.PageSetup.PageWidth - mergedDocBuilder.PageSetup.LeftMargin - mergedDocBuilder.PageSetup.RightMargin - 10;
mergedDocBuilder.ParagraphFormat.TabStops.Add(tabPosition, TabAlignment.Right, TabLeader.Dots);

foreach (var (bookmarkName, policyName) in policiesByBookmark) {
	// Determine absolute page number where bookmark starts.
	var pageIndex = collector.GetStartPageIndex(mergedDoc.Range.Bookmarks[bookmarkName].BookmarkStart);

	// insert hyperlink to the bookmark.
	mergedDocBuilder.InsertHyperlink($"{policyName}\t{pageIndex}", bookmarkName, true);
	mergedDocBuilder.Writeln();
}

mergedDocBuilder.Write("after");