How to Remove Different Types of Header and Footer from Document | Extract Document Pages using .NET

Hi,

I am trying to extract each page as a separate document, which works perfectly fine. The document contains multiple sections, and some of the sections have a different first-page header. The extracted page shows the correct header too. However, the different first-page header is still present in the background of the extracted document, so if I search in Word for text that appears in the different first-page header, I can still find it.

Please see the attached Word file. It has 3 sections, and the first and third sections have a different first-page header. I am using the code below to extract each page as a separate document:

using System;
using System.IO;
using Aspose.Words;

static void Main()
{
  //SetLicense();

  var inFilePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "6.docx");
  var outFolder = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "out");
  Directory.CreateDirectory(outFolder);
  Document doc = new Document(inFilePath);
  int pageCount = doc.PageCount;
  for (int i = 1; i <= pageCount; i++)
  {
    Console.WriteLine("Extracting page {0} of {1} ...", i, pageCount);
    // ExtractPages takes a zero-based page index and the number of pages to extract.
    Document pageDoc = doc.ExtractPages(i - 1, 1);
    pageDoc.Save(Path.Combine(outFolder, $"{i}.docx"));
  }
  Console.WriteLine("Completed.");
}

The documents are created successfully. If you open 2.docx in Word and search for “different”, you can see that the content still exists, even though it is not visible on the page.

My requirement is to remove the header and footer parts that are not used in each extracted document, including the different first-page header content on pages that do not display it, so that this content no longer appears in search results. That is:

  • Remove the different first-page header content from 2.docx and 3.docx.
  • Remove the regular header content from 1.docx (page 1 shows only the different first-page header).
  • … similarly for the other sections …

Sample document:
6.docx (18.6 KB)

@Sri79

Please use the following code example to remove the unused headers/footers from each output document. Hope this helps.

// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "6.docx");
int pageCount = doc.PageCount;
for (int i = 1; i <= pageCount; i++)
{
    Console.WriteLine("Extracting page {0} of {1} ...", i, pageCount);
    Document pageDoc = doc.ExtractPages(i - 1, 1);

    if (pageDoc.FirstSection.PageSetup.DifferentFirstPageHeaderFooter)
    {
        // The page shows the first-page header/footer: remove all the others.
        // ToArray() iterates over a snapshot, so removing nodes does not disturb the loop.
        foreach (HeaderFooter headerFooter in pageDoc.FirstSection.HeadersFooters.ToArray())
        {
            if (!(headerFooter.HeaderFooterType == HeaderFooterType.HeaderFirst
                || headerFooter.HeaderFooterType == HeaderFooterType.FooterFirst))
            {
                headerFooter.Remove();
            }
        }
    }
    else
    {
        // The first-page header/footer is not displayed: remove only those.
        foreach (HeaderFooter headerFooter in pageDoc.FirstSection.HeadersFooters.ToArray())
        {
            if (headerFooter.HeaderFooterType == HeaderFooterType.HeaderFirst
                || headerFooter.HeaderFooterType == HeaderFooterType.FooterFirst)
            {
                headerFooter.Remove();
            }
        }
    }
    pageDoc.Save(MyDir + "output" + i + ".docx");
}
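Removing headers/footers from the same collection a foreach loop is enumerating is a common .NET pitfall: a List<T> throws InvalidOperationException on the next iteration, and with a live node collection the result depends on how its enumerator handles removed nodes. A defensive pattern is to loop over a snapshot copy; for Aspose.Words node collections that is NodeCollection.ToArray(). The sketch below only illustrates the pattern, assuming a plain List<string> as a stand-in for the header/footer collection; it makes no Aspose.Words calls.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a section's header/footer collection (plain strings, not Aspose nodes).
var headersFooters = new List<string> { "HeaderFirst", "FooterFirst", "HeaderPrimary", "FooterPrimary" };

// Calling headersFooters.Remove() inside "foreach (var hf in headersFooters)" would throw
// InvalidOperationException. Enumerating a snapshot (ToArray) avoids modifying the
// collection that is being enumerated.
foreach (string hf in headersFooters.ToArray())
{
    // Keep only the first-page header/footer, as in the DifferentFirstPage branch.
    if (hf != "HeaderFirst" && hf != "FooterFirst")
        headersFooters.Remove(hf);
}

Console.WriteLine(string.Join(", ", headersFooters)); // HeaderFirst, FooterFirst
```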

Thanks for providing the code sample. How can we handle odd/even headers and footers in a similar way?

Here is the sample file:
8.docx (18.0 KB)

I have extracted each page as a separate document using the code given in my post. In each extracted document, pageDoc.FirstSection.PageSetup.OddAndEvenPagesHeaderFooter is true.

1.docx shows the odd-page header, which is HeaderPrimary, so I need to remove HeaderEven and HeaderFirst.
2.docx shows the even-page header, which is HeaderEven, so I need to remove HeaderPrimary and HeaderFirst.
etc.

I can remove HeaderFirst based on the DifferentFirstPageHeaderFooter property, but I am not sure how to identify whether HeaderEven or HeaderPrimary is used in a document.

@Sri79

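You can use the same approach shared in my previous post. The following code example also handles odd/even headers and footers: it uses the page index i to decide whether a page is odd or even, keeps HeaderPrimary/FooterPrimary on odd pages, and keeps HeaderEven/FooterEven on even pages. Hope this helps.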

// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "8.docx");
int pageCount = doc.PageCount;
for (int i = 1; i <= pageCount; i++)
{
    Console.WriteLine("Extracting page {0} of {1} ...", i, pageCount);
    Document pageDoc = doc.ExtractPages(i - 1, 1);
    HeaderFooterCollection headersFooters = pageDoc.FirstSection.HeadersFooters;

    if (pageDoc.FirstSection.PageSetup.DifferentFirstPageHeaderFooter)
    {
        // The page shows the first-page header/footer: remove all the others.
        // ToArray() iterates over a snapshot, so removing nodes does not disturb the loop.
        foreach (HeaderFooter headerFooter in headersFooters.ToArray())
        {
            if (!(headerFooter.HeaderFooterType == HeaderFooterType.HeaderFirst
                || headerFooter.HeaderFooterType == HeaderFooterType.FooterFirst))
            {
                headerFooter.Remove();
            }
        }
    }
    else if (pageDoc.FirstSection.PageSetup.OddAndEvenPagesHeaderFooter)
    {
        // The first-page header/footer is not displayed on this page.
        headersFooters[HeaderFooterType.HeaderFirst]?.Remove();
        headersFooters[HeaderFooterType.FooterFirst]?.Remove();

        // Parity is taken from the page's position in the source document, which
        // assumes page numbering starts at 1 and is not restarted in any section.
        if (i % 2 == 0)
        {
            // Even page: keep HeaderEven/FooterEven, remove the primary (odd) ones.
            headersFooters[HeaderFooterType.HeaderPrimary]?.Remove();
            headersFooters[HeaderFooterType.FooterPrimary]?.Remove();
        }
        else
        {
            // Odd page: keep HeaderPrimary/FooterPrimary, remove the even ones.
            headersFooters[HeaderFooterType.HeaderEven]?.Remove();
            headersFooters[HeaderFooterType.FooterEven]?.Remove();
        }
    }
    pageDoc.Save(MyDir + "output" + i + ".docx");
}
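The branches above encode which header/footer Word displays on a single page: the first-page one when the section has a different first page and this is its first page; otherwise the even one on even-numbered pages when odd/even headers are enabled; otherwise the primary one. Everything else can be removed. Below is a minimal sketch of that rule that does not use Aspose.Words. The helpers DisplayedFamily and SlotsToRemove are hypothetical, the slots are plain strings named after the HeaderFooterType members, and pageNumber is assumed to be the number printed on the page, which matches the loop index i only when numbering starts at 1 and is never restarted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The six header/footer slots, named after the HeaderFooterType members (strings only).
string[] allSlots = { "HeaderFirst", "FooterFirst", "HeaderPrimary", "FooterPrimary", "HeaderEven", "FooterEven" };

// Returns the slot family ("First", "Even" or "Primary") that Word displays on a page.
// isFirstPageOfSection: true for the first page of its section.
// pageNumber: the number printed on the page, which decides odd/even.
string DisplayedFamily(bool differentFirstPage, bool oddAndEven, bool isFirstPageOfSection, int pageNumber)
{
    if (differentFirstPage && isFirstPageOfSection)
        return "First";
    if (oddAndEven && pageNumber % 2 == 0)
        return "Even";
    return "Primary";
}

// Slots that are not displayed on the page and can be removed from a one-page extract.
List<string> SlotsToRemove(bool differentFirstPage, bool oddAndEven, bool isFirstPageOfSection, int pageNumber)
{
    string keep = DisplayedFamily(differentFirstPage, oddAndEven, isFirstPageOfSection, pageNumber);
    return allSlots.Where(slot => !slot.EndsWith(keep)).ToList();
}

// Odd/even headers enabled, no different first page: compare pages 1 and 2.
foreach (int page in new[] { 1, 2 })
    Console.WriteLine($"page {page}: remove {string.Join(", ", SlotsToRemove(false, true, false, page))}");
```

For page 1 this lists HeaderFirst, FooterFirst, HeaderEven and FooterEven, and for page 2 HeaderFirst, FooterFirst, HeaderPrimary and FooterPrimary, which matches the removals requested for 1.docx and 2.docx.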