
Free Support Forum - aspose.com

Remove All Blank or Empty Pages from Word Document (C# .NET)

MS Word and LibreOffice/OpenOffice documents are not fixed-page formats; they are flow formats, more like HTML files. Examples of MS Word formats are DOCX, DOTX, RTF and DOC, and OpenOffice works with formats such as ODT, OTT and XML. So there is no concept of a Page stored in a Word document: Pages are created on the fly when the document is opened and laid out by MS Word or OpenOffice. As a result, there is no easy way to determine programmatically where a Page starts or ends, or whether a particular Page is blank or empty. One way to achieve this is to split the Word document into smaller one-Page Word documents, check whether each of them has content, and then concatenate (merge) the non-empty ones back into a single Word document. See the following C# code using the Aspose.Words for .NET API:

using System.Collections.Generic;
using Aspose.Words;

// Load the DOCX file you want to remove blank Pages from
Document doc = new Document(@"C:\Temp\remove empty pages from word.docx");

// This list holds the 0-based numbers of blank or empty Pages.
// -1 is a sentinel marking the position "before the first Page".
List<int> empty_Page_Numbers = new List<int> { -1 };

// Extract each Page as a separate one-Page Word document and inspect it
int total_Pages = doc.PageCount;
for (int i = 0; i < total_Pages; i++)
{
    Document one_Page_Doc = doc.ExtractPages(i, 1);

    // Get the text of this Page and the total count of Shapes (images, text boxes, etc.)
    int shape_Count = 0;
    string text_Of_Page = "";
    foreach (Section section in one_Page_Doc.Sections)
    {
        // Only the Body is inspected; Header and Footer content is ignored
        text_Of_Page += section.Body.ToString(SaveFormat.Text);
        shape_Count += section.Body.GetChildNodes(NodeType.Shape, true).Count;
    }

    // A Page whose text is only whitespace and that has no Shape nodes is considered blank or empty
    if (string.IsNullOrWhiteSpace(text_Of_Page) && shape_Count == 0)
        empty_Page_Numbers.Add(i);
}
// Sentinel marking the position "after the last Page"
empty_Page_Numbers.Add(total_Pages);

// Concatenate the runs of non-empty Pages that lie between consecutive blank Pages
Document final_Document = (Document)doc.Clone(false);
final_Document.RemoveAllChildren();

for (int i = 1; i < empty_Page_Numbers.Count; i++)
{
    int index = empty_Page_Numbers[i - 1] + 1;   // first Page after the previous blank Page
    int count = empty_Page_Numbers[i] - index;   // number of non-empty Pages before the next blank Page

    if (count > 0)
        final_Document.AppendDocument(doc.ExtractPages(index, count), ImportFormatMode.KeepSourceFormatting);
}

final_Document.Save(@"C:\Temp\merged word Document with non-empty pages.docx");

Although the above C# code ensures that there will be no empty or blank Pages in the final Word document, this solution (finding out which Pages are not empty and then building a new document by concatenating them) is rather costly in both time and server resources, because every Page is extracted into a separate document.
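The sentinel values -1 and total_Pages in the code above are what turn the list of blank Page numbers into (index, count) pairs for Document.ExtractPages. That range computation can be isolated and checked on its own. The following is a minimal sketch in plain C# with no Aspose.Words dependency; the PageRangeHelper class and its NonBlankRanges method are hypothetical names, and the blank Page numbers are assumed to be 0-based and sorted in ascending order.

```csharp
using System.Collections.Generic;

public static class PageRangeHelper
{
    // Hypothetical helper: given the total Page count and the 0-based numbers of
    // blank Pages (sorted ascending), return (Index, Count) pairs describing each
    // run of consecutive non-blank Pages, e.g. for Document.ExtractPages(index, count).
    public static List<(int Index, int Count)> NonBlankRanges(int totalPages, IList<int> blankPages)
    {
        // Sentinels: -1 before the first Page and totalPages after the last one,
        // so runs at the start and at the end of the document are handled uniformly.
        var bounds = new List<int> { -1 };
        bounds.AddRange(blankPages);
        bounds.Add(totalPages);

        var ranges = new List<(int Index, int Count)>();
        for (int i = 1; i < bounds.Count; i++)
        {
            int index = bounds[i - 1] + 1;   // first Page after the previous blank Page
            int count = bounds[i] - index;   // Pages up to, but not including, the next blank Page
            if (count > 0)
                ranges.Add((index, count));
        }
        return ranges;
    }
}
```

For example, a 6-Page document with blank Pages 1, 2 and 5 yields the ranges (0, 1) and (3, 2): Page 0 and Pages 3 to 4 are kept. If every Page is blank, the list is empty, so nothing would be appended to the final document.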

Delete All Empty or Blank Pages in Word DOCX ODT Files

Another approach, which consumes fewer resources (but is less accurate), is to remove explicit Page Breaks from the Word document. Deleting explicit Page Breaks may help you get rid of blank Pages. Users can insert an explicit Page Break into a Word document in a few ways, for example by enabling the “Page break before” option of a Paragraph, or by inserting a Page Break character (Ctrl+Enter) into the text.

The following C# code example removes such Page Breaks from an MS Word document:

using Aspose.Words;

private static void RemovePageBreaks(Document doc)
{
    // Retrieve all Paragraphs in the document (including those in Tables, Headers and Footers).
    NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

    // Iterate through all Paragraphs
    foreach (Paragraph para in paragraphs)
    {
        // If the Paragraph has "Page break before" set, clear it.
        if (para.ParagraphFormat.PageBreakBefore)
            para.ParagraphFormat.PageBreakBefore = false;

        // Remove Page Break characters (ControlChar.PageBreak) from all Runs of the Paragraph.
        foreach (Run run in para.Runs)
        {
            if (run.Text.Contains(ControlChar.PageBreak))
                run.Text = run.Text.Replace(ControlChar.PageBreak, string.Empty);
        }
    }
    // Note: Section Breaks that start a new Page are not affected by this method.
}

Also, it might be good enough to just clean up the end of the document to get rid of empty Pages there. Other, less resource-consuming measures you can take to avoid empty Pages are as follows:

  • Remove empty Paragraphs at the end of the document; in most cases this avoids blank Pages at the end of the document.
  • Remove explicit Page Breaks from the end of the document.
  • If the last node of the document is a Paragraph, enable the “Widow/Orphan control” option for this Paragraph. If the Paragraph is not very long (3-6 lines), I would also recommend enabling “Keep lines together”.
  • If the last node is a Table, enable the “Keep with next” option for the Paragraphs of at least the last 1-3 Rows. This is important because a Table cannot be the last node of a document: there must be at least one Paragraph after a Table, so if you remove the empty Paragraph after a Table at the end of the document, an empty Paragraph will be added automatically. With “Keep with next” set on the last Rows, those Rows move to the next Page together with that trailing empty Paragraph, so the last Page does not end up containing only the empty Paragraph.

If you have control over the template document, you can apply these settings in MS Word. Otherwise, you can process the document programmatically, for example with the following C# code:

using Aspose.Words;
using Aspose.Words.Tables;

private void AvoidEmptyPagesAtDocumentEnd(Document doc)
{
    if (doc.Sections.Count == 0)
        return;

    // 1. Remove trailing empty Sections (no Runs and no Shapes), but always keep at least one Section.
    while ((doc.Sections.Count > 1)
        && (doc.LastSection.Body.GetChildNodes(NodeType.Run, true).Count == 0)
        && (doc.LastSection.Body.GetChildNodes(NodeType.Shape, true).Count == 0))
        doc.LastSection.Remove();

    Body body = doc.LastSection.Body;

    // 2. Remove empty Paragraphs at the end of the document, but never leave the Body without children.
    while ((body.LastChild != null) && (body.LastChild != body.FirstChild)
        && (body.LastChild.NodeType == NodeType.Paragraph)
        && !((Paragraph)body.LastChild).HasChildNodes)
        body.LastChild.Remove();

    // 3. Enable the Widow/Orphan control option for the last Paragraph.
    if (body.LastChild is Paragraph lastParagraph)
        lastParagraph.ParagraphFormat.WidowControl = true;

    // 4. If the last node is a Table, enable Keep with next for the Paragraphs of its last Row.
    if (body.LastChild is Table lastTable)
    {
        NodeCollection rowParagraphs = lastTable.LastRow.GetChildNodes(NodeType.Paragraph, true);
        foreach (Paragraph para in rowParagraphs)
            para.ParagraphFormat.KeepWithNext = true;
    }
}
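The first two measures in the list above (removing empty Paragraphs and explicit Page Breaks from the end of the document) come down to one decision rule: which trailing Paragraphs carry no visible content. The sketch below models that rule on plain strings, one per Paragraph, rather than on Aspose.Words nodes. The TrailingParagraphs class and its members are hypothetical, and treating a Paragraph that holds only whitespace and Page Break characters ('\f') as removable is an assumption you may want to tighten; for example, a Paragraph containing only an inline image has no text but is not empty.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TrailingParagraphs
{
    // Page Break control character (the same character as the ControlChar.PageBreak string).
    private const char PageBreak = '\f';

    // Hypothetical rule: a Paragraph is removable if its text is empty or
    // consists only of whitespace and Page Break characters.
    public static bool IsRemovable(string text) =>
        text.All(c => c == PageBreak || char.IsWhiteSpace(c));

    // Counts how many Paragraphs can be dropped from the end of the list.
    // The first Paragraph is never counted, so at least one Paragraph always remains.
    public static int CountRemovableAtEnd(IReadOnlyList<string> paragraphs)
    {
        int removable = 0;
        for (int i = paragraphs.Count - 1; i > 0 && IsRemovable(paragraphs[i]); i--)
            removable++;
        return removable;
    }
}
```

For the Paragraphs "Intro", "Body", "", "\f" and "  " it reports 3 removable Paragraphs. In real code you would then call Remove() on the corresponding last Paragraph nodes, as step 2 of AvoidEmptyPagesAtDocumentEnd does.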