Code to Remove Empty Pages from Word Document using C# .NET

Hi Team,
I have a document with empty pages. The document has more than 100 pages,
and there may be multiple blank pages in between. Is there any way to identify these blank pages and then remove them using Aspose.Words?

The sample code below works, but it takes a long time.
1. Is there any other way to remove empty pages from a document?
2. Is there any way to reduce the time needed to remove the empty pages, or to improve the performance of the code below without looping?

Note: I would like to remove empty pages without looping because my document has more than 100 pages.

We are using this sample code:
Aspose.Words.License license1 = new Aspose.Words.License();
license1.SetLicense("Aspose.Total.lic");

Aspose.Words.Document doc1 = new Aspose.Words.Document(@"D:\FileTest\DRL1.docx");
// Remove sections without text. Iterate over a copy so that removal does not disturb the enumeration.
foreach (Aspose.Words.Section section in doc1.Sections.ToArray())
{
    if (section.ToString(Aspose.Words.SaveFormat.Text).Trim() == String.Empty)
        section.Remove();
}
String PageText = "";
Aspose.Words.Layout.LayoutCollector lc = new Aspose.Words.Layout.LayoutCollector(doc1);
int pages = lc.GetStartPageIndex(doc1.LastSection.Body.LastParagraph);
for (int i = 1; i <= pages; i++)
{
    ArrayList nodes = GetNodesByPage(i, doc1);
    foreach (Aspose.Words.Paragraph para in nodes)
    {
        PageText += para.ToString(Aspose.Words.SaveFormat.Text).Trim();
    }
    // Empty page: none of the collected paragraphs contain text
    if (PageText == "")
    {
        foreach (Aspose.Words.Node node in nodes)
        {
            node.Remove();
        }
    }
    nodes.Clear();
    PageText = "";
}
doc1.Save(@"D:\FileTest\DRLOut1.docx");

// Collects the paragraphs that start on the given page (plus every end-of-section paragraph).
private static ArrayList GetNodesByPage(int page, Aspose.Words.Document document)
{
    ArrayList nodes = new ArrayList();
    Aspose.Words.Layout.LayoutCollector lc = new Aspose.Words.Layout.LayoutCollector(document);
    foreach (Aspose.Words.Paragraph para in document.GetChildNodes(Aspose.Words.NodeType.Paragraph, true))
    {
        if (lc.GetStartPageIndex(para) == page || para.IsEndOfSection)
            nodes.Add(para);
    }

    return nodes;
}
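
On the performance question: GetNodesByPage creates a new LayoutCollector and walks every paragraph once per page, so the work grows with pages times paragraphs. A loop cannot be avoided entirely, but a single-pass variant might look like the minimal sketch below. The names EmptyPageSketch and RemoveTextlessPages are hypothetical. The sketch assumes a page counts as "empty" when none of the paragraphs starting on it contain text, so a page that holds only an image would also be treated as empty.

```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Layout;

public static class EmptyPageSketch
{
    // Hypothetical helper: removes paragraphs that start on pages where no paragraph has text.
    public static void RemoveTextlessPages(Document doc)
    {
        // One collector for the whole run instead of one per page.
        LayoutCollector collector = new LayoutCollector(doc);

        // Single pass: bucket paragraphs by the page they start on.
        var byPage = new Dictionary<int, List<Paragraph>>();
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            int page = collector.GetStartPageIndex(para);
            if (page == 0)
                continue; // node is not mapped to a page (for example header/footer content)

            if (!byPage.TryGetValue(page, out List<Paragraph> bucket))
                byPage[page] = bucket = new List<Paragraph>();
            bucket.Add(para);
        }

        // Decide first, remove afterwards, so removals do not affect the page lookups.
        var toRemove = new List<Paragraph>();
        foreach (KeyValuePair<int, List<Paragraph>> entry in byPage)
        {
            bool hasText = false;
            foreach (Paragraph para in entry.Value)
            {
                if (para.ToString(SaveFormat.Text).Trim().Length > 0)
                {
                    hasText = true;
                    break;
                }
            }
            if (!hasText)
                toRemove.AddRange(entry.Value);
        }

        foreach (Paragraph para in toRemove)
            para.Remove();
    }
}
```

Calling EmptyPageSketch.RemoveTextlessPages(doc1) before doc1.Save(...) would replace the per-page loop. Whether this is fast enough for a given document would need to be measured.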

Sample document:
sample Word doc.zip (106.7 KB)

@thiru1711,

Alternatively, you can try implementing the following workflow:

  • Convert Word document to a temporary PDF in memory by using Aspose.Words for .NET
  • Use Aspose.PDF for .NET API to detect which page numbers are blank/empty in this PDF (see Page.IsBlank Method and PageCollection Class)
  • Use Aspose.Words to remove only those nodes which are present on such blank pages detected in previous step.
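
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the names BlankPageDetector and FindBlankPages are hypothetical, and the default fill threshold of 0.01 passed to Page.IsBlank is an assumed starting value that may need tuning for a given document.

```csharp
using System.Collections.Generic;
using System.IO;

public static class BlankPageDetector
{
    // Hypothetical helper: returns the 1-based numbers of pages that Aspose.PDF reports as blank.
    // fillThreshold is an assumed default; adjust it to the documents being processed.
    public static List<int> FindBlankPages(Aspose.Words.Document wordDoc, double fillThreshold = 0.01)
    {
        var blankPages = new List<int>();
        using (var stream = new MemoryStream())
        {
            // Render the Word document to PDF in memory; no temporary file is written.
            wordDoc.Save(stream, Aspose.Words.SaveFormat.Pdf);
            stream.Position = 0;

            using (var pdf = new Aspose.Pdf.Document(stream))
            {
                // PageCollection is indexed starting from 1.
                for (int i = 1; i <= pdf.Pages.Count; i++)
                {
                    if (pdf.Pages[i].IsBlank(fillThreshold))
                        blankPages.Add(i);
                }
            }
        }
        return blankPages;
    }
}
```

The returned page numbers are then the input for the third step. Note that a page carrying only a header or footer may not be reported as blank.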

Hope this helps.


Thanks for the reply.
We tried to remove the empty pages using Aspose.PDF, but some formatting issues occurred.

Is there any other way to remove empty pages from a document using Aspose.Words,
or
how can we execute macro code using Aspose?

@thiru1711,

You only need Aspose.PDF to detect the empty pages. After that, you remove those pages from the Word document by using Aspose.Words. Suppose that the Aspose.PDF API has found that pages 2, 5, 8, 11, 114, and 117 are blank. You can then use the following Aspose.Words code to remove the nodes on these pages of the Word document:

Document doc = new Document("E:\\Temp\\sample Word doc\\Sample DOc.docx");
LayoutCollector layoutCollector = new LayoutCollector(doc);

ArrayList list = new ArrayList();
foreach (Node node in doc.GetChildNodes(NodeType.Any, true))
{
    if (layoutCollector.GetNumPagesSpanned(node) == 0)
    {
        int pageIndex = layoutCollector.GetStartPageIndex(node);
        if (pageIndex == 2 || pageIndex == 5 || pageIndex == 8 || pageIndex == 11 || pageIndex == 114 || pageIndex == 117)
        {
            list.Add(node);
        }
    }
}

foreach (Node node in list)
    node.Remove();

doc.Save("E:\\Temp\\sample Word doc\\20.6.docx");
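
If the blank page numbers come from a detection step rather than being hardcoded, the same logic can take them as a parameter. The following is a minimal sketch under that assumption; the names PageNodeRemover and RemoveNodesOnPages are hypothetical and not part of the Aspose.Words API.

```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Layout;

public static class PageNodeRemover
{
    // Hypothetical helper: removes every node that lies entirely on one of the given pages.
    public static void RemoveNodesOnPages(Document doc, ICollection<int> pageNumbers)
    {
        // A set gives constant-time lookups regardless of how many pages are listed.
        var pages = new HashSet<int>(pageNumbers);
        LayoutCollector collector = new LayoutCollector(doc);

        // Collect first, remove afterwards, so removals do not affect the traversal.
        var toRemove = new List<Node>();
        foreach (Node node in doc.GetChildNodes(NodeType.Any, true))
        {
            // GetNumPagesSpanned == 0 means the node starts and ends on the same page.
            if (collector.GetNumPagesSpanned(node) == 0 &&
                pages.Contains(collector.GetStartPageIndex(node)))
            {
                toRemove.Add(node);
            }
        }

        foreach (Node node in toRemove)
            node.Remove();
    }
}
```

For the page numbers used in this post, the call would be PageNodeRemover.RemoveNodesOnPages(doc, new[] { 2, 5, 8, 11, 114, 117 }).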

Unfortunately, there is no way to execute or run macros in a Word document by using the Aspose.Words API.

Hi,
I need to remove the empty pages from the document using Aspose.Words only, without using the Aspose.PDF API.

Is there any other way to remove empty pages from a document that is available in Aspose.Words?

@thiru1711,

We will investigate the possibility of providing some method(s) in the Aspose.Words API to remove empty pages from a Word document. We have logged this requirement in our issue tracking system. Your ticket number is WORDSNET-20633. We will look further into the details of this requirement and will keep you updated on the status of the linked issue.

@thiru1711,

Regarding WORDSNET-20633, we have completed the work on this issue and decided to close it without adding any such functionality to the Aspose.Words API. Please see the following analysis details.

Aspose.Words works with flow-format documents, and there is no such concept as a “Page” in the document model. Pages only appear in the rendered output, e.g. PDF or XPS. Since there is no Page object, there is nothing we can implement in the Aspose.Words API to meet this requirement.

You would have to analyze the output and then try to manipulate the document, but this logic would be very unreliable, since many factors can cause blank pages to appear in the output, and not all of them are related to the document content itself. For example, section properties may dictate that a section starts on an odd or even page, which creates a blank page on demand.
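
To illustrate the section example only, the sketch below looks for sections that are set to start on an odd or even page and switches them to a plain new-page start. The names SectionStartSketch and RelaxOddEvenSectionStarts are hypothetical. This addresses just one of the possible causes of blank pages, and it changes which side a section opens on, so it is only suitable where that is acceptable.

```csharp
using Aspose.Words;

public static class SectionStartSketch
{
    // Hypothetical helper: returns the number of sections whose start type was changed.
    public static int RelaxOddEvenSectionStarts(Document doc)
    {
        int changed = 0;
        foreach (Section section in doc.Sections)
        {
            SectionStart start = section.PageSetup.SectionStart;

            // Odd/even starts can force the layout engine to insert a blank filler page.
            if (start == SectionStart.OddPage || start == SectionStart.EvenPage)
            {
                section.PageSetup.SectionStart = SectionStart.NewPage;
                changed++;
            }
        }
        return changed;
    }
}
```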

If we can help you with anything else, please feel free to ask.

Hi @awais.hafeez,
I have a document with empty pages.
There may be multiple blank pages in the document. Is there any way to identify these blank pages and then remove them using Aspose.Words?

The sample code below is not working. Please find the attached document.

Sample document: Sample doc.zip (57.7 KB)

We are using this sample code:

Aspose.Words.License license1 = new Aspose.Words.License();
license1.SetLicense("Aspose.Total.lic");

Aspose.Words.Document doc1 = new Aspose.Words.Document(@"D:\FileTest\DRL1.docx");
foreach (Aspose.Words.Section section in doc1.Sections.ToArray())
{
    if (section.ToString(Aspose.Words.SaveFormat.Text).Trim() == String.Empty)
        section.Remove();
}
String PageText = "";
Aspose.Words.Layout.LayoutCollector lc = new Aspose.Words.Layout.LayoutCollector(doc1);
int pages = lc.GetStartPageIndex(doc1.LastSection.Body.LastParagraph);
for (int i = 1; i <= pages; i++)
{
    ArrayList nodes = GetNodesByPage(i, doc1);
    foreach (Aspose.Words.Paragraph para in nodes)
    {
        PageText += para.ToString(Aspose.Words.SaveFormat.Text).Trim();
    }
    //Empty Page
    if (PageText == "")
    {
        foreach (Aspose.Words.Node node in nodes)
        {
            node.Remove();
        }
    }
    nodes.Clear();
    PageText = "";
}
doc1.Save(@"D:\FileTest\DRLOut1.docx");

private static ArrayList GetNodesByPage(int page, Aspose.Words.Document document)
{
    ArrayList nodes = new ArrayList();
    Aspose.Words.Layout.LayoutCollector lc = new Aspose.Words.Layout.LayoutCollector(document);
    foreach (Aspose.Words.Paragraph para in document.GetChildNodes(Aspose.Words.NodeType.Paragraph, true))
    {
        if (lc.GetStartPageIndex(para) == page || para.IsEndOfSection)

            nodes.Add(para);
    }

    return nodes;
}

Note: How can we remove the empty pages in this document?

@thiru1711,

MS Word 2019 reports that there are 16 pages in your “TestEmpty1.docx” Word document. But when we convert it to PDF by using the latest 20.7 version of Aspose.Words for .NET, it produces 15 pages in the PDF (see 20.7.pdf (160.4 KB)). The LayoutCollector class belongs to the same layout engine that Aspose.Words uses to render Word documents to PDF. So we first need to fix the problem with the PDF output, and then you should be able to use the same code from my previous post to remove the selected pages.
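
A quick way to spot this kind of mismatch is to compare the page count stored in the file with the count produced by the Aspose.Words layout engine. The sketch below assumes the stored value was written by MS Word when the file was last saved. It is read first, since Aspose.Words may overwrite it once the layout is rebuilt. The names PageCountCheck and Compare are hypothetical.

```csharp
using System;
using Aspose.Words;

public static class PageCountCheck
{
    // Hypothetical helper: returns the page count computed by the Aspose.Words layout engine
    // and reports a mismatch against the count stored in the document properties.
    public static int Compare(Document doc)
    {
        // Read the stored value before any layout is built (assumption: it came from MS Word).
        int storedPages = doc.BuiltInDocumentProperties.Pages;

        // PageCount reflects the layout that is also used for PDF rendering and LayoutCollector.
        int laidOutPages = doc.PageCount;

        if (storedPages != laidOutPages)
            Console.WriteLine("Stored page count: " + storedPages + ", Aspose.Words layout: " + laidOutPages);

        return laidOutPages;
    }
}
```

If the two numbers differ, the page indices returned by LayoutCollector will not line up with the page numbers shown in MS Word.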

We have logged the following issues in our bug tracking system.

WORDSNET-20829: Table Rows being pushed to previous Pages in PDF
WORDSNET-20830: Preserve empty page during Word DOCX to PDF Conversion

We will look further into the details of these problems and will keep you updated on the status of the corrections. We apologize for the inconvenience.

The issues you found earlier (filed as WORDSNET-20829 and WORDSNET-20830) have been fixed in the Aspose.Words for .NET 22.3 update, which is also available on NuGet.