I am trying to understand what the code above is meant to accomplish. First you find paragraphs that span a page boundary. Then you look for empty paragraphs that immediately precede each of those and remove them?
I'm not sure this guarantees that empty pages will be removed. It would only handle the case where a page break is inserted at the very end of a page (which creates 2 pages), not every blank page in the document, which is what the original question asked about.
I have an idea that uses the classes from your PageSplitter demo: split the document into individual pages, then check whether each page's content is empty. If the page is non-empty, its "document" is appended to the final output. Assume the necessary variables, usings, and namespaces are included:
// Create and attach the layout collector
LayoutCollector coll = new LayoutCollector(doc);
// Split nodes in the document into separate pages
PageSplitter.DocumentPageSplitter dps = new PageSplitter.DocumentPageSplitter(coll);
// Start the output from the template, emptied of its content
Document outDoc = new Document(dataDir + "Template.docx");
outDoc.RemoveAllChildren();
for (int i = 1; i <= doc.PageCount; i++)
{
    // Grab the page's text and footer text
    Document pageDoc = dps.GetDocumentOfPage(i);
    string pageText = pageDoc.GetText();
    // Index 1 is assumed to be the footer; fall back to an empty string if it is missing
    NodeCollection headersFooters = pageDoc.GetChildNodes(NodeType.HeaderFooter, true);
    string footerText = headersFooters.Count > 1 ? headersFooters[1].GetText() : string.Empty;
    // Header + footer text is stored before the content -> skip past it to reach the body
    int footerPos = pageText.IndexOf(footerText, StringComparison.Ordinal);
    string bodyContent = footerPos >= 0 ? pageText.Substring(footerPos + footerText.Length) : pageText;
    // If the body is all white space, skip the page. Otherwise append it.
    if (!System.Text.RegularExpressions.Regex.IsMatch(bodyContent, @"^\s*$"))
    {
        outDoc.AppendDocument(pageDoc, ImportFormatMode.KeepSourceFormatting);
    }
}
outDoc.Save(dataDir + "FinalDraft.docx");
There are 2 issues I see with my code:
1) The new document gets a new Section for every page. In my case this was not an issue, because MailMerge had already been executed and I did not change formatting per section.
2) When checking whether a page is empty, I only check the text. A page that contains objects without text (such as images) will be treated as empty and left out of the final document.
Is there a better way to check if a page is empty? I have played with the GetChildNodes() method, but it is difficult to separate the body content from the headers and footers.
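To make the question concrete, here is a minimal, untested sketch of the kind of check I have in mind. It assumes that Section.Body of each per-page document holds only the main text and not the headers or footers, and that images show up as Shape or GroupShape nodes inside the body. IsPageEmpty and PageEmptinessCheck are hypothetical names of my own, not part of Aspose.Words or the PageSplitter demo.

```csharp
using System;
using Aspose.Words;

static class PageEmptinessCheck
{
    // Hypothetical helper: returns true only if no section body in the
    // per-page document contains visible text or drawing objects.
    public static bool IsPageEmpty(Document pageDoc)
    {
        foreach (Section section in pageDoc.Sections)
        {
            // Assumption: Body excludes HeaderFooter nodes, so there is
            // no need to strip header/footer text by string searching.
            Body body = section.Body;
            if (body == null)
                continue;

            // Any shape (e.g. an image) or group shape counts as content.
            if (body.GetChildNodes(NodeType.Shape, true).Count > 0 ||
                body.GetChildNodes(NodeType.GroupShape, true).Count > 0)
                return false;

            // Trim() removes paragraph marks and page breaks, which are
            // whitespace characters; anything left counts as content.
            if (body.GetText().Trim().Length > 0)
                return false;
        }
        return true;
    }
}
```

Inside the loop, the append condition would then become if (!PageEmptinessCheck.IsPageEmpty(pageDoc)). An empty table would still count as content here, since its cell markers are not whitespace, which may or may not be what is wanted.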
Thanks
Toan Tran