Start a heading section at the top of a new page (a page break may create a blank page)

Hi,

We are currently using Aspose.Words 13.3.0.0.

We want a heading section to start at the top of the next page. Currently, that section is bookmarked in a Word template document, and we insert content directly at the bookmark.
The issue is that if we insert a page break with DocumentBuilder at the bookmark, the previous heading section may have ended exactly at the end of the preceding page. In that case, the page break creates an unnecessary blank page, which we want to avoid. How can we solve this? I heard that there is a LayoutCollector class. Can we use it to make sure we don’t create a blank page? Any sample code or ideas will be highly appreciated.
Many thanks,
Chandra

Hi Chandra,

Thanks for your inquiry. Could you please attach your input template document and the expected output document (you can create this document using Microsoft Word) here for testing? I will investigate your scenario on my side and provide you code to achieve what you’re looking for.

Best regards,

Hi,
We are using Aspose.Words.dll (v13.4.0.0).
Please find attached the input template document and the expected output document, along with the actual output document we are getting. Basically, we are getting a blank page, and we want our cost estimate section to start at the top of the next page after the page that contains the table/image. We need a way to remove blank pages, which may appear anywhere in the proposal document, by running a method/routine on the whole document after mail merge.
The following bookmarks are defined in the template document; the purpose of each is given below:
JobBM - need this bookmark to reorder Job section
JobInformationBM - need this bookmark to reorder Job information section
Schematic - need this bookmark to insert schematic table data at bookmarked place
ImageBelow - need this bookmark to insert image below table at bookmarked place (if image not to be displayed within table)
CostEstimateBM - need this bookmark to reorder cost estimate section [We need to preserve this bookmark in generated document for later processing]
QuotesInsertionPlace - need this bookmark to insert cost estimate table at bookmarked place
RollupCostEstimateBM - need this bookmark to reorder rollup cost estimate section [We need to preserve this bookmark in generated document for later processing]
RollupQuotesInsertionPlace - need this bookmark to insert rollup cost estimate table at bookmarked place
Please review and let us know if you need more information.
Thanks and regards,
Chandra

Hi Chandra,

Thanks for the additional information. I believe you can achieve this using the code below:

// Requires: using System.Collections; using Aspose.Words; using Aspose.Words.Layout;
Document doc = new Document(@"C:\Temp\SampleDraft▒+actual.docx");
LayoutCollector collector = new LayoutCollector(doc);

// Collect every paragraph that contains an explicit page break.
ArrayList pageBreakParagraphs = new ArrayList();
Node[] paragraphs = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();
foreach (Paragraph para in paragraphs)
{
    foreach (Run run in para.Runs)
    {
        if (run.Text.Contains(ControlChar.PageBreak))
        {
            pageBreakParagraphs.Add(para);
            break; // add each paragraph only once
        }
    }
}

foreach (Node para in pageBreakParagraphs)
{
    // Stays true while everything before the page break on its page is an empty paragraph.
    bool onlyEmptyBefore = true;
    ArrayList paragraphsToRemove = new ArrayList();
    int pageNumber = collector.GetEndPageIndex(para);
    Node tempPara = para.PreviousSibling;
    // Walk back over the preceding siblings that end on the same page.
    while (tempPara != null && collector.GetEndPageIndex(tempPara) == pageNumber)
    {
        if (tempPara.NodeType != NodeType.Paragraph ||
            !tempPara.Range.Text.Equals(ControlChar.Cr))
        {
            // Real content precedes the break on this page, so keep the break.
            onlyEmptyBefore = false;
            break;
        }
        paragraphsToRemove.Add(tempPara);
        tempPara = tempPara.PreviousSibling;
    }
    if (onlyEmptyBefore)
    {
        // The page holds only empty paragraphs and the break: remove them all.
        foreach (Paragraph emptyPara in paragraphsToRemove)
            emptyPara.Remove();
        para.Remove();
    }
}
doc.Save(@"C:\Temp\out.docx");

PS: You can use this code after MailMerge.Execute method is called.

Best regards,

I am trying to understand what you are trying to accomplish in the above code. First you are trying to find paragraphs which span over a page boundary. Then you look for empty paragraphs which immediately precede each one and remove them?
I’m not sure if this guarantees that empty pages will be removed. This solution would only solve the case of inserting a page break at the very end of a page (which will create 2 pages), but not every blank page in the document (which the original question asked if it were possible).
I have an idea of using the classes within your PageSplitter demo, where the document is split into individual pages and each page is checked to see whether it is empty. If the page is non-empty, then its “document” is appended to the final output. Assume the necessary variables, usings, and namespaces are included.

// Create and attach collector
LayoutCollector coll = new LayoutCollector(doc);
// Split nodes in the document into separate pages.
PageSplitter.DocumentPageSplitter dps = new PageSplitter.DocumentPageSplitter(coll);
// Initialize empty document
Document outDoc = new Document(dataDir + "Template.docx");
outDoc.RemoveAllChildren();
for (int i = 1; i <= doc.PageCount; i++)
{
    // Grab page's text and footer
    Document pageDoc = dps.GetDocumentOfPage(i);
    string pageText = pageDoc.GetText();
    string footerText = pageDoc.GetChildNodes(NodeType.HeaderFooter, true)[1].GetText();
    // Header+Footer is stored before content -> skip all that to get to body
    string bodyContent = pageText.Substring(pageText.IndexOf(footerText) + footerText.Length);
    // If body contains all white space then skip. Else append doc.
    if (!System.Text.RegularExpressions.Regex.IsMatch(bodyContent, @"^\s*$"))
    {
        outDoc.AppendDocument(pageDoc, ImportFormatMode.KeepSourceFormatting);
    }
}
outDoc.Save(dataDir + "FinalDraft.docx");

There are 2 issues I see with my code:

  1. The new document will create a new Section for every page. In my case, this was not an issue because MailMerge was already executed, and I did not change formatting per section.
  2. When checking if a page is empty, I am only checking the text. There may be pages which contain objects without text (such as images), and this page will be left out of the final document.

Is there a better way to check if a page is empty? I have played with the GetChildNodes() method, but it is difficult to separate the body content from the headers and footers.
Thanks
Toan Tran

I discovered the Body class and have refactored my code to get the text. To search for images, I called GetChildNodes(NodeType.Shape). The API documentation says that Aspose.Words.Drawing.Shape is used for “AutoShape, textbox, freeform, OLE object, ActiveX control, or picture,” and I don’t think I use any of those other objects, so this seems to work for me. My modified loop is below:

for (int i = 1; i <= doc.PageCount; i++)
{
    // Grab page's body
    Document pageDoc = dps.GetDocumentOfPage(i);
    Body body = (Body)pageDoc.GetChildNodes(NodeType.Body, true)[0];
    // If body contains shapes or text that isn't white space -> append doc.
    if (!System.Text.RegularExpressions.Regex.IsMatch(body.GetText(), @"^\s*$") || body.GetChildNodes(NodeType.Shape, true).Count > 0)
    {
        outDoc.AppendDocument(pageDoc, ImportFormatMode.KeepSourceFormatting);
    }
}

Thanks
Toan Tran

Hi Chandra,

Thanks for the additional information, and sorry for the delayed response. Regarding the code in my previous post: yes, it does exactly what you’ve described. Secondly, please take a look at the code which I supplied to another user here; this code splits the content of a document spanning across multiple pages into separate Sections, and you can use the GetChildNodes(NodeType.Any, true) method to determine whether a Section contains any Nodes before appending that Section to the final document. I hope this helps.

Best regards,
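
A note on the original question (starting a section at the top of a new page without risking a blank page): one alternative that was not discussed in this thread is to set the paragraph-level "page break before" property on the heading paragraph, instead of inserting a page-break character. Because this is a property of the paragraph rather than extra content, it should not push out an additional page when the preceding content already ends at the bottom of a page. The following is only a minimal sketch. It assumes the CostEstimateBM bookmark sits inside the heading paragraph that should start the page, and the file names are hypothetical placeholders.

```csharp
using Aspose.Words;

class PageBreakBeforeSketch
{
    static void Main()
    {
        // Hypothetical file names; substitute your own template and output paths.
        Document doc = new Document("Template.docx");
        DocumentBuilder builder = new DocumentBuilder(doc);

        // CostEstimateBM is the bookmark named earlier in this thread.
        // Assumption: it is located in the heading paragraph of the section.
        if (builder.MoveToBookmark("CostEstimateBM"))
        {
            // Paragraph property, not a page-break character:
            // the paragraph is laid out at the top of a new page.
            builder.CurrentParagraph.ParagraphFormat.PageBreakBefore = true;
        }

        doc.Save("Output.docx");
    }
}
```

If the template is under your control, the same setting can also be applied in Microsoft Word on the heading paragraph (or its style), in which case no code change is needed.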