Merging documents leads to an extra page

Hi,
I am working on merging documents with Aspose. I have two documents, each one page long, and I am merging them using the attached code. But the merged output has 3 pages. Why is this happening? Can you please look into the attached application and suggest a solution?

Thanks.

Sample document for Aspose.zip (43.5 KB)

@smartlayer,

There are no headers or footers in b.doc, so the problem occurs because passing false to the HeadersFooters.LinkToPrevious method creates blank headers and footers of all types (first, primary and even) in the final document, and these add extra vertical spacing. The following sample code demonstrates a workaround:

Document doca = new Document(MyDir + @"a.doc");
Document docb = new Document(MyDir + @"b.doc");

docb.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;
docb.FirstSection.PageSetup.RestartPageNumbering = true;
docb.FirstSection.HeadersFooters.LinkToPrevious(false);

foreach(Section sec in docb.Sections)
{
    foreach(HeaderFooter hf in sec.HeadersFooters)
    {
        if (hf.ChildNodes.Count == 1 && 
            hf.FirstChild.NodeType == NodeType.Paragraph &&
            hf.ToString(SaveFormat.Text).Trim().Equals(string.Empty))
        {
            hf.Remove();
        }
    }
}

doca.AppendDocument(docb, ImportFormatMode.KeepDifferentStyles);

doca.Save(MyDir + @"17.8.doc");
doca.Save(MyDir + @"17.8.pdf");

Hope this helps.

Best regards,

Hi,

I checked your code. It does remove the extra space from the header and footer, but another problem has come up. We have 3 documents to merge and the last one has a header. The code removes that header too, and it should not remove the header of the third document.

The modified version of your code that I am using is below:

DirectoryInfo dirinfo = new DirectoryInfo(resultpath);
FileInfo[] files = dirinfo.GetFiles().OrderBy(p => p.CreationTime).ToArray();
if (files.Count() == 1)
{
    Document pdfdoc = new Document(files[0].FullName);
    pdfdoc.Save(resultpath + savefilename + ".doc");
    pdfdoc.Save(resultpath + savefilename + ".pdf");
}
else
{
    Document docall = new Document(files[0].FullName);
    docall.Save(resultpath + savefilename + ".doc");
    for (int i = 1; i < files.Count(); i++)
    {
        Document docall_final = new Document(resultpath + savefilename + ".doc");
        Document doc_current = new Document(files[i].FullName);

        doc_current.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;
        doc_current.FirstSection.PageSetup.RestartPageNumbering = true;
        doc_current.FirstSection.HeadersFooters.LinkToPrevious(false);
        //doc_current.UpdatePageLayout();

        foreach (Section sec in doc_current.Sections)
        {
            foreach (HeaderFooter hf in sec.HeadersFooters)
            {
                if (hf.ChildNodes.Count == 1 &&
                    hf.FirstChild.NodeType == NodeType.Paragraph &&
                    hf.ToString(SaveFormat.Text).Trim().Equals(string.Empty))
                {
                    hf.Remove();
                }
            }
        }

        docall_final.AppendDocument(doc_current, ImportFormatMode.KeepDifferentStyles);

        docall_final.Save(resultpath + savefilename + ".doc");
        docall_final.Save(resultpath + savefilename + ".pdf");
    }
}

I am also attaching the third document, which contains a header. You can test with it.

b1.zip (17.4 KB)

@smartlayer,

I am afraid I do not see any new attachments in this thread. Please ZIP and attach the new input Word documents and the output document generated by Aspose.Words that shows the undesired behavior. We will investigate the issue further on our end and provide you with more information.

Best regards,

Hi,
I am attaching all the documents (a.doc, b.doc and b1.doc) together with the merged document named “MergedFiles.doc”. Please check.

mergeDocs.zip (42.9 KB)

@smartlayer,

The following code should fix this issue:

string resultpath = MyDir + @"mergeDocs\";
string savefilename = @"mergeDocs\17.8";           

DirectoryInfo dirinfo = new DirectoryInfo(resultpath);
FileInfo[] files = dirinfo.GetFiles().OrderBy(p => p.CreationTime).ToArray();
if (files.Count() == 1)
{
    Document pdfdoc = new Document(files[0].FullName);
    pdfdoc.Save(resultpath + savefilename + ".doc");
    pdfdoc.Save(resultpath + savefilename + ".pdf");
}
else
{
    Document docall = new Document(files[0].FullName);
    docall.Save(resultpath + savefilename + ".doc");
    for (int i = 1; i < files.Count(); i++)
    {
        Document docall_final = new Document(resultpath + savefilename + ".doc");
        Document doc_current = new Document(files[i].FullName);

        doc_current.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;
        doc_current.FirstSection.PageSetup.RestartPageNumbering = true;
        doc_current.FirstSection.HeadersFooters.LinkToPrevious(false);
        //doc_current.UpdatePageLayout();

        foreach (Section sec in doc_current.Sections)
        {
            foreach (HeaderFooter hf in sec.HeadersFooters)
            {
                if (hf.GetChildNodes(NodeType.Any, true).Count == 1 &&
                    hf.FirstChild.NodeType == NodeType.Paragraph &&
                    hf.ToString(SaveFormat.Text).Trim().Equals(string.Empty))
                {
                    hf.Remove();
                }
            }

        }

        docall_final.AppendDocument(doc_current, ImportFormatMode.KeepDifferentStyles);

        docall_final.Save(resultpath + savefilename + ".doc");
        docall_final.Save(resultpath + savefilename + ".pdf");
    }
}

Best regards,

Hi,

I tried your solution but it leads to more issues. Please check this.
I have 5 documents.
“c.doc” has no header, but merging adds a header to it and also adds the page number to the header of the last document (d.doc), which should not happen. Please check the attached project. It contains your code and all the files.

Thanks
Sample document for Aspose.zip (78.9 KB)

@smartlayer,

MS Word 2016 reports that c.doc has two pages. The document actually has a “primary header” containing a PAGE field, which is only visible on the second page. The header is not shown on the first page because the header/footer option “Different First Page” is enabled (try disabling it). You can also disable it programmatically through the PageSetup.DifferentFirstPageHeaderFooter property.
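To see this for yourself, here is a minimal sketch that prints the “Different First Page” setting and whether a primary header exists for each section. It assumes c.doc is in the working directory (the path and the class name are placeholders, not part of your project), and the line that changes the setting is left commented out so nothing is modified unless you choose to.

```csharp
using System;
using Aspose.Words;

static class FirstPageHeaderCheck
{
    static void Main()
    {
        // Assumption: c.doc is in the working directory; adjust the path as needed.
        Document doc = new Document("c.doc");

        for (int i = 0; i < doc.Sections.Count; i++)
        {
            Section sec = doc.Sections[i];

            // The indexer returns null when the section has no header of that type.
            bool hasPrimary = sec.HeadersFooters[HeaderFooterType.HeaderPrimary] != null;

            Console.WriteLine("Section " + i +
                ": DifferentFirstPageHeaderFooter = " + sec.PageSetup.DifferentFirstPageHeaderFooter +
                ", has primary header = " + hasPrimary);

            // Uncomment to show the primary header on the first page as well.
            // sec.PageSetup.DifferentFirstPageHeaderFooter = false;
        }
    }
}
```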

Best regards,

Hi @awais.hafeez,
But I need the header on the second page, not on the first page.

Thanks

@smartlayer,

Please attach your expected Word document here for our reference. You can create it by using Microsoft Word. We will examine its structure to see how you want the final output to look, and will then provide you with code to achieve the same by using ‘Aspose.Words for .NET’. Thanks for your cooperation.

Best regards,

@awais.hafeez,

As per your request, I am sending you the documents: the documents to merge (a.doc, b.doc, c.doc, d.doc, e.doc), the expected output document (Expected output.doc) and the output I am getting (Received Output.doc).

data.zip (109.7 KB)

@smartlayer,

I have added a few checks as per your requirement:

string resultpath = MyDir + @"data\data\Documents to Merge\";
string savefilename = @"mergeDocs\17.8";

DirectoryInfo dirinfo = new DirectoryInfo(resultpath);
FileInfo[] files = dirinfo.GetFiles().OrderBy(p => p.CreationTime).ToArray();
if (files.Count() == 1)
{
    Document pdfdoc = new Document(files[0].FullName);
    pdfdoc.Save(resultpath + savefilename + ".doc");
    pdfdoc.Save(resultpath + savefilename + ".pdf");
}
else
{
    Document docall = new Document(files[0].FullName);
    docall.Save(resultpath + savefilename + ".doc");
    for (int i = 1; i < files.Count(); i++)
    {
        Document docall_final = new Document(resultpath + savefilename + ".doc");
        Document doc_current = new Document(files[i].FullName);

        doc_current.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;
        doc_current.FirstSection.PageSetup.RestartPageNumbering = true;
        doc_current.FirstSection.HeadersFooters.LinkToPrevious(false);                    

        foreach (Section sec in doc_current.Sections)
        {
            foreach (HeaderFooter hf in sec.HeadersFooters)
            {
                if (hf.GetChildNodes(NodeType.Any, true).Count == 1 &&
                    hf.FirstChild.NodeType == NodeType.Paragraph &&
                    hf.ToString(SaveFormat.Text).Trim().Equals(string.Empty))
                {
                    // don't remove HeaderFirst if PageSetup.DifferentFirstPageHeaderFooter is enabled
                    if (hf.HeaderFooterType == HeaderFooterType.HeaderFirst)
                    {
                        if (sec.PageSetup.DifferentFirstPageHeaderFooter == true)
                            continue;
                    }

                    // always keep HeaderPrimary, even when it is empty
                    if (hf.HeaderFooterType == HeaderFooterType.HeaderPrimary)
                    {
                        continue;
                    }

                    hf.Remove();
                }
            }
        }

        docall_final.AppendDocument(doc_current, ImportFormatMode.KeepDifferentStyles);

        docall_final.Save(resultpath + savefilename + ".doc");
        docall_final.Save(resultpath + savefilename + ".pdf");
    }
}

Additionally, to remove extra vertical spacing caused by Headers/Footers, you may also want to adjust Header distance from Top and Footer distance from Bottom by using PageSetup.HeaderDistance and PageSetup.FooterDistance properties.
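As a rough sketch of that adjustment, the snippet below sets both distances on every section of an already merged document. The file names, the class name and the value of 18 points are placeholders chosen for illustration only; pick values that suit your page margins.

```csharp
using Aspose.Words;

static class HeaderFooterDistance
{
    static void Main()
    {
        // Assumption: merged.doc is the output of the merge; adjust the path as needed.
        Document doc = new Document("merged.doc");

        foreach (Section sec in doc.Sections)
        {
            // Both properties are measured in points (72 points = 1 inch).
            // 18 points is only an example value.
            sec.PageSetup.HeaderDistance = 18;
            sec.PageSetup.FooterDistance = 18;
        }

        doc.Save("merged.adjusted.doc");
    }
}
```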

Hope this helps.

Best regards,

Hi @awais.hafeez,

I am still facing a lot of issues. Is there any way to find out whether an image is present in a header?

Thanks

@smartlayer,

Please try using the following code:

Document doc = new Document(MyDir + @"b.doc");

for (int i = 0; i < doc.Sections.Count; i++)
{
    Section sec = doc.Sections[i];
    foreach (HeaderFooter hf in sec.HeadersFooters)
    {
        // find all Shape nodes in headers/footers
        NodeCollection col = hf.GetChildNodes(NodeType.Shape, true);
        if (col.Count > 0)
        {
            Console.WriteLine(hf.HeaderFooterType.ToString() + " of Section " + i + " has " + col.Count.ToString() + " Shape objects");
        }
    }
}
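Note that the loop above counts every Shape node, which can include text boxes and other drawing objects as well as pictures. If you only care about images, a possible refinement is to check the Shape.HasImage property on each shape. This is a sketch under the same assumptions as above (b.doc in the working directory; the class name is a placeholder):

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;

static class HeaderImageCheck
{
    static void Main()
    {
        // Assumption: b.doc is in the working directory; adjust the path as needed.
        Document doc = new Document("b.doc");

        for (int i = 0; i < doc.Sections.Count; i++)
        {
            Section sec = doc.Sections[i];
            foreach (HeaderFooter hf in sec.HeadersFooters)
            {
                // Count only the shapes that actually carry or link an image.
                int images = 0;
                foreach (Shape shape in hf.GetChildNodes(NodeType.Shape, true))
                {
                    if (shape.HasImage)
                        images++;
                }

                if (images > 0)
                {
                    Console.WriteLine(hf.HeaderFooterType + " of Section " + i +
                        " has " + images + " image(s)");
                }
            }
        }
    }
}
```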

Best regards,