Appending one Word document to another inserts additional pages or breaks

I am appending a cover page to a main document. Neither document has extra pages or breaks on its own, but when I append the first document to the second, extra pages or breaks appear in the result.

Document srcDoc = new Document("1.docx");

Document dstDoc = new Document("2.docx");

dstDoc.AppendDocument(srcDoc, ImportFormatMode.KeepSourceFormatting);
dstDoc.Save(ArtifactsDir + "Document.AppendDocument.docx");

document.zip (5.7 MB)

@smooney1234 Could you please attach your source documents here for testing? We will check the issue and provide you with more information. The attached ZIP file contains only the output document, produced by version 19.5 of Aspose.Words.

Is there a way for me to remove added pages?

These should be close to the originals. I cannot get the originals until Monday, but could you test with these in the meantime?

original.zip (5.7 MB)

@smooney1234 Thank you for the additional information. The problem is not reproducible with the attached source documents and the latest version of Aspose.Words, 22.6. I will wait for your original files.
Could you please also try the latest version of Aspose.Words on your side?

Yes will do.

Here are the original documents. Are you able to try with these, please?

original documents.zip (5.7 MB)

@smooney1234 Thank you for the additional information. I have managed to reproduce the problem on my side. However, it looks like this is not a bug in Aspose.Words but an issue in the COPY_UMEM22382928.doc document. If you simply open and save this document in MS Word, the same empty pages are added. The document appears to have some issues that both Aspose.Words and MS Word resolve silently, and resolving them adds page breaks.

How do I stop this? Do you have a code snippet?

How do I make the inserted images smaller?

Aspose.Pdf.Document document;
byte[] pdfBytes = File.ReadAllBytes(memorial);
var localDocument = "";
using (var stream = new MemoryStream(pdfBytes))
{
    document = new Aspose.Pdf.Document(stream, false);
    newFile = Path.ChangeExtension(memorial, "docx");
    localDocument = CommonUtils.AddPath(tempDir, Path.GetFileName(newFile));
    newFile = localDocument;

    document.Save(localDocument, Aspose.Pdf.SaveFormat.DocX);
}

I thought I could trace the issue to append document

@smooney1234 Do you mean that the documents you have attached were originally produced by Aspose.PDF by converting PDF to DOC? If you use .NET Framework 4.6.1 or newer, .NET Core 2.0 or newer, or .NET 5 or newer, you can load PDF documents directly into an Aspose.Words.Document object.
https://docs.aspose.com/words/net/convert-pdf-to-other-document-formats/
Could you please attach your source PDF document here, so we can test with the original document?
Regarding Aspose.PDF, you should ask in the appropriate Aspose.PDF support forum.
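
For reference, loading a PDF straight into Aspose.Words as described above might look like the following minimal sketch. The file paths are hypothetical placeholders, and it assumes one of the supported .NET versions listed above; whether the resulting DOCX avoids the extra page breaks has not been verified in this thread.

```csharp
using Aspose.Words;

class PdfToDocx
{
    static void Main()
    {
        // Hypothetical input path; Aspose.Words detects the PDF format on load.
        Document doc = new Document(@"C:\Temp\in.pdf");

        // The output format is inferred from the .docx extension.
        doc.Save(@"C:\Temp\out.docx");
    }
}
```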

No - I mean, how do I change the size of the images on each page and save the result as a Word document?

@smooney1234 As I can see in the code snippet you have provided, you use Aspose.PDF to convert the source PDF to DOCX. Please elaborate on your scenario.

Yes @alexey.noskov, I save from PDF to Word. Once the document is in Word, how do I resize the images so the page breaks go away?

@smooney1234 The size of the images is not the problem. The empty page is caused by a redundant page break inserted just before the end of a section. The section break already starts a new page by itself, so the extra page break produces an empty page.

To resolve the problem, you can remove this redundant page break at the end of the section:

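Document doc = new Document(@"C:\Temp\COPY_UMEM22382928.doc");
NodeCollection runs = doc.GetChildNodes(NodeType.Run, true);
// Iterate over a snapshot (ToArray) so that removing a run does not disturb the loop.
foreach (Run r in runs.ToArray())
{
    // Get the paragraph that follows the run's paragraph.
    Paragraph nextPara = r.ParentParagraph.NextSibling as Paragraph;

    // Remove the run only if it is a lone page break, it is the last node in its
    // paragraph, and the next paragraph is the last one in the section.
    if ((r.Text == ControlChar.PageBreak) &&
        (nextPara != null) &&
        nextPara.IsEndOfSection &&
        (r == r.ParentParagraph.LastChild))
    {
        r.Remove();
    }
}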

doc.Save(@"C:\Temp\out.doc");

Have you tested this? When I run it, r.Remove() never gets hit.

@smooney1234 Sure, I have tested the code with your COPY_UMEM22382928.doc document, and it removes 6 redundant page breaks from the document. After saving, the document does not have redundant empty pages.

Are you able to test with the final output?

ROD_COPY_RETURN22382806.zip (5.7 MB)

@smooney1234 The same result - 6 redundant page breaks are removed. The output document does not have empty pages.

Can you send me the output for copy umem and rod copy umem? I have tested it with both, and the page breaks are still there.
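
One possible reason the removal condition never matches on a given file is that the page break shares its run with other text, so the exact comparison `r.Text == ControlChar.PageBreak` fails. This has not been confirmed for the documents in this thread. The sketch below is a looser variant of the snippet above under that assumption: it strips the page break character from the last run of a paragraph that precedes the final paragraph of a section, and prints how many runs it changed. The file paths are hypothetical.

```csharp
using System;
using Aspose.Words;

class StripTrailingPageBreaks
{
    static void Main()
    {
        // Hypothetical paths; substitute the converted document.
        Document doc = new Document(@"C:\Temp\in.docx");
        int stripped = 0;

        // ToArray() takes a snapshot, so editing the tree does not disturb the loop.
        foreach (Node node in doc.GetChildNodes(NodeType.Run, true).ToArray())
        {
            Run r = (Run)node;
            Paragraph para = r.ParentParagraph;
            if (para == null || r != para.LastChild)
                continue;

            // Same rule as the original snippet: the next paragraph must end the section.
            Paragraph nextPara = para.NextSibling as Paragraph;
            if (nextPara == null || !nextPara.IsEndOfSection)
                continue;

            // Assumption: the break may share its run with other text.
            if (!r.Text.Contains(ControlChar.PageBreak))
                continue;

            r.Text = r.Text.Replace(ControlChar.PageBreak, "");
            if (r.Text.Length == 0)
                r.Remove();
            stripped++;
        }

        Console.WriteLine("Page breaks stripped: " + stripped);
        doc.Save(@"C:\Temp\out.docx");
    }
}
```

If the printed count is zero, the extra pages in that file probably come from something other than a page break run before the section end, and the condition would need to be adapted to the actual document structure.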