Fill Page with Empty Paragraphs | Start each Merged Word Document with a New Page (Java)

zqzq34 · July 1, 2021, 2:09am

Hi, I want to merge several DOCX files into one, but I want each DOCX to start on a new page. So I have to fill the last page of each preceding DOCX with empty paragraphs to make it a complete page, like this:

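// doc is the Document whose last page should be filled up
DocumentBuilder builder = new DocumentBuilder(doc);
int pageCount = builder.getDocument().getPageCount();
int nowPageCount = pageCount;
Paragraph paragraph = null;
// Append empty paragraphs until the document spills onto a new page
while (nowPageCount == pageCount)
{
    builder.moveToDocumentEnd();
    paragraph = builder.insertParagraph();
    // The page count is not refreshed automatically, so rebuild the layout on every pass
    builder.getDocument().updatePageLayout();
    nowPageCount = builder.getDocument().getPageCount();
}
// The last paragraph started a new page, so remove it again
if (paragraph != null)
{
    paragraph.remove();
}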

The page count does not update automatically, so I have to call updatePageLayout() inside the loop. I find this inefficient; is there a better way?
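If padding really is required, one way to cut the number of layout passes is to search for the padding count instead of adding one paragraph per pass. The sketch below is a swapped-in technique (exponential search followed by binary search), not an Aspose.Words feature. It assumes the page count never decreases as empty paragraphs are appended, and it models the layout as a hypothetical function `pageCountAfter(k)`. In a real document each probe would still have to add or remove empty paragraphs at the end, call updatePageLayout() and read getPageCount(), as in the loop above; only the number of probes drops, from roughly k to a logarithmic number.

```java
import java.util.function.IntUnaryOperator;

public class PaddingSearch {

    // Returns the largest number of empty paragraphs that can be appended
    // without increasing the page count (i.e. the amount that fills the last page).
    // Assumes pageCountAfter(k) is non-decreasing in k and eventually exceeds
    // pageCountAfter(0); otherwise the first loop would not terminate.
    static int maxPaddingOnLastPage(IntUnaryOperator pageCountAfter) {
        int base = pageCountAfter.applyAsInt(0);

        // Exponential phase: double hi until the page count grows.
        int lo = 0; // invariant: pageCountAfter(lo) == base
        int hi = 1;
        while (pageCountAfter.applyAsInt(hi) == base) {
            lo = hi;
            hi *= 2;
        }

        // Binary phase: invariant pageCountAfter(lo) == base, pageCountAfter(hi) > base.
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (pageCountAfter.applyAsInt(mid) == base) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Hypothetical layout model: the 37th empty paragraph spills onto page 6.
        int[] probes = {0};
        IntUnaryOperator model = k -> {
            probes[0]++;
            return 5 + (k >= 37 ? 1 : 0);
        };
        int padding = maxPaddingOnLastPage(model);
        System.out.println(padding + " paragraphs, " + probes[0] + " layout passes");
    }
}
```

With this model the program prints `36 paragraphs, 13 layout passes`, whereas the one-paragraph-per-pass loop calls updatePageLayout() 37 times before it stops.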

awais.hafeez · July 1, 2021, 2:56am

@zqzq34,

Please check whether the following solution is acceptable for you:

import com.aspose.words.*;
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Pattern;

public class MergeDocuments {

    public static void main(String[] args) throws Exception {
        // directory that contains DOC/DOCX files to be merged
        String path = "C:\\Temp\\";
        String pattern = "*.doc?";
        String[] fileNames = GetFiles(path, pattern);

        // We will append all DOCX files to this final Document
        Document finalDocument = new Document();
        finalDocument.removeAllChildren();

        int i = 0;
        for (String fileName : fileNames) {
            Document doc = new Document(path + fileName);
            DocumentBuilder builder = new DocumentBuilder(doc);

            // insert a temporary bookmark at the start of each DOCX file to be merged
            builder.startBookmark("bm_" + i);
            builder.endBookmark("bm_" + i);
            i++;

            // keep merging DOCX files with finalDocument
            finalDocument.appendDocument(doc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        }

        // now, to ensure that each merged document starts on a new page,
        // set SectionStart to NEW_PAGE for the section holding each bookmark
        for (Bookmark bookmark : finalDocument.getRange().getBookmarks()) {
            if (bookmark.getName().startsWith("bm_")) {
                Section section = (Section) bookmark.getBookmarkStart().getAncestor(NodeType.SECTION);
                section.getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
                // remove temporary bookmark
                bookmark.remove();
            }
        }

        finalDocument.save("C:\\Temp\\awjava-21.6.docx");
    }

    // Returns the names of the files in `path` that match a simple wildcard pattern
    public static String[] GetFiles(final String path, final String searchPattern) {
        final Pattern re = Pattern.compile(searchPattern.replace("*", ".*").replace("?", ".?"));
        return new File(path).list(new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return new File(dir, name).isFile() && re.matcher(name).matches();
            }
        });
    }
}
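In GetFiles, the literal `.` of the pattern is passed to the regex unescaped, so `*.doc?` becomes `.*.doc.?`, which also accepts names such as `adocx`. Below is a minimal sketch of a stricter conversion that quotes the literal parts with Pattern.quote while keeping GetFiles' meaning of `*` (any run of characters) and `?` (at most one character). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class WildcardFilter {

    // Converts a simple wildcard pattern into a regex. Literal runs are quoted,
    // '*' matches any run of characters and '?' matches at most one character,
    // mirroring the replacements GetFiles performs.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".?");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern strict = toRegex("*.doc?");
        // The conversion used by GetFiles, for comparison
        Pattern loose = Pattern.compile("*.doc?".replace("*", ".*").replace("?", ".?"));
        for (String name : new String[] {"3.docx", "4.doc", "adocx", "notes.txt"}) {
            System.out.println(name + ": strict=" + strict.matcher(name).matches()
                    + ", GetFiles=" + loose.matcher(name).matches());
        }
    }
}
```

Only `adocx` is treated differently: the strict pattern rejects it, while the GetFiles conversion accepts it. To use it, the Pattern.compile(...) line in GetFiles could be replaced with `final Pattern re = WildcardFilter.toRegex(searchPattern);`.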

zqzq34 · July 1, 2021, 9:00am

3.docx (25.2 KB)
4.docx (344.7 KB)
6.docx (45.4 KB)
Hi, thanks for your reply. I have tested the code, and it works on “simple” DOCX files. However, when I fed it my actual “complex” DOCX files, the resulting DOCX was damaged and could not be opened.

I analyzed the source DOCX files and simplified the inputs:

4.docx is a “cover” page with a picture as its background.
6.docx has a header and footer, with a picture in the header as a logo.
3.docx is just a sample.

The result is damaged when I merge 4.docx with 3.docx, or 6.docx with 3.docx. Also, in the 6.docx + 3.docx case, the result inherits 6.docx’s header and footer, even if I use ImportFormatMode.KEEP_DIFFERENT_STYLES when appending.

awais.hafeez · July 1, 2021, 4:09pm

@zqzq34,

First off, I have simplified the code that I shared in my previous post:

// directory that contains DOC/DOCX files to be merged
String path = "C:\\Temp\\";
String pattern = "*.doc?";
String[] fileNames = GetFiles(path, pattern); // same helper as in my previous post

// We will append all DOCX files to this final Document
Document finalDocument = new Document();
finalDocument.removeAllChildren();

for (String fileName : fileNames) {
    Document doc = new Document(path + fileName);
    // make the first section of each source document start on a new page
    doc.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);

    // keep merging DOCX files with finalDocument
    finalDocument.appendDocument(doc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
}

finalDocument.save("C:\\Temp\\awjava-21.6.docx");
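GetFiles is built on File.list(), whose Javadoc makes no guarantee about the order of the returned names, so the documents may be appended in an arbitrary order. When the intended order follows the numbers in the file names, one option is to sort the array before the loop, e.g. `Arrays.sort(fileNames, NaturalOrder.NATURAL)`. The comparator below is a hypothetical, numeric-aware sketch so that `10.docx` sorts after `6.docx`; it is not part of Aspose.Words.

```java
import java.util.Arrays;
import java.util.Comparator;

public class NaturalOrder {

    // Compares strings character by character, but treats runs of ASCII digits
    // as numbers, so "6.docx" sorts before "10.docx".
    static final Comparator<String> NATURAL = (a, b) -> {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                int si = i, sj = j;
                while (i < a.length() && isDigit(a.charAt(i))) i++;
                while (j < b.length() && isDigit(b.charAt(j))) j++;
                // Compare the digit runs by length, then lexically (no overflow).
                String na = stripLeadingZeros(a.substring(si, i));
                String nb = stripLeadingZeros(b.substring(sj, j));
                if (na.length() != nb.length()) return na.length() - nb.length();
                int cmp = na.compareTo(nb);
                if (cmp != 0) return cmp;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // The string with characters left over sorts last.
        return (a.length() - i) - (b.length() - j);
    };

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static String stripLeadingZeros(String digits) {
        int k = 0;
        while (k < digits.length() - 1 && digits.charAt(k) == '0') k++;
        return digits.substring(k);
    }

    public static void main(String[] args) {
        String[] fileNames = {"10.docx", "6.docx", "3.docx", "4.docx"};
        Arrays.sort(fileNames, NATURAL);
        System.out.println(Arrays.toString(fileNames)); // [3.docx, 4.docx, 6.docx, 10.docx]
    }
}
```

If the order is not derivable from the names (for example, a cover page that must come first), an explicit list of paths is the simpler choice.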

Secondly, please check these output Word documents (Output DOCX files.zip (376.6 KB)), which I produced using the following code snippets:

Code Snippet 1:

Document doc3 = new Document("C:\\Temp\\231732\\3.docx");
Document doc4 = new Document("C:\\Temp\\231732\\4.docx");
doc4.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
doc3.appendDocument(doc4, ImportFormatMode.KEEP_SOURCE_FORMATTING);
doc3.save("C:\\Temp\\231732\\4 appended to 3.docx");

Code Snippet 2:

Document doc3 = new Document("C:\\Temp\\231732\\3.docx");
Document doc6 = new Document("C:\\Temp\\231732\\6.docx");
doc6.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
doc3.appendDocument(doc6, ImportFormatMode.KEEP_SOURCE_FORMATTING);
doc3.save("C:\\Temp\\231732\\6 appended to 3.docx");

Do you still see the same problems in the above output DOCX files? If so, can you please use comparison screenshots to show exactly what the issues are in these Aspose.Words-generated files?

zqzq34 · July 2, 2021, 2:20am

First, I used your simplified code like this:

ArrayList<String> wordPathList = new ArrayList<String>();
wordPathList.add("C:\\Users\\zhangqi2\\Desktop\\poi\\test\\4.docx");
wordPathList.add("C:\\Users\\zhangqi2\\Desktop\\poi\\test\\3.docx");

long old = System.currentTimeMillis();

Document finalDocument = new Document();
finalDocument.removeAllChildren();

for (String wordPath : wordPathList)
{
    Document doc = new Document(wordPath);
    // for first Section, just set SectionStart to NEW_PAGE
    doc.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);

    // keep merging DOCX files with finalDocument
    finalDocument.appendDocument(doc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
}

finalDocument.save("C:\\Users\\zhangqi2\\Desktop\\poi\\4-3.docx");
long now = System.currentTimeMillis();

The resulting 4-3.docx cannot be opened by Word.
1.png (14.0 KB)
4-3.docx (358.1 KB)

Second, your Code Snippet 1 and Code Snippet 2 work. I used the same approach to merge my 3 DOCX files, but the resulting DOCX has some problems.
There is a table in the original DOCX like this:
2.png (10.7 KB)
The same table in the resulting DOCX looks like this:
3.png (10.6 KB)
It seems the Chinese characters in the table take up twice as much width as in the original, so adjacent characters overlap each other by half.

Also, the color of the header image changes from pink to blue.
4.png (22.7 KB)

awais.hafeez · July 2, 2021, 1:55pm

@zqzq34,

I am afraid the following code produces correct output on my end (see these input/output Word documents: 3 and 4 and their result.zip (703.1 KB)):

ArrayList<String> wordPathList = new ArrayList<String>();

wordPathList.add("C:\\temp\\231732\\4.docx");
wordPathList.add("C:\\temp\\231732\\3.docx");

Document finalDocument = new Document();
finalDocument.removeAllChildren();

for (String wordPath : wordPathList) {
    Document doc = new Document(wordPath);
    // for first Section, just set SectionStart to NEW_PAGE
    doc.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);

    // keep merging DOCX files with finalDocument
    finalDocument.appendDocument(doc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
}

finalDocument.save("C:\\temp\\231732\\4-3.docx");

Can you please also attach the source Word documents containing the Chinese text and the output DOCX file showing the undesired behavior here for testing?

We are working on this query and will get back to you soon.

awais.hafeez · July 9, 2021, 12:22pm

@zqzq34,

You are right; the following code produces this undesired output:

Document doc3 = new Document("C:\\Temp\\231732\\3.docx");
Document doc6 = new Document("C:\\Temp\\231732\\6.docx");
doc6.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
doc3.appendDocument(doc6, ImportFormatMode.KEEP_SOURCE_FORMATTING);
doc3.save("C:\\Temp\\231732\\6 appended to 3.docx");

To get this corrected in the Aspose.Words API, we have logged this problem in our issue tracking system with ID WORDSNET-22479. We will look further into the details of this problem and keep you updated on the status of the linked issue.

aspose.notifier · August 12, 2021, 11:05am

The issues you have found earlier (filed as WORDSNET-22479) have been fixed in this Aspose.Words for .NET 21.8 update and this Aspose.Words for Java 21.8 update.
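Because the fix ships in version 21.8, it can help to check at runtime which Aspose.Words build is actually on the classpath. Aspose.Words provides a BuildVersionInfo class whose getVersion() returns the library version string; the comparison helper below is a hypothetical, library-independent sketch that treats each dot-separated part as an integer, so "21.10" counts as newer than "21.8".

```java
public class VersionCheck {

    // Returns true if the dotted version `actual` is at least `required`.
    // Each part is compared as an integer (assumes purely numeric parts);
    // missing parts count as 0, so "21.8" and "21.8.0" are treated as equal.
    static boolean isAtLeast(String actual, String required) {
        String[] a = actual.split("\\.");
        String[] r = required.split("\\.");
        for (int k = 0; k < Math.max(a.length, r.length); k++) {
            int x = k < a.length ? Integer.parseInt(a[k].trim()) : 0;
            int y = k < r.length ? Integer.parseInt(r[k].trim()) : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // In a real project `actual` could come from
        // com.aspose.words.BuildVersionInfo.getVersion(); literals are used here.
        System.out.println(isAtLeast("21.6.0", "21.8")); // false
        System.out.println(isAtLeast("21.10", "21.8"));  // true
    }
}
```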