How to Find a Page Break in an MS Word Document Using Java

Hello,

I need to detect the page break in a *.docx document with the Aspose Words for Java 21.1 library.

The Word document is parsed using an implementation of DocumentVisitor. I’ve tried to detect the page break with the code below, but it always returns false.

@Override
public int visitParagraphStart(Paragraph paragraph) throws Exception {
    // FIXME: always returns false!
    System.out.println("start: " + paragraph.getParagraphFormat().getPageBreakBefore());
    return VisitorAction.CONTINUE;
}

The visitParagraphEnd method behaves the same way, and so does checking the paragraph’s Run instances:

for (Run run : paragraph.getRuns()) {
    if (run.getText().contains(ControlChar.PAGE_BREAK)) {
        System.out.println("Found"); // never enters here
    }
}

Here is my code: tc-aspose-evaluation.zip (63.3 KB)

How can I accomplish this?

@mihail.manoli

The ParagraphFormat.PageBreakBefore property returns True if a page break is forced before the paragraph. Please check the attached image for it. page break before.png (11.8 KB)

In your document, no paragraph has this property set to true.

ControlChar.PAGE_BREAK corresponds to an explicit (manually inserted) page break, as shown in the attached image.
page break.png (10.3 KB)

Your document does not contain any explicit page breaks.

@tahir.manzoor

Thanks for the reply.

Is there any way to detect when a page ends in a Word document?

@mihail.manoli

The Aspose.Words.Layout namespace provides classes that give access to layout information, such as which page a particular document element is on and where on that page it is positioned, once the document has been formatted into pages.

You can use the LayoutCollector.GetStartPageIndex method (getStartPageIndex in the Java API) to get the 1-based index of the page where a node begins. In your case, we suggest iterating over the paragraph nodes of the document and calling this method for each paragraph to get its page number. A paragraph whose successor starts on a different page is the last paragraph on its page. Please also read through the other members of the LayoutCollector class.
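
The approach above could be sketched roughly as follows. This is only an illustration, not an official sample: the class name PageEndFinder and the method lastOnEachPage are hypothetical, and the page lookup is passed in as a function so the sketch compiles without the Aspose library. The Aspose wiring appears only in comments and has not been checked against version 21.1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

public class PageEndFinder {

    /**
     * Returns every item that is followed by an item on a different page,
     * plus the final item. pageOf should return the 1-based index of the
     * page on which an item starts.
     */
    public static <T> List<T> lastOnEachPage(List<T> items, ToIntFunction<T> pageOf) {
        List<T> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            boolean isFinalItem = (i == items.size() - 1);
            if (isFinalItem
                    || pageOf.applyAsInt(items.get(i)) != pageOf.applyAsInt(items.get(i + 1))) {
                result.add(items.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in data: paragraph labels mapped to the page they start on.
        Map<String, Integer> startPage = new HashMap<>();
        startPage.put("p1", 1);
        startPage.put("p2", 1);
        startPage.put("p3", 2);
        startPage.put("p4", 3);
        List<String> paragraphs = Arrays.asList("p1", "p2", "p3", "p4");

        System.out.println(lastOnEachPage(paragraphs, startPage::get)); // prints [p2, p3, p4]

        // Assumed Aspose.Words wiring (not compiled here):
        //   Document doc = new Document("input.docx");
        //   LayoutCollector collector = new LayoutCollector(doc);
        //   List<Paragraph> paras = new ArrayList<>();
        //   for (Object n : doc.getChildNodes(NodeType.PARAGRAPH, true)) paras.add((Paragraph) n);
        //   lastOnEachPage(paras, p -> {
        //       try { return collector.getStartPageIndex(p); }
        //       catch (Exception e) { throw new RuntimeException(e); }
        //   });
    }
}
```

Note that a long paragraph can begin on one page and end on the next. In that case comparing start pages alone may not be enough, and LayoutCollector.getEndPageIndex might be the better value to compare.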