Extracting page number from footers

omerab82 · December 3, 2020, 9:22pm

Hi,
I'm trying to write a Java app that returns the page number of a paragraph.
The page number is stored in the document's footers.
Is there any way to extract the page number from the footer of the page on which the paragraph is located?
Thanks!

awais.hafeez · December 4, 2020, 10:36am

@omerab82,

Suppose a Word document has five pages and a single section with only a primary footer, and that footer contains the paragraph whose page number you want to retrieve. In this case, the footer content (the paragraph) is simply repeated on every page, so which page number would you expect? Could you please ZIP and upload a sample Word document here, together with a screenshot highlighting the paragraph whose page number you want? We will then investigate the scenario further on our end and provide more information.
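For reference, here is a small sketch of the rules Word applies when deciding which of a section's footers appears on a given page. It is plain Java with no Aspose.Words dependency; `FooterSelection`, `FooterKind` and `footerFor` are hypothetical names, and the three kinds are meant to correspond to `HeaderFooterType.FOOTER_FIRST`, `FOOTER_EVEN` and `FOOTER_PRIMARY`. It assumes odd/even parity follows the page number Word displays, so a numbering restart can change which footer is used.

```java
/**
 * Hypothetical helper mirroring Word's footer-selection rules for one page.
 * The names are illustrative only; they are not part of the Aspose.Words API.
 */
final class FooterSelection {

    enum FooterKind { FIRST, EVEN, PRIMARY }

    /**
     * @param firstPageOfSection  true if this page is the first page of its section
     * @param differentFirstPage  the section's "Different First Page" setting
     * @param oddAndEvenPages     the "Different Odd & Even Pages" setting
     * @param displayedPageNumber the page number Word shows on this page (assumed to decide parity)
     */
    static FooterKind footerFor(boolean firstPageOfSection,
                                boolean differentFirstPage,
                                boolean oddAndEvenPages,
                                int displayedPageNumber) {
        // The first-page footer only applies to the first page of the section
        if (firstPageOfSection && differentFirstPage) {
            return FooterKind.FIRST;
        }
        // With odd/even footers enabled, even-numbered pages use the even footer
        if (oddAndEvenPages && displayedPageNumber % 2 == 0) {
            return FooterKind.EVEN;
        }
        // Every other page, including odd pages, uses the primary footer
        return FooterKind.PRIMARY;
    }
}
```

In the five-page, single-section case described above (only a primary footer, no first-page or odd/even settings), every page maps to `PRIMARY`, which is why one footer paragraph cannot be tied to a single page.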

omerab82 · December 4, 2020, 1:59pm

test1.docx.zip (66.9 KB)

The documents are in Hebrew and contain messages; each message has a title and content. My goal is to index these messages by title and page number. As you can see in the attached document, the page number appears at the bottom left of the page. Screen Shot 2020-12-04 at 15.54.13.png (306.2 KB)

I want to extract the page number for each title. I don't care what the actual page number is; I just need the number shown in the footer.
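One way to hold the result is a small index from message title to the footer number(s) it appears under. This is a minimal sketch under assumptions: `MessageIndex` and its methods are hypothetical, the titles are assumed to be identified elsewhere, and the footer number is stored as the text shown in the footer (a `String`) rather than as the physical page index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory index: message title -> footer page labels.
 * Labels are kept as strings because they are whatever the footer displays.
 */
final class MessageIndex {

    // LinkedHashMap preserves insertion order, i.e. the order titles were found
    private final Map<String, List<String>> pagesByTitle = new LinkedHashMap<>();

    /** Records that a message with this title appears on the page labelled footerPage. */
    void add(String title, String footerPage) {
        String key = title.trim();
        List<String> pages = pagesByTitle.computeIfAbsent(key, k -> new ArrayList<>());
        // Skip duplicates if the same title is reported twice for one page
        if (!pages.contains(footerPage)) {
            pages.add(footerPage);
        }
    }

    /** Footer page labels for a title, or an empty list if the title is unknown. */
    List<String> pagesFor(String title) {
        return Collections.unmodifiableList(
                pagesByTitle.getOrDefault(title.trim(), Collections.emptyList()));
    }

    /** All titles, in the order they were added. */
    List<String> titles() {
        return new ArrayList<>(pagesByTitle.keySet());
    }
}
```

Keeping a list per title allows the same title to be recorded under more than one footer number.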

awais.hafeez · December 6, 2020, 6:19am

@omerab82,

You can build your logic on the following Java code to get the desired output:

```java
import com.aspose.words.*;

public class FooterPageNumber {

    public static void main(String[] args) throws Exception {
        Document doc = new Document("C:\\Temp\\test1\\test1.docx");

        // Find the paragraph whose page number you want to extract
        String paragraphText = "מינוי ממלא מקום המנהל הכללי";
        Paragraph targetPara = null;
        for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
            if (para.toString(SaveFormat.TEXT).trim().contains(paragraphText)) {
                targetPara = para;
                break;
            }
        }
        if (targetPara == null) {
            System.out.println("Paragraph not found.");
            return;
        }

        // getStartPageIndex is 1-based (0 if the node is not in the layout); extractPages is 0-based
        LayoutCollector layoutCollector = new LayoutCollector(doc);
        int pageIndex = layoutCollector.getStartPageIndex(targetPara) - 1;
        if (pageIndex < 0) {
            System.out.println("Paragraph has no page in the layout.");
            return;
        }

        // Copy just that page into a new document and read the footer from its first section
        Document pageDoc = doc.extractPages(pageIndex, 1);
        Section section = pageDoc.getFirstSection();
        PageSetup pageSetup = section.getPageSetup();

        int footerType;
        if (pageSetup.getDifferentFirstPageHeaderFooter()) {
            footerType = HeaderFooterType.FOOTER_FIRST;
        } else if (pageSetup.getOddAndEvenPagesHeaderFooter()) {
            // Even-page footer; note that Word shows FOOTER_PRIMARY on odd pages
            footerType = HeaderFooterType.FOOTER_EVEN;
        } else {
            footerType = HeaderFooterType.FOOTER_PRIMARY;
        }

        HeaderFooter footer = section.getHeadersFooters().getByHeaderFooterType(footerType);
        System.out.println(pageNumberFromFooter(footer));
    }

    // Assumes the footer text sits in a shape (text box) as "<text>\t<number>";
    // the shorter of the first two tab-separated parts is taken as the page number.
    private static String pageNumberFromFooter(HeaderFooter footer) {
        if (footer == null) {
            return ""; // this footer type is not defined for the section
        }
        for (Shape shape : (Iterable<Shape>) footer.getChildNodes(NodeType.SHAPE, true)) {
            String text = shape.toString(SaveFormat.TEXT).trim();
            if (text.contains(ControlChar.TAB)) {
                String[] parts = text.split(ControlChar.TAB);
                return (parts[0].length() < parts[1].length()) ? parts[0] : parts[1];
            }
        }
        return "";
    }
}
```
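The last step of the code above, pulling the number out of the footer text, can be isolated and made a little more defensive. The sketch below is plain Java; `FooterText` and `pageNumberFrom` are hypothetical names. It assumes the footer shape's plain text has the "text, tab, number" layout seen in the sample, prefers a tab-separated part made only of digits, and otherwise falls back to the shortest non-empty part (the heuristic used in the reply).

```java
/**
 * Hypothetical helper: given the plain text of a footer shape,
 * e.g. "<header text>\t<page number>", return the page number part.
 */
final class FooterText {

    static String pageNumberFrom(String footerText) {
        String text = footerText.trim();
        if (!text.contains("\t")) {
            return ""; // no tab: not the title/number layout this sketch expects
        }
        String shortest = null;
        for (String raw : text.split("\t")) {
            String part = raw.trim();
            if (part.isEmpty()) {
                continue; // consecutive tabs produce empty parts
            }
            if (isAllDigits(part)) {
                return part; // a purely numeric part is taken as the page number
            }
            if (shortest == null || part.length() < shortest.length()) {
                shortest = part;
            }
        }
        // Fallback: the shortest non-empty part, as in the original heuristic
        return shortest == null ? "" : shortest;
    }

    // Character.isDigit accepts any Unicode decimal digit, not only 0-9
    private static boolean isAllDigits(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return !s.isEmpty();
    }
}
```

Inside `pageNumberFromFooter`, the `split`/ternary lines could then be replaced by a call to `FooterText.pageNumberFrom(text)`; the digit check avoids picking a short title over a longer number.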