Aspose.Words: extract header, body, and footer content for each page

Vamsi_452 · November 17, 2020, 11:44am

How can I get the header, body, and footer of a Word document (I need the header, body, and footer for each page) and save the result as a text file?

@alexey.noskov

Vamsi_452 · November 17, 2020, 12:03pm

I need this in Java code.

Vamsi_452 · November 17, 2020, 12:37pm

I used the parser.

I am getting a header at the top and a footer at the bottom, but the header and footer should appear with each page's content.

But if I use the code below, I get the header, footer, and body.

    Document doc = new Document(path);
    String text = doc.toString(SaveFormat.TEXT);

footer.zip (18.5 KB)

alexey.noskov · November 17, 2020, 12:53pm

@Vamsi_452 Word documents are flow documents, much like HTML, so they have no fixed page start or end. However, you can try using the Document.ExtractPages method (extractPages in Java) to split your document into pages and then extract the required content from each page.

Vamsi_452 · November 17, 2020, 12:56pm

@alexey.noskov I can extract the content with the code below, but I get the header and footer first and then the body. Can I get each part individually and append them in the right order? I need this in Java code.

    Document doc = new Document(path);
    String text = doc.toString(SaveFormat.TEXT);

alexey.noskov · November 17, 2020, 1:13pm

@Vamsi_452 Code to extract the header, footer, and body separately for each page will look like the following:

    Document doc = new Document("C:\\Temp\\in.docx");

    // Extract each page as a separate document.
    for (int i = 0; i < doc.getPageCount(); i++)
    {
        Document pageDoc = doc.extractPages(i, 1);

        // Get text of the header. Note that there might be different types of header and footer in the document;
        // for demonstration purposes extract only the primary header.
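        // Look up by type: in Java, get(int) expects a collection index, not a HeaderFooterType constant.
        // getByHeaderFooterType returns null when the section has no header/footer of that type.
        String header = pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.HEADER_PRIMARY).toString(SaveFormat.TEXT);
        // Get text of the body.
        String body = pageDoc.getFirstSection().getBody().toString(SaveFormat.TEXT);
        // Get text of the footer.
        String footer = pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY).toString(SaveFormat.TEXT);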
    }

Note, however, that a document might contain more than one section, and each section might have different types of headers and footers.

Vamsi_452 · November 17, 2020, 2:10pm

How can I save the result as a single page instead of multiple pages? What does the extractPages method do? Can you share the code with me?

@alexey.noskov Can you share the code of the ExtractPages method? I am getting an error.

alexey.noskov · November 17, 2020, 2:48pm

@Vamsi_452 The doc.extractPages method is available starting from version 20.11 of Aspose.Words, so you need to update.

Vamsi_452 · November 26, 2020, 12:11pm

@alexey.noskov How can we extract headers and footers correctly? Say my footer is "Vamsi 1 of 2" on page 1 and "Vamsi 2 of 2" on page 2. During retrieval I get only "Vamsi 2 of 2" for both pages.

Below is my code.

    String text = "";
    for (int i = 0; i < doc.getPageCount(); i++)
    {
        Document pageDoc = doc.extractPages(i, 1);
        // Get text of the header. Note that there might be different types of header and footer in the document;
        // for demonstration purposes extract only the primary header.
        if (pageDoc.getFirstSection().getHeadersFooters().get(HeaderFooterType.HEADER_PRIMARY) != null) {
            text += pageDoc.getFirstSection().getHeadersFooters().get(HeaderFooterType.HEADER_PRIMARY).toString(SaveFormat.TEXT);
        }
        // Get text of the body.
        if (pageDoc.getFirstSection().getBody() != null) {
            text += pageDoc.getFirstSection().getBody().toString(SaveFormat.TEXT);
        }
        // Get text of the footer.
        if (pageDoc.getFirstSection().getHeadersFooters().get(HeaderFooterType.FOOTER_PRIMARY) != null) {
            text += pageDoc.getFirstSection().getHeadersFooters().get(HeaderFooterType.FOOTER_PRIMARY).toString(SaveFormat.TEXT);
        }
    }

sandeep_resume.zip (15.2 KB)

Vamsi_452 · November 26, 2020, 1:32pm

@alexey.noskov headerfooter.zip (213 Bytes)

With the attached documents and the above code, HEADER_PRIMARY returns the footer value, the header value is not retrieved, and FOOTER_PRIMARY is null. Can you please look into this?

Vamsi_452 · November 26, 2020, 1:46pm

@alexey.noskov For some documents I am not able to retrieve header values; in place of HEADER_PRIMARY I get the FOOTER_PRIMARY value.
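
A possible explanation for this symptom, offered as an assumption rather than a confirmed diagnosis: in the Java API the HeaderFooterType constants are plain int values, and get(int) on the headers/footers collection appears to be a lookup by position, while lookup by type is a separate method (getByHeaderFooterType). The sketch below does not use Aspose.Words at all. It is a toy model with hypothetical names (HeaderFooterLookupSketch, Entry, getByIndex, getByType) and assumed constant values (HEADER_PRIMARY = 1, FOOTER_PRIMARY = 3), meant only to show how passing a type constant to an index lookup could return the footer for the header and null for the footer.

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderFooterLookupSketch {
    // Assumed values, hypothetical stand-ins for the HeaderFooterType int constants.
    public static final int HEADER_PRIMARY = 1;
    public static final int FOOTER_PRIMARY = 3;

    // Hypothetical stand-in for a header/footer node: a type constant plus its text.
    public static final class Entry {
        public final int type;
        public final String text;

        public Entry(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Lookup by position: returns the entry at the index, or null when out of range.
    public static Entry getByIndex(List<Entry> entries, int index) {
        return (index >= 0 && index < entries.size()) ? entries.get(index) : null;
    }

    // Lookup by type: returns the first entry of that type, or null when absent.
    public static Entry getByType(List<Entry> entries, int type) {
        for (Entry entry : entries) {
            if (entry.type == type) {
                return entry;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A section with exactly one primary header and one primary footer.
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry(HEADER_PRIMARY, "my header"));
        entries.add(new Entry(FOOTER_PRIMARY, "my footer"));

        // Type constant used as an index: position 1 holds the footer.
        System.out.println(getByIndex(entries, HEADER_PRIMARY).text); // my footer
        // Position 3 does not exist, so the "footer" lookup yields null.
        System.out.println(getByIndex(entries, FOOTER_PRIMARY));      // null

        // Lookup by type returns what was intended.
        System.out.println(getByType(entries, HEADER_PRIMARY).text);  // my header
        System.out.println(getByType(entries, FOOTER_PRIMARY).text);  // my footer
    }
}
```

If this assumption holds for the version in use, replacing get(HeaderFooterType.HEADER_PRIMARY) with getByHeaderFooterType(HeaderFooterType.HEADER_PRIMARY) in the real code would be the corresponding change.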

alexey.noskov · November 27, 2020, 8:43am

@Vamsi_452 You should simply update fields before extracting text. Please see the following code:

    Document doc = new Document("C:\\Temp\\in.doc");

    // Extract each page as a separate document.
    for (int i = 0; i < doc.getPageCount(); i++)
    {
        Document pageDoc = doc.extractPages(i, 1);

        pageDoc.updateFields();

        String text = "";
        // Get text of the header. Note that there might be different types of header and footer in the document,
        // for demonstration purposes extract only primary header.
        // Look up by type: in Java, get(int) expects a collection index, not a HeaderFooterType constant.
        if (pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.HEADER_PRIMARY) != null) {
            text += pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.HEADER_PRIMARY).toString(SaveFormat.TEXT);
        }
        // Get text of the body.
        if (pageDoc.getFirstSection().getBody() != null) {
            text += pageDoc.getFirstSection().getBody().toString(SaveFormat.TEXT);
        }
        // Get text of the footer.
        if (pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY) != null) {
            text += pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY).toString(SaveFormat.TEXT);
        }

        System.out.print(text);
        System.out.print("====================================================");
    }
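
The original question also asked for the result to be saved as a text file, while the loop above only prints each page. A minimal sketch of that last step, using only the Java standard library, might look like the following. The class name PageTextFileSketch, the helpers joinPage and writePages, and the separator string are hypothetical names introduced for illustration. The strings passed in are assumed to be the header, body, and footer text already obtained for each page, with null standing for a missing header or footer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class PageTextFileSketch {
    // Hypothetical separator written after each page.
    public static final String PAGE_SEPARATOR = "====================";

    // Joins header, body, and footer text for one page in that order.
    // A null part (for example a missing header) is skipped.
    public static String joinPage(String header, String body, String footer) {
        StringBuilder sb = new StringBuilder();
        for (String part : new String[] {header, body, footer}) {
            if (part != null) {
                sb.append(part);
            }
        }
        return sb.toString();
    }

    // Writes all pages to a single UTF-8 text file, one separator line after each page.
    public static void writePages(List<String> pages, Path target) throws IOException {
        StringBuilder out = new StringBuilder();
        for (String page : pages) {
            out.append(page).append(PAGE_SEPARATOR).append("\n");
        }
        Files.write(target, out.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder strings; in the real loop these would come from pageDoc.
        List<String> pages = new ArrayList<>();
        pages.add(joinPage("Header\n", "Body of page 1\n", "Footer 1 of 2\n"));
        pages.add(joinPage(null, "Body of page 2\n", "Footer 2 of 2\n"));

        Path target = Files.createTempFile("pages", ".txt");
        writePages(pages, target);
        System.out.println("Wrote " + pages.size() + " pages to " + target);
    }
}
```

Collecting the pages in a list and writing once keeps the file handling out of the extraction loop. For very large documents, appending to the file page by page would be an alternative.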