How can I get the header, body, and footer of a Word document for each page and save the result as a text file? I need this in Java.
I used the parser, but I get a single header at the top and a single footer at the bottom, whereas the header and footer should appear with each page’s content.
If I use the code below, I get the header, footer, and body:
Document doc = new Document(path);
String text = doc.toString(SaveFormat.TEXT);
footer.zip (18.5 KB)
@Vamsi_452 Word documents are flow documents, much like HTML, so there is no fixed page start or end. However, you can try using the Document.ExtractPages method to split your document into pages and then extract the required content.
@alexey.noskov I can extract the content with the code below, but I get the header and footer first and then the body. Can I get each part individually and append them in the right order? I need this in Java.
Document doc = new Document(path);
String text = doc.toString(SaveFormat.TEXT);
@Vamsi_452 Code to separately extract the header, footer, and body of each page looks like the following:
Document doc = new Document("C:\\Temp\\in.docx");
// Extract each page as a separate document.
for (int i = 0; i < doc.getPageCount(); i++)
{
    Document pageDoc = doc.extractPages(i, 1);
    // Get text of the header. Note that there might be different types of header and footer in the document;
    // for demonstration purposes extract only the primary header.
    // getByHeaderFooterType looks up by type, whereas get(int) looks up by position in the collection.
    // It returns null when the section has no header or footer of that type, so check for null in real code.
    String header = pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.HEADER_PRIMARY).toString(SaveFormat.TEXT);
    // Get text of the body.
    String body = pageDoc.getFirstSection().getBody().toString(SaveFormat.TEXT);
    // Get text of the footer.
    String footer = pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY).toString(SaveFormat.TEXT);
}
Note, however, that a document might contain more than one section, and each section might have different types of headers and footers.
How can I save it as a single page instead of multiple pages? What does the extractPages method do? Can you share the code with me?
@alexey.noskov Can you share the code of the ExtractPages method? I am getting an error.
@Vamsi_452 The doc.extractPages method is available starting from version 20.11 of Aspose.Words, so you need to update.
@alexey.noskov How can we extract headers and footers correctly? Say my footer on page 1 is "Vamsi 1 of 2" and on page 2 is "Vamsi 2 of 2"; during retrieval I get only "Vamsi 2 of 2" for both pages.
Below is my code.
String text = "";
for (int i = 0; i < doc.getPageCount(); i++)
{
    Document pageDoc = doc.extractPages(i, 1);
    // Get text of the header. Note that there might be different types of header and footer in the document,
    // for demonstration purposes extract only primary header.
    if (pageDoc.getFirstSection().getHeadersFooters().get(HeaderFooterType.HEADER_PRIMARY) != null) {
        text += pageDoc.getFirstSection().getHeadersFooters().get(HeaderFooterType.HEADER_PRIMARY).toString(SaveFormat.TEXT);
    }
    // Get text of the body.
    if (pageDoc.getFirstSection().getBody() != null) {
        text += pageDoc.getFirstSection().getBody().toString(SaveFormat.TEXT);
    }
    // Get text of the footer.
    if (pageDoc.getFirstSection().getHeadersFooters().get(HeaderFooterType.FOOTER_PRIMARY) != null) {
        text += pageDoc.getFirstSection().getHeadersFooters().get(HeaderFooterType.FOOTER_PRIMARY).toString(SaveFormat.TEXT);
    }
}
sandeep_resume.zip (15.2 KB)
@alexey.noskov headerfooter.zip (213 Bytes)
For the attached documents, the above code reads the footer value for HEADER_PRIMARY, the header value is not retrieved, and FOOTER_PRIMARY is null. Can you please look into this?
@alexey.noskov For some documents I cannot retrieve the header values; in place of HEADER_PRIMARY I get the FOOTER_PRIMARY value.
@Vamsi_452 You should simply update fields before extracting text. Please see the following code:
Document doc = new Document("C:\\Temp\\in.doc");
// Extract each page as a separate document.
for (int i = 0; i < doc.getPageCount(); i++)
{
    Document pageDoc = doc.extractPages(i, 1);
    // Refresh fields such as PAGE and NUMPAGES so the extracted page shows its own values.
    pageDoc.updateFields();
    String text = "";
    // Get text of the header. Note that there might be different types of header and footer in the document;
    // for demonstration purposes extract only the primary header.
    // getByHeaderFooterType looks up by type, whereas get(int) looks up by position in the collection.
    HeaderFooter header = pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.HEADER_PRIMARY);
    if (header != null) {
        text += header.toString(SaveFormat.TEXT);
    }
    // Get text of the body.
    if (pageDoc.getFirstSection().getBody() != null) {
        text += pageDoc.getFirstSection().getBody().toString(SaveFormat.TEXT);
    }
    // Get text of the footer.
    HeaderFooter footer = pageDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY);
    if (footer != null) {
        text += footer.toString(SaveFormat.TEXT);
    }
    System.out.print(text);
    System.out.print("====================================================");
}
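The original question also asked to save the result as a text file, which the code above does not show. The following is only a rough sketch of that last step, using plain Java with no Aspose.Words calls. It assumes the header, body, and footer strings of each page have already been extracted in the loop above. PageTextWriter, PageText, format, and writePages are hypothetical names introduced for illustration, and the "Report" header in main is made-up sample data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class PageTextWriter {

    /** Hypothetical holder for the text extracted from one page. */
    static final class PageText {
        final String header;
        final String body;
        final String footer;

        PageText(String header, String body, String footer) {
            // A missing header or footer (null) is treated as empty text.
            this.header = header == null ? "" : header;
            this.body = body == null ? "" : body;
            this.footer = footer == null ? "" : footer;
        }
    }

    /** Joins header, body, and footer of every page in reading order. */
    static String format(List<PageText> pages) {
        StringBuilder sb = new StringBuilder();
        for (PageText page : pages) {
            appendPart(sb, page.header);
            appendPart(sb, page.body);
            appendPart(sb, page.footer);
        }
        return sb.toString();
    }

    /** Appends one part on its own line, skipping parts that are empty. */
    private static void appendPart(StringBuilder sb, String part) {
        // trim() drops leading and trailing whitespace, including trailing line breaks.
        String trimmed = part.trim();
        if (!trimmed.isEmpty()) {
            sb.append(trimmed).append('\n');
        }
    }

    /** Writes the combined text of all pages to a single UTF-8 text file. */
    static void writePages(List<PageText> pages, Path target) throws IOException {
        Files.write(target, format(pages).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for the strings extracted per page.
        List<PageText> pages = Arrays.asList(
                new PageText("Report", "First page body", "Vamsi 1 of 2"),
                new PageText("Report", "Second page body", "Vamsi 2 of 2"));
        Path out = Files.createTempFile("pages", ".txt");
        writePages(pages, out);
        System.out.println("Wrote " + out);
    }
}
```

In the extraction loop you would create one PageText per page and add it to a list, then call writePages once after the loop. Collecting the pages first keeps the header, body, footer order per page and produces a single output file instead of one file per page.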