Hi Rajesh,
Thanks for sharing the detail.
*Kusumanchi.Rajesh:
In the thread, you shared code with me to get the text of each page separately. It is giving the following exception*
Please use the attached modified visitParagraphStart method in the SectionSplitter class. It should fix this exception.
*Kusumanchi.Rajesh:
- It should avoid reading the header and footer in the document.*
Please use the HeaderFooterCollection.clear method to remove all nodes from this collection and from the document.
*Kusumanchi.Rajesh:
2) It should print the exact content of each page as we see it in the Word document.
4) It should be able to handle section breaks and page breaks. In case of section breaks, it should print the content of the page as visible in the Word document page.*
Because you are using the PageSplitter and DocumentLayoutHelper utilities, both of these requirements should already be handled. If you face any issue with these utilities, please let us know.
*Kusumanchi.Rajesh:
3)It should be able to handle different formats of page numbers. For example, Page a of 1, Page 1 of 1, roman page numbers, etc.*
You are using the PageSplitter utility to convert each page to a separate document. In this case, there will be only one header/footer per document. You can get the text of the header/footer by using the Node.toString method.
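Once the header/footer text has been read with Node.toString, the page-number label still has to be interpreted in whichever format it uses. The following is only a minimal sketch in plain Java and is not part of Aspose.Words: the class PageLabelParser and its methods are hypothetical names, and it assumes the label has the form "Page X of Y", where X and Y are Arabic digits, Roman numerals, or a single letter. Single letters that are also Roman digits (i, v, x, l, c, d, m) are read as Roman numerals here, which is an arbitrary choice you may need to change for your documents.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an Aspose.Words class.
public class PageLabelParser {

    // Matches labels such as "Page 1 of 1", "Page iv of x", "Page a of 1".
    private static final Pattern LABEL =
            Pattern.compile("(?i)page\\s+(\\w+)\\s+of\\s+(\\w+)");

    /** Returns {current, total} parsed from the text, or null if no label is found. */
    public static int[] parse(String footerText) {
        Matcher m = LABEL.matcher(footerText);
        if (!m.find()) {
            return null;
        }
        return new int[] { toNumber(m.group(1)), toNumber(m.group(2)) };
    }

    /** Converts an Arabic, Roman, or single-letter page number to an int. */
    static int toNumber(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.matches("\\d+")) {
            return Integer.parseInt(t);
        }
        // Assumption: Roman numerals take priority over letters (so "c" is 100, not 3).
        if (t.matches("[ivxlcdm]+")) {
            return romanToInt(t);
        }
        if (t.matches("[a-z]")) {
            return t.charAt(0) - 'a' + 1; // a = 1, b = 2, ...
        }
        throw new IllegalArgumentException("Unrecognized page number: " + token);
    }

    /** Reads a lower-case Roman numeral from right to left, subtracting smaller digits. */
    static int romanToInt(String s) {
        int total = 0;
        int prev = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            int v = romanDigit(s.charAt(i));
            total += (v < prev) ? -v : v;
            prev = v;
        }
        return total;
    }

    private static int romanDigit(char c) {
        switch (c) {
            case 'i': return 1;
            case 'v': return 5;
            case 'x': return 10;
            case 'l': return 50;
            case 'c': return 100;
            case 'd': return 500;
            case 'm': return 1000;
            default: throw new IllegalArgumentException("Not a Roman digit: " + c);
        }
    }

    public static void main(String[] args) {
        // The strings below stand in for text obtained from a header/footer node.
        System.out.println(Arrays.toString(parse("Page 1 of 1")));  // [1, 1]
        System.out.println(Arrays.toString(parse("Page iv of x"))); // [4, 10]
        System.out.println(Arrays.toString(parse("Page a of 1")));  // [1, 1]
    }
}
```

In practice you would pass the string returned for the footer of each single-page document to parse and handle the null case for pages that have no label.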
If you still face problems, please share the following details for investigation purposes:
- Please attach your input Word document.
- Please create a simple standalone, runnable Java application that demonstrates the Aspose.Words code you used to generate your output document.
- Please attach the output document that shows the undesired behavior.
- Please attach your target document showing the desired behavior. I will investigate how you expect your final document to be generated.