I am evaluating your product as part of research for a project my company may take on. We will need to independently determine the headers, footers, and body content for a large batch of Word documents, and extract that content as plain text with no formatting. I have found how to extract the entire content of a Word document by using doc.Range, but I cannot see how to extract just the headers (of different types) and the body content separately. Is there a way to do this in Aspose.Word?
Sure. Create a class that implements the IDocumentVisitor interface. Use its methods to handle the start and end of each document story and to extract each story's text. Then pass an instance of this class to the Document.Accept method.
Here’s an example of how to extract the primary header, the primary footer, and the body content separately.
public void ExtractDocumentStories()
public class DocumentStoriesExtractingVisitor : IDocumentVisitor
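Only the signatures of the listing above are shown here, so the following is a minimal, self-contained sketch of the same visitor idea rather than the actual example. It assumes nothing about the real Aspose.Word API: StoryKind, IStoryVisitor, Story, SimpleDocument, and StoryTextCollector are all hypothetical stand-ins invented for illustration, and the real IDocumentVisitor members may look different. The point is only to show how a visitor can note which story it is in and route text into a separate buffer per story.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-ins for illustration only; these are NOT Aspose.Word types.
public enum StoryKind { PrimaryHeader, PrimaryFooter, Body }

// Hypothetical visitor contract: start of a story, its text runs, end of a story.
public interface IStoryVisitor
{
    void StoryStart(StoryKind kind);
    void Text(string text);
    void StoryEnd(StoryKind kind);
}

// A story is one region of the document (header, footer, or body) plus its text runs.
public sealed class Story
{
    public StoryKind Kind { get; }
    public IReadOnlyList<string> Runs { get; }

    public Story(StoryKind kind, params string[] runs)
    {
        Kind = kind;
        Runs = runs;
    }
}

public sealed class SimpleDocument
{
    private readonly List<Story> stories = new List<Story>();

    public void Add(Story story) => stories.Add(story);

    // Walks every story in order and reports it to the visitor.
    public void Accept(IStoryVisitor visitor)
    {
        foreach (Story story in stories)
        {
            visitor.StoryStart(story.Kind);
            foreach (string run in story.Runs)
                visitor.Text(run);
            visitor.StoryEnd(story.Kind);
        }
    }
}

// Collects the plain text of each story kind into its own buffer.
public sealed class StoryTextCollector : IStoryVisitor
{
    private readonly Dictionary<StoryKind, StringBuilder> buffers =
        new Dictionary<StoryKind, StringBuilder>();
    private StringBuilder current;

    public void StoryStart(StoryKind kind)
    {
        // Reuse the buffer for this kind if one exists, otherwise create it.
        if (!buffers.TryGetValue(kind, out current))
        {
            current = new StringBuilder();
            buffers[kind] = current;
        }
    }

    // Text outside any story is ignored because current is null there.
    public void Text(string text) => current?.Append(text);

    public void StoryEnd(StoryKind kind) => current = null;

    public string GetText(StoryKind kind) =>
        buffers.TryGetValue(kind, out StringBuilder sb) ? sb.ToString() : string.Empty;
}

public static class Program
{
    public static void Main()
    {
        var doc = new SimpleDocument();
        doc.Add(new Story(StoryKind.PrimaryHeader, "Quarterly ", "Report"));
        doc.Add(new Story(StoryKind.Body, "Revenue grew."));
        doc.Add(new Story(StoryKind.PrimaryFooter, "Page 1"));

        var collector = new StoryTextCollector();
        doc.Accept(collector);

        Console.WriteLine("Header: " + collector.GetText(StoryKind.PrimaryHeader));
        Console.WriteLine("Body:   " + collector.GetText(StoryKind.Body));
        Console.WriteLine("Footer: " + collector.GetText(StoryKind.PrimaryFooter));
    }
}
```

Keeping one buffer per story kind means other header and footer types (first page, even pages) could be handled by adding values to the enum, without changing the collector. With the real library, the same bookkeeping would presumably live in the story start and end methods of the IDocumentVisitor implementation.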