Extract Text Between Bookmarks

We use Aspose.Words to build policy documents dynamically for our users. During document generation, we insert bookmark "<>>" tags around editable sections. Our users enter their text in those sections, and once they upload the document we would like to extract the text between these bookmarks. Is there an easy way to do this?

I have attached the sample document here.


If your text were inside a single bookmark, you could simply use Bookmark.Text to get it. Retrieving text between two bookmarks is a bit more involved at the moment.

Basically I see two ways: implement a DocumentVisitor, or traverse the tree starting from the first bookmark until you reach the last bookmark. In both approaches you collect the text you encounter between the bookmarks, and that is the text you are looking for.

1. If implementing a visitor, create a class derived from DocumentVisitor and override the VisitBookmarkStart and VisitRun methods. Then call Document.Accept(myVisitor), which starts the enumeration over all document nodes. The pseudocode for the algorithm is something like this:

initial state = not collecting text

got BookmarkStart, if bookmark name equals name of the first bookmark, state = collecting text

got BookmarkStart, if bookmark name equals name of the last bookmark, state = not collecting text, stop document visitor.

got Run, if collecting text, then add Run.Text to the collected text.
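
To make the state machine above concrete, here is a minimal sketch in Python. It does not use the Aspose.Words API: BookmarkStart, Run, BetweenBookmarksCollector and accept are hypothetical stand-ins, and the bookmark names "EditStart" and "EditEnd" are made up for the example. The real visitor would derive from DocumentVisitor and be driven by Document.Accept, but the collecting logic would be the same.

```python
from dataclasses import dataclass


# Stand-in node types for illustration only; these are NOT Aspose.Words classes.
@dataclass
class BookmarkStart:
    name: str


@dataclass
class Run:
    text: str


class BetweenBookmarksCollector:
    """Collects run text seen after `first` starts and before `last` starts."""

    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.collecting = False  # initial state = not collecting text
        self.parts = []

    def visit_bookmark_start(self, node):
        # Returns False to ask the caller to stop the enumeration.
        if node.name == self.first:
            self.collecting = True
        elif node.name == self.last:
            self.collecting = False
            return False
        return True

    def visit_run(self, node):
        if self.collecting:
            self.parts.append(node.text)
        return True

    @property
    def text(self):
        return "".join(self.parts)


def accept(nodes, visitor):
    """Hypothetical stand-in for Document.Accept: feeds nodes in document order."""
    for node in nodes:
        if isinstance(node, BookmarkStart):
            keep_going = visitor.visit_bookmark_start(node)
        elif isinstance(node, Run):
            keep_going = visitor.visit_run(node)
        else:
            keep_going = True
        if not keep_going:
            break


# A flattened toy "document" with two bookmarks around the editable text.
nodes = [
    Run("Intro. "),
    BookmarkStart("EditStart"),
    Run("User "),
    Run("text."),
    BookmarkStart("EditEnd"),
    Run(" Footer."),
]
collector = BetweenBookmarksCollector("EditStart", "EditEnd")
accept(nodes, collector)
print(collector.text)  # prints "User text."
```

Note that this sketch only gathers run text, so paragraph breaks between the bookmarks are not preserved; if you need them, you would also have to handle paragraph nodes.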

2. Alternatively, you can use the parent/child/sibling properties of the nodes to traverse from the first bookmark to the last bookmark. This is probably a bit more complex than letting DocumentVisitor enumerate the document for you.
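
For comparison, a rough sketch of the second approach, again in Python with a hypothetical Node class rather than the Aspose.Words node types. It assumes each node exposes a parent, a next sibling and a list of children, and walks in document order from the first bookmark until it hits the last one. The extra helper for "next node in document order" is what makes this variant a bit more work than the visitor.

```python
class Node:
    """Hypothetical tree node; not an Aspose.Words class."""

    def __init__(self, kind, value="", children=()):
        self.kind = kind    # "body", "para", "bookmark" or "run"
        self.value = value  # bookmark name or run text
        self.parent = None
        self.next_sibling = None
        self.children = list(children)
        # Wire up parent and sibling links for the children.
        for i, child in enumerate(self.children):
            child.parent = self
            if i + 1 < len(self.children):
                child.next_sibling = self.children[i + 1]


def next_in_document_order(node):
    # Descend first, then move sideways, then climb until a sibling exists.
    if node.children:
        return node.children[0]
    while node is not None:
        if node.next_sibling is not None:
            return node.next_sibling
        node = node.parent
    return None


def text_between(start, end):
    """Concatenates run text found after `start` and before `end`."""
    parts = []
    node = next_in_document_order(start)
    while node is not None and node is not end:
        if node.kind == "run":
            parts.append(node.value)
        node = next_in_document_order(node)
    return "".join(parts)


# The two bookmarks sit in different paragraphs, so the walk has to
# climb out of the first paragraph and descend into the second.
first = Node("bookmark", "EditStart")
last = Node("bookmark", "EditEnd")
body = Node("body", children=[
    Node("para", children=[Node("run", "Intro. "), first, Node("run", "User ")]),
    Node("para", children=[Node("run", "text."), last, Node("run", " Footer.")]),
])
print(text_between(first, last))  # prints "User text."
```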