Using content within bookmarks to create new Word docs


#1

We are doing a prototype for a client and we're considering using the Aspose.Words Java API to handle one portion of the requirements. The use case is as follows:

They would like to iterate through the bookmarks in a Word doc and copy the contents of each bookmarked section into a separate Word doc. When finished, there would be one new Word doc for each bookmark in the original document, containing that bookmark's content. The intent is to run a separate approval process for each of the new Word docs. The approval code is part of another application that this will be integrated into.

Is this feasible? What's the general approach for doing this?

Thanks in advance.


#2

You need to be aware that a document in Aspose.Words is a tree of nodes much like an XML document.

A bookmark is represented by two nodes in the document tree that mark the beginning and end of the bookmark. These are represented by the BookmarkStart and BookmarkEnd classes, respectively.

There is also a "facade" class, Bookmark, that lets you treat a bookmark (the start and end nodes plus all the document nodes in between) as a single entity. For example, it lets you get or set the whole text of the bookmark as a plain text string. As you can imagine, to get the text of the bookmark, the Bookmark class has to iterate over all document nodes between the bookmark start and end nodes and concatenate their text. To set the text of the bookmark, it has to delete all existing document nodes between the start and end nodes and create a new node containing the new text.
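To make the "get text" part concrete, here is a minimal plain-Java sketch of gathering text between a bookmark's start and end nodes. It is a hypothetical model, not the Aspose.Words API: the nested classes Node, CompositeNode, Run, BookmarkStart and BookmarkEnd only mimic the concepts described above, and getBookmarkText is an illustrative helper. A real implementation also has to deal with paragraph breaks, fields and other node types that this model ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a document node tree. The class names echo
// Aspose.Words concepts, but this is NOT the Aspose.Words API.
class BookmarkTextModel {

    static class Node { }

    // A node that can contain other nodes (body, paragraph, table, ...).
    static class CompositeNode extends Node {
        final List<Node> children = new ArrayList<>();

        CompositeNode add(Node child) {
            children.add(child);
            return this;
        }
    }

    // An inline piece of text, similar in spirit to a run.
    static class Run extends Node {
        final String text;
        Run(String text) { this.text = text; }
    }

    // Zero-width marker nodes that delimit a bookmark.
    static class BookmarkStart extends Node {
        final String name;
        BookmarkStart(String name) { this.name = name; }
    }

    static class BookmarkEnd extends Node {
        final String name;
        BookmarkEnd(String name) { this.name = name; }
    }

    // Collects every node of the subtree in document (pre-order) order.
    static void preOrder(Node node, List<Node> out) {
        out.add(node);
        if (node instanceof CompositeNode) {
            for (Node child : ((CompositeNode) node).children) {
                preOrder(child, out);
            }
        }
    }

    // Concatenates the text of all runs between the named bookmark's start and
    // end markers, which is what a facade like Bookmark has to do to return the
    // bookmark text as one string.
    static String getBookmarkText(CompositeNode root, String name) {
        List<Node> nodes = new ArrayList<>();
        preOrder(root, nodes);
        StringBuilder text = new StringBuilder();
        boolean inside = false;
        for (Node n : nodes) {
            if (n instanceof BookmarkStart && ((BookmarkStart) n).name.equals(name)) {
                inside = true;
            } else if (inside && n instanceof BookmarkEnd && ((BookmarkEnd) n).name.equals(name)) {
                return text.toString();
            } else if (inside && n instanceof Run) {
                text.append(((Run) n).text);
            }
        }
        throw new IllegalArgumentException("Bookmark not found or not closed: " + name);
    }
}
```

For a bookmark that starts in one paragraph and ends in the next, the walk simply carries on across the paragraph boundary, which is why the text can come back as a single string even though the underlying nodes live in different parents.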

In a Word document, the bookmark start and end can be anywhere. For example, both could be in the same paragraph. A bookmark can also start in one paragraph and end in another paragraph, and it can even start in one section and end in another section. Depending on what the bookmark spans, it may include fairly complex objects such as tables, images and so on.

If you consider that sections, paragraphs, tables, rows, cells, runs etc. are all nodes in the document tree in Aspose.Words, you can appreciate that the fragment of the tree between the bookmark start and end nodes can be what I call a "jagged" tree fragment, which is not very easy to work with programmatically.
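The "jagged" shape can be illustrated with the same kind of hypothetical plain-Java model (again, not the Aspose.Words API; JaggedFragment, Coverage and classifyBlocks are illustrative names). The sketch classifies each top-level block, such as a paragraph or a table, as fully inside, partially inside, or outside a bookmark, depending on which of its runs fall between the markers. Blocks without runs, and markers placed directly under the body, are not handled specially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (NOT the Aspose.Words API) showing why the content between
// bookmark markers is a "jagged" tree fragment: top-level blocks can be fully,
// partially, or not at all inside the bookmark.
class JaggedFragment {

    static class Node { }

    // Container node: body, paragraph, table, row, cell, ...
    static class CompositeNode extends Node {
        final List<Node> children = new ArrayList<>();

        CompositeNode add(Node child) {
            children.add(child);
            return this;
        }
    }

    static class Run extends Node {
        final String text;
        Run(String text) { this.text = text; }
    }

    static class BookmarkStart extends Node {
        final String name;
        BookmarkStart(String name) { this.name = name; }
    }

    static class BookmarkEnd extends Node {
        final String name;
        BookmarkEnd(String name) { this.name = name; }
    }

    enum Coverage { OUTSIDE, PARTIAL, FULL }

    // Classifies each direct child of body by how many of its runs fall
    // between the named bookmark's start and end markers.
    static List<Coverage> classifyBlocks(CompositeNode body, String name) {
        List<Coverage> result = new ArrayList<>();
        boolean[] inside = {false}; // carried across blocks, in document order
        for (Node block : body.children) {
            int[] counts = {0, 0}; // [runs inside, runs outside]
            walk(block, name, inside, counts);
            if (counts[0] == 0) {
                result.add(Coverage.OUTSIDE);
            } else if (counts[1] == 0) {
                result.add(Coverage.FULL);
            } else {
                result.add(Coverage.PARTIAL);
            }
        }
        return result;
    }

    // Document-order walk that flips the inside flag at the markers and counts runs.
    static void walk(Node node, String name, boolean[] inside, int[] counts) {
        if (node instanceof BookmarkStart && ((BookmarkStart) node).name.equals(name)) {
            inside[0] = true;
        } else if (node instanceof BookmarkEnd && ((BookmarkEnd) node).name.equals(name)) {
            inside[0] = false;
        } else if (node instanceof Run) {
            counts[inside[0] ? 0 : 1]++;
        } else if (node instanceof CompositeNode) {
            for (Node child : ((CompositeNode) node).children) {
                walk(child, name, inside, counts);
            }
        }
    }
}
```

For a bookmark that starts mid-paragraph, spans a whole table, and ends mid-paragraph, those three blocks come out as PARTIAL, FULL, PARTIAL. The two PARTIAL paragraphs are the ragged edges that have to be split or trimmed when the content is copied somewhere else.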

To answer your question simply:

If you use Range.Bookmarks and Bookmark, you can easily find all bookmarks in a document, a section or any other node of the document. Once you have a Bookmark, you can easily get or set its text as a plain text string. In your case, however, you probably want more than that. I think what you want to do is achievable, but you will need to deal with the content of the bookmarks at the node level, that is, using the methods and properties of the CompositeNode and Node classes to navigate between parents, siblings, children and so on. Have a look at the Aspose.Words Wiki for the diagram showing which nodes can contain which other nodes.
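One possible shape for the node-level approach to the original question (one new document per bookmark) is sketched below, again on a hypothetical plain-Java model rather than the real Aspose.Words classes; BookmarkSplitter, splitByBookmarks and copyInside are illustrative names. For each bookmark it makes a filtered deep copy of the body: runs between the markers are kept, containers such as paragraphs and tables are kept only if something inside them survives, and everything else is dropped. With Aspose.Words you would do the equivalent by enumerating Range.Bookmarks and copying nodes into a new Document; check the API reference for the exact node cloning and import calls in your version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model (NOT the Aspose.Words API): split one document into one new
// "document" per bookmark by copying only the content between that bookmark's markers.
class BookmarkSplitter {

    static class Node { }

    static class CompositeNode extends Node {
        final String kind; // e.g. "Body", "Paragraph", "Table"
        final List<Node> children = new ArrayList<>();

        CompositeNode(String kind) { this.kind = kind; }

        CompositeNode add(Node child) {
            children.add(child);
            return this;
        }
    }

    static class Run extends Node {
        final String text;
        Run(String text) { this.text = text; }
    }

    static class BookmarkStart extends Node {
        final String name;
        BookmarkStart(String name) { this.name = name; }
    }

    static class BookmarkEnd extends Node {
        final String name;
        BookmarkEnd(String name) { this.name = name; }
    }

    // Returns one copied body per bookmark, keyed by bookmark name, in the order
    // the bookmark starts appear in the source. The source tree is not modified.
    static Map<String, CompositeNode> splitByBookmarks(CompositeNode body) {
        List<String> names = new ArrayList<>();
        collectBookmarkNames(body, names);
        Map<String, CompositeNode> result = new LinkedHashMap<>();
        for (String name : names) {
            boolean[] inside = {false}; // "are we between this bookmark's markers?"
            result.put(name, copyInside(body, name, inside));
        }
        return result;
    }

    static void collectBookmarkNames(CompositeNode node, List<String> names) {
        for (Node child : node.children) {
            if (child instanceof BookmarkStart) {
                names.add(((BookmarkStart) child).name);
            } else if (child instanceof CompositeNode) {
                collectBookmarkNames((CompositeNode) child, names);
            }
        }
    }

    // Copies src, keeping runs that lie between the named markers and any container
    // that still holds something after filtering (this trims the "jagged" edges).
    static CompositeNode copyInside(CompositeNode src, String name, boolean[] inside) {
        CompositeNode copy = new CompositeNode(src.kind);
        for (Node child : src.children) {
            if (child instanceof BookmarkStart && ((BookmarkStart) child).name.equals(name)) {
                inside[0] = true;
            } else if (child instanceof BookmarkEnd && ((BookmarkEnd) child).name.equals(name)) {
                inside[0] = false;
            } else if (child instanceof Run) {
                if (inside[0]) {
                    copy.add(new Run(((Run) child).text));
                }
            } else if (child instanceof CompositeNode) {
                CompositeNode sub = copyInside((CompositeNode) child, name, inside);
                if (!sub.children.isEmpty()) {
                    copy.add(sub);
                }
            }
            // Markers of other bookmarks are dropped in this simplified sketch.
        }
        return copy;
    }

    // Plain-text view of a subtree, handy for inspecting the result.
    static String text(Node node) {
        if (node instanceof Run) {
            return ((Run) node).text;
        }
        StringBuilder sb = new StringBuilder();
        if (node instanceof CompositeNode) {
            for (Node child : ((CompositeNode) node).children) {
                sb.append(text(child));
            }
        }
        return sb.toString();
    }
}
```

Trimming the edge paragraphs, instead of copying them whole, keeps text outside the bookmark out of the new document. Markers of other (for example nested) bookmarks are simply dropped here; a real implementation would need to decide how to handle them.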

I think I've given you the general guideline. If you have a more specific question, let me know.