I would like to know if Aspose.Words can do the following search for us.
I would like Aspose.Words to retrieve document content placed between [foreach] and [end-foreach]. Please note that the content between [foreach] and [end-foreach] can be plain text, tables or paragraphs.
Can Aspose.Words do the above? If so, some sample code would be helpful.
Thank you for considering Aspose.Words. Yes, this is possible with our component. However, the implementation depends on where the tags can appear: are the [foreach] and [end-foreach] tags always immediate children of a section’s story (i.e. not placed within tables, in different sections, etc.), or is their location totally arbitrary?
Also, could you please attach a sample document to test?
The location of [foreach] and [end-foreach] tags is arbitrary. They can appear anywhere and everywhere.
I have attached what a sample document template would look like. The tags to be replaced are contained within chevrons (<>). The foreach tags are named and their contents are supposed to be repeated a specified number of times.
As seen in the attached file, the foreach tag can appear inside a table to create multiple rows. Thank you for your prompt reply.
The fact that the location of the tags is arbitrary definitely complicates the task. The point is that the Aspose.Words document model has a tree-like (or XML-like) structure where paragraphs and tables are immediate children of a section’s story, tables are parents of rows, rows are parents of cells, and so forth. Applying a flat range with arbitrary start and end points to such a tree requires special operations and handling of particular tag placement scenarios, especially if you need to extract the contents. For example, what should the extracted contents look like if the start tag is located in a nested table’s cell while the end tag is in another section’s header?
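To illustrate why a flat range is awkward on a tree, here is a minimal Python sketch. It is not Aspose.Words code: the Node class and the find, depth and common_ancestor helpers are invented for illustration, and the tiny document is an assumed example in which the start tag sits directly in the body while the end tag sits inside a table cell.

```python
class Node:
    """A toy document node (hypothetical, not the Aspose.Words Node class)."""

    def __init__(self, kind, text="", children=()):
        self.kind = kind
        self.text = text
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def find(node, marker):
    """Return the first node in document order whose text contains marker."""
    if marker in node.text:
        return node
    for child in node.children:
        hit = find(child, marker)
        if hit is not None:
            return hit
    return None


def ancestors(node):
    """Return the chain [node, parent, grandparent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def depth(node):
    """Number of steps from the node up to the root."""
    return len(ancestors(node)) - 1


def common_ancestor(a, b):
    """Return the deepest node that contains both a and b."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    return None


# Assumed layout: the start tag is a body paragraph, the end tag is in a cell.
doc = Node("Body", children=[
    Node("Paragraph", "[foreach]"),
    Node("Table", children=[
        Node("Row", children=[
            Node("Cell", children=[
                Node("Paragraph", "some repeated text"),
                Node("Paragraph", "[end-foreach]"),
            ]),
        ]),
    ]),
])

start = find(doc, "[foreach]")
end = find(doc, "[end-foreach]")

# The tags are not siblings, so "everything between them" is not a simple
# run of sibling nodes; the range cuts through the table.
print(depth(start), depth(end), common_ancestor(start, end).kind)
```

When both tags share a parent, the content between them is just the sibling nodes in between. When they sit at different depths, as here, the extraction code has to decide what to do with the partially covered table, row and cell.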
However, it is still possible. Just to gain some ideas, please read the following thread first:
This is an article written by our team leader Roman Korchagin. It contains a code sample that shows how to extract a bookmark’s content. With some modifications it may be applicable to your case as well. All you need to change is how the start and end paragraphs are found; in the sample they are simply the parent paragraphs of the bookmark’s start and end nodes. To find the text runs containing your tags, and thus their parent paragraphs, consider the find and replace functionality:
Note that, to keep things simple, the code sample still does not cover all possible start and end positions. As mentioned in the article, in the future Aspose.Words will provide some neat high-level methods encapsulating all this complex functionality, but at the moment you have to implement it on your own.
Nevertheless, I’ve noticed that your document closely resembles a template to be populated from a database. You did not specify the purpose of the text extraction. Is it merely to repeat the content? If so, we could forget about content extraction and use the Aspose.Words mail merge functionality, which requires just a few lines of code. It supports mail merge with regions, where a region is marked by the TableStart:TableName and TableEnd:TableName merge fields, which are very similar to the foreach tags in your document. Even if you are unable to redesign the document, we could replace your tags with merge fields on the fly. Please let me know if my assumptions are correct.
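As a rough idea of the tag-to-region mapping, here is a minimal Python sketch that works on plain text only. It does not use the Aspose.Words API and does not create real merge fields; it only shows how tag names could be paired up and renamed. The tag syntax `[foreach Name]` ... `[end-foreach]` is an assumption, since the exact syntax in your template may differ, and the function name to_region_markers is made up for this example.

```python
import re

# Assumed tag syntax: "[foreach Name]" opens a region, "[end-foreach]" closes it.
TAG = re.compile(r"\[(end-)?foreach(?:\s+(\w+))?\]")


def to_region_markers(text):
    """Rewrite the assumed foreach tags as TableStart:/TableEnd: marker names."""
    open_regions = []  # names of regions that have been opened but not closed

    def swap(match):
        is_end, name = match.group(1), match.group(2)
        if not is_end:
            if name is None:
                raise ValueError("foreach tag has no name")
            open_regions.append(name)
            return "TableStart:" + name
        if not open_regions:
            raise ValueError("end-foreach without a matching foreach")
        # The end tag carries no name here, so reuse the most recent open one.
        return "TableEnd:" + open_regions.pop()

    result = TAG.sub(swap, text)
    if open_regions:
        raise ValueError("unclosed foreach: " + open_regions[-1])
    return result


print(to_region_markers("[foreach Orders] one order line [end-foreach]"))
```

In a real document, the renamed markers would have to be inserted as merge fields rather than plain text, which is not shown here.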
Thanks for the information.
You are right in your assumptions: my document is actually a Word 2003 template that is populated from a database. The purpose of the text extraction was to repeat it for each row in a table, replacing fields such as <> with the data in the table. Unfortunately, it seems we cannot use the Mail Merge functionality, because Mail Merge regions cannot be nested inside each other and our template can have nested foreach tags. For example:
If you think the above can be achieved by replacing our tags with ‘merge fields on the fly’ and that nesting will not be an issue, please let me know. I will test it with Mail Merge regions anyway.
I understand the point you are making about the document being represented in an XML-like tree format. The article by Roman Korchagin is useful, and I can see that Aspose.Words should be able to provide a solution, but it might require a lot of coding since there are no high-level functions yet.
I will test Aspose.Words functionality using the template I attached previously and let you know how I get on.
OK, keep us posted on your progress.