XML input with multiple nodes

Hi,

I am working on a project which requires XML data to be merged into a Word template in order to create a customer letter as output.

My XML document structure is…


























Using C#, I am reading the XML document into a DataSet. Then using the Aspose.Word evaluation package I am merging the data from the DataSet into a Word template.

I have 2 issues you may be able to help me with…
- The first is that, using a MergeField in the Word doc, I cannot access without defining a TableStart and TableEnd for . This means I have set up as repeating information when I don’t want it to be. Is there a way of defining a data path similar to XPath?
- The second is, within I do have repeating information which needs to be displayed within the doc. The trick is this repeating information is displayed in a table with no repeating information from other nodes…

e.g.
TableStart





TableEnd

TableStart





TableEnd

Therefore I cannot put a TableStart and TableEnd around the entire block of repeating information as and belong to a different node.

I hope this makes sense.

Thanks,
Rich.

Hi Rich,

Mail merge features in Aspose.Word are not yet mature enough to handle complex scenarios such as this easily, although providing a powerful data reporting engine is one of our main goals.

It is difficult for me to work with abstract data, so I can offer better ideas if you email a sample of your data and the document you want to build to word@aspose.com.

The general recommendation will be to use mail merge to build portions that you can, but handle the rest using merge field events and/or DocumentBuilder. It is not as straightforward as designing a report and specifying where the data comes from, but it can be done.

Hi Roman,

Thanks for your reply.

I am about to send you through my C# code in a .NET solution (CallNotice.sln), a word template (test.doc), and an XML document (test.xml).

Initially, I would like to know if it is possible to get rid of the TableStart and TableEnds for non repeating information.

Thanks,
Rich.

Thanks for the document and other bits.

This seems to be a recurring question. In your scenario you have 3 data tables with only one row each and 1 table with several rows.

1. You want to be able to insert data from several tables with only one row into the document and fields from one table can mix with fields from other tables.

Aspose.Word was not initially designed with this scenario in mind. The original thinking was that the client needs to combine the data from all tables into one using a query.

As it turns out it is not very practical as raised by several customers. It would be much nicer to use merge field names such as TableName.FieldName and let Aspose.Word worry about selecting data from several tables. We will implement better support for this scenario in future versions.

But in the meantime there is still an elegant solution available. Use TableName.FieldName syntax for your merge fields and then use DocumentBuilder to populate fields. Here is an example:

//Requires: using System.Data;
DataSet dataSet = new DataSet();
dataSet.ReadXml(@"test.xml");

Document doc = TestUtil.Open(@"test.doc");

//The simplest way to handle tables with one row in your situation is to move
//the document builder to every merge field and insert the data there.
string[] fieldNames = doc.MailMerge.GetFieldNamesEx();
DocumentBuilder builder = new DocumentBuilder(doc);

for (int i = 0; i < fieldNames.Length; i++)
{
    string fullFieldName = fieldNames[i];

    string[] parts = fullFieldName.Split('.');
    //This will skip fields that are not like TableName.FieldName
    if (parts.Length != 2)
        continue;

    string tableName = parts[0];
    string fieldName = parts[1];

    //Each of these tables holds exactly one row, so take the first.
    DataTable table = dataSet.Tables[tableName];
    DataRow row = table.Rows[0];

    builder.MoveToMergeField(fullFieldName);
    builder.Write(row[fieldName].ToString());
}


2. You have one table that contains multiple rows and represents child records of the main record contained in the other 3 tables. Just use mail merge regions in the standard way (no change from your example). Note that the region with the same name appears 4 times in your document, so you need to call MailMerge.ExecuteWithRegions 4 times too.

//Use a mail merge region to handle the CallItem table.
doc.MailMerge.ExecuteWithRegions(dataSet.Tables["CallItem"]);

//Do this several more times for every CallItem merge region you have in the document.
doc.MailMerge.ExecuteWithRegions(dataSet.Tables["CallItem"]);
doc.MailMerge.ExecuteWithRegions(dataSet.Tables["CallItem"]);
doc.MailMerge.ExecuteWithRegions(dataSet.Tables["CallItem"]);

doc.Save(@"test Out.doc");

Hi Roman,

Thanks for your help regarding the use of Document Builder for displaying non repeating information within the template. This was exactly what we were looking for.

We are currently evaluating packages, and are very interested in Aspose as a solution due to the functionality it offers.

We are now looking at the performance of the package. I have put the code into a loop which opens the template, populates the merge fields, saves the output to disk, closes the template, and then starts the process again.

To improve performance I have tried to leave the template open in memory and produce multiple output documents by feeding more than one dataset into the template. Aspose did not seem to like having multiple datasets merged to one instance of the template. Is this possible?

Are there any other enhancements you know of which we could implement to improve performance?

Thanks for your help.

Regards,
Rich.

Hi Rich,

You are right: merging into a Document removes the merge fields and replaces them with data, and you cannot merge into the document again once the merge fields are gone. This is by design.

We looked at an alternative to copy the document in memory before merging into it, but for one mail merge operation it is basically doubling memory usage so we decided to replace the fields without copying the document.

But you raise a valid point. For creating multiple documents from one template it will probably be better in terms of performance to just clone the document in memory without reading it from file multiple times.

Actually, most of our objects internally support cloning, so if we just expose a Document.Clone that makes a deep copy of a document, it should help. You will need to open a document only once and clone it into a new Document before each merge. It should be faster and use less memory than opening the document again and again.

We have done performance and memory optimizations in several areas of Aspose.Word, but I don’t think we have explicitly optimized performance of the mail merge operation itself yet. This is also a good task for us.

We will try to implement Document.Clone and mail merge performance optimization if needed by the end of June.

Let me know what sort of performance you get from Aspose.Word now and how big the files are. Also, what would you consider a satisfactory result for your tests?

Hi Roman,

Thanks for the information on the Document.Clone feature planned for a future release; this sounds like a possible answer to our issue.

The complexity of our template documents and the number of records to be output will vary significantly as we hope to implement the solution in a number of areas within our organisation.

The template could vary from a simple 1 page generic letter with variable address information, right through to a complex invoice with multiple instances of repeating information.

As a point of reference, the template I sent through as an example a week or two ago should be towards the complex end of the documents we will be producing.

The number of records to be processed will vary between 150 and 1500 at a time.

We are looking at performance issues from a worst case scenario, where we might need 1500 records merged into a complex multi page document, and also looking to the future when record numbers will increase, and templates get more complicated.

I have also sent an email to word@aspose.com, covering information I did not want to disclose on a public forum.

Thanks,
Rich.

Hi Rich,

Please download Aspose.Word 1.5.4.

There is a significant performance improvement for your scenario. It is a combination of changes to how we populate your documents and changes to Aspose.Word code.

Test machine: P4, 2.4 GHz, 512 MB RAM.

The document is 5 pages long, a 56 KB file on disk, with several mail merge regions and 40-50 standalone fields.

It now takes only 19 seconds to generate 1000 documents.

Here is a summary of what has changed in the way the document is populated:

1. Open the file only once and use Document.Clone before each merge.

2. Replaced all merge fields related to the CommunicationMethod table with only 4 bookmarks and used DocumentBuilder.MoveToBookmark to insert address blocks. This is faster because bookmarks are inherently lighter than merge fields (bookmarks have no field code, separator or field value, and do not require parsing).

3. Programmatically combined the fields of the Call and Investor tables into one InvestorCall table and filled them in just one mail merge operation. This reduced the number of “dummy” mail merge regions we had to have.
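Step 3 can be sketched with System.Data alone. This is a minimal illustration, not the code that was actually used: the class name SingleRowTables, the method Combine and the TableName_ColumnName naming scheme are all assumptions made for the example. It copies the first row of each source table into one new single-row table, so that a single mail merge call can fill every field.

```csharp
using System;
using System.Data;

// Hypothetical helper (not part of Aspose.Word): merges several
// single-row tables into one single-row table.
public static class SingleRowTables
{
    // Column names are prefixed with the source table name
    // (e.g. Call_Date) so that columns from different tables cannot clash.
    // This naming scheme is an assumption made for this sketch.
    public static DataTable Combine(string combinedName, params DataTable[] sources)
    {
        DataTable combined = new DataTable(combinedName);

        // First pass: create one column per source column.
        foreach (DataTable source in sources)
        {
            foreach (DataColumn column in source.Columns)
            {
                combined.Columns.Add(source.TableName + "_" + column.ColumnName, column.DataType);
            }
        }

        // Second pass: copy the first row of every non-empty source table.
        DataRow target = combined.NewRow();
        foreach (DataTable source in sources)
        {
            if (source.Rows.Count == 0)
                continue;

            DataRow first = source.Rows[0];
            foreach (DataColumn column in source.Columns)
            {
                target[source.TableName + "_" + column.ColumnName] = first[column];
            }
        }

        combined.Rows.Add(target);
        return combined;
    }
}
```

With this scheme the merge fields in the template would be named after the combined columns (for example Investor_Name), and a call such as SingleRowTables.Combine("InvestorCall", dataSet.Tables["Call"], dataSet.Tables["Investor"]) would produce the table to merge.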

Hi Roman,

The performance improvements in the new release are absolutely staggering!

We are all thoroughly impressed here.

We implemented the new Document.Clone feature and straight away throughput was 10 times faster (and quicker than the competition, you will be glad to know).

We are now in a position to demonstrate document production to the business this Thursday, will let you know the outcome.

Fantastic work you guys!

Regards,
Rich.

Dear Rich,

First, I wish you a successful demonstration.

We would also strongly encourage you to tell us more about your performance comparison with our competitors. The source code used to produce the same documents would help a great deal, or whatever else you would like to provide.

Hi Roman and Ben,

The end to end test demo has run like a dream, the real demo is this afternoon.

We are now confident that Aspose.Word’s performance exceeds that of the competitor package we were trialling, and its functionality is far better.

However, the end to end test has highlighted a couple of issues you may be able to help us with…

Firstly, using the recently released Document.Clone feature, we have discovered that the first output document produced does not contain any merged data. Is this a defect or are we doing something wrong?

Secondly, we need to compress the address lines to remove any blank fields from the address block. We discovered the Document.MailMerge.RemoveEmptyParagraphs feature, but we cannot use it as we are using DocumentBuilder to merge the data into the template. Are there any similar features we can use with DocumentBuilder?

Thanks a lot for your help.

Regards,
Rich.

I will have a look at the missing data issue (I had all documents filled properly), but I hope it’s not necessary to release a fix before today’s demo.

Since you insert data using DocumentBuilder, I suggest you build the whole string from your address fields using System.Text.StringBuilder and then invoke DocumentBuilder.Writeln only once to insert that string. This way you can build the address block exactly as you want it, skipping blank fields and adding commas where needed. If you want the address block to wrap onto multiple lines, just use the \r character.
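A minimal sketch of that suggestion, assuming the address fields arrive as plain strings. The class name AddressBlock and the method Build are hypothetical names chosen for this example. It drops null or blank fields and joins the rest with \r, so the block closes up without empty lines.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of Aspose.Word): builds a compact
// address block from fields that may be null or blank.
public static class AddressBlock
{
    public static string Build(params string[] lines)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string line in lines)
        {
            // Skip missing or whitespace-only fields so no blank line is emitted.
            if (line == null || line.Trim().Length == 0)
                continue;

            // Separate lines with \r, as suggested above for line wrapping.
            if (sb.Length > 0)
                sb.Append('\r');

            sb.Append(line.Trim());
        }
        return sb.ToString();
    }
}
```

The result would then be written in one call, for example by moving the builder to the address bookmark with DocumentBuilder.MoveToBookmark and passing AddressBlock.Build(...) to DocumentBuilder.Writeln.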

Hi Roman,

No need for a quick fix release today, we can work around the issue for the demo to give you time to look into it.

Thanks for the info on DocumentBuilder, I will implement the changes.

Thanks,
Rich.

Hi Roman,

My apologies, Document.Clone is working fine. We discovered a bug in my code which was causing the error. I hope this didn’t cause any inconvenience.

Rich.