Mail merge with region does not work after upgrade using .NET

When upgrading our .dlls to the latest Aspose.Words (11.2) from 10.6, any regions (e.g. MERGEFIELD TableStart:Groups) in mail merge documents no longer merge. Merge tags outside a region still merge fine.

Is there something that needs to change in code between these two versions to merge regions? I have looked at the examples and can’t see anything obvious that I could be missing.

Hi Michael,

Thanks for your inquiry. I am afraid there weren’t any such changes to the mail merge engine in the recent versions of Aspose.Words so there is no clear reason why that is happening.

Could you please attach your input template and data here for testing?

According to the release notes there were a few changes to mail merge templates.

WORDSNET-409 Improve mail merge and bookmark performance.
WORDSNET-5999 Add new templating engine behavior to mail merge

I will try to get together a small sample version of this tomorrow that will repeat the behavior. The data sources are part of a much larger project that can’t be sent at the moment.

Hi Mike,

Thanks for this additional information.

The first issue was actually a bug fixed quite a while back that was not closed until now. With the second issue, you are right, there was some rework of the mail merge engine so there may be a small chance that it is related to your issue.

We will wait for your input files and then test the bug further.

Thanks,

In the process of making my sample application I have found what seems to be the root cause of the issue. The order that merges are conducted matters. We have a large number of separate data sources and are calling the merge method multiple times, once with each source.

If the ExecuteWithRegions is called AFTER the Execute, no region merging occurs, but if the ExecuteWithRegions is called FIRST, then both sets of data are merged fine.

I have attached a sample solution that demonstrates this issue.
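The attachment is not reproduced in this thread. As a rough sketch only, the two call orders described above might look like the following. The class name, parameter names, output file names, and the choice of PDF as the output format are all assumptions for illustration and are not taken from the attached solution.

```csharp
using System.Data;
using Aspose.Words;

public static class MergeOrderSketch
{
    // templatePath, simpleData and regionData are hypothetical inputs.
    public static void Run(string templatePath, DataTable simpleData, DataSet regionData)
    {
        // Order A: simple merge first, then regions.
        // This is the order reported above as leaving regions unmerged in the output.
        var docA = new Document(templatePath);
        docA.MailMerge.Execute(simpleData);
        docA.MailMerge.ExecuteWithRegions(regionData);
        docA.Save("OrderA.pdf"); // assumed output format

        // Order B: regions first, then simple merge.
        // This is the order reported above as merging both sets of data.
        var docB = new Document(templatePath);
        docB.MailMerge.ExecuteWithRegions(regionData);
        docB.MailMerge.Execute(simpleData);
        docB.Save("OrderB.pdf"); // assumed output format
    }
}
```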

It also seems (from our main codebase issues) that if ExecuteWithRegions is called multiple times, only the first call actually works correctly.

This is happening with NON region data now as well. Only the FIRST call to Merge data is actually putting any data in the document.

We are using a custom FieldMergingCallback class in our larger codebase as well if that gives any indication on what is going on.

Hi Mike,

Please accept my apology for the late response.

I have modified your code for the data source. Please use the following code snippet for your data source. Let us know if you have any more queries.

var table = new DataTable("Regionless Data");
table.Columns.Add("Process");
table.Columns.Add("Project");
table.Columns.Add("Form");
table.Rows.Add(new object[] {"Test Process", "Test Project", "Test Form"});
dataset.Tables.Add(table);
doc.MailMerge.ExecuteWithRegions(dataset);

I would like to reply to this thread, which I have been following.

We recently upgraded our version of Aspose.Words and found that in some cases we also had no data displaying in the generated Word documents. It also appeared to be a little random at times and involved both fields and regions (where the data was definitely being passed to the Word document).

However we did find a solution which worked for us. Reading the following…

https://docs.aspose.com/words/net/clean-up-before-or-during-mail-merge/

and incorporating this into our code resulted in the Word documents generating the merge fields and the regions every time without fail. It's important to note that you MUST set the cleanup property before the last execute call to which it relates. In the example below there's one before the last Execute and one before the last ExecuteWithRegions…

doc.MailMerge.Execute(dtSource1)
doc.MailMerge.Execute(dtSource2)
doc.MailMerge.CleanupOptions = Aspose.Words.Reporting.MailMergeCleanupOptions.RemoveUnusedFields
doc.MailMerge.Execute(dtSource3)
doc.MailMerge.ExecuteWithRegions(dtSource4)
doc.MailMerge.ExecuteWithRegions(dtSource5)
doc.MailMerge.CleanupOptions = Aspose.Words.Reporting.MailMergeCleanupOptions.RemoveUnusedRegions
doc.MailMerge.ExecuteWithRegions(dtSource6)

Let me know if this solves your problem so we at least know whether it's the same issue or not.

Thanks

Tahir:

This method will not work. Adding a table that has no region in the template to a region merge doesn’t merge any of the data in that table.

We CANNOT add a region to the template to contain this “regionless data” either because of client and other product needs. It seems I will just be forced to do all the region merges first.

ncorns:

Thank you for calling out that the RemoveEmpty options need to be set ONLY before the final merge. It makes sense that this is the case I guess, but it seems a bit hokey to have to spaghetti up my code because of a library issue. If only there were a merge option for NON region merging that took datasets instead of data tables, then I could just do them all in one merge call and not worry about it.

Another thing to note, we have some custom FieldMergingCallback code to work around OTHER remove empty bugs that I had to assign before the last region merge, reset to NULL before starting the regionless merges and set BACK to the custom code before the final regionless merge. A lot of crappy looking headachy code to work around a bug that I feel should not be that hard to fix.

Hi Mike,

Thanks for this additional information.

Could you please attach your full code here for testing? I will take a close look into all issues in this thread and get back to you with some direct feedback.

Thanks,

Adam:
Using the suggestions from ncorns I wrote enough code to work around the issues we are having for now. Attaching the full source will not really be possible. I can include the final method I had to put together to make things work though. Mainly the issue is that so much code had to be written in the MergeDataIntoDocument method because “RemoveEmptyRegions” etc. removes ALL of them in the document and not just the ones that were included in the tables in THAT merge call. Because of this all of the removing (and custom callback code that also removes empties) has to only be added before the LAST merge.

This could all be worked around if Execute could be called on a List or a DataSet that contains all the data tables (but not relations as these would be regionless). Even better would be if the call to ExecuteWithRegions could take one DataSet that has all the Tables for regions AND non regions. Trying to include non region data tables in the dataset right now (as suggested by tahir above) does not merge that data.

Here is the code I had to write to work around all of this for now in case anyone wants to reference it in the future:

public static void MergeDataIntoDocument(Document doc, DataSet regionData, params DataTable[] regionlessTable)
{
    // TODO: Find out why regions HAVE to be merged before regionless merges are done.
    var handler = new CustomMergeHandler();

    // Set up cleanup options for region data
    doc.MailMerge.CleanupOptions = MailMergeCleanupOptions.RemoveEmptyParagraphs |
        MailMergeCleanupOptions.RemoveUnusedRegions |
        MailMergeCleanupOptions.RemoveContainingFields |
        MailMergeCleanupOptions.RemoveUnusedFields;
    doc.MailMerge.FieldMergingCallback = handler; // including custom callback

    // Merge the region based data
    doc.MailMerge.ExecuteWithRegions(regionData);

    // Reset for the next set of merges
    doc.MailMerge.CleanupOptions = MailMergeCleanupOptions.None;
    doc.MailMerge.FieldMergingCallback = null; // including custom callback

    // Do all but the last regionlessTable merge
    for (int i = 0; i < regionlessTable.Length - 1; i++)
    {
        doc.MailMerge.Execute(regionlessTable[i]);
    }

    // Set up the cleanup options AGAIN
    doc.MailMerge.CleanupOptions = MailMergeCleanupOptions.RemoveEmptyParagraphs |
        MailMergeCleanupOptions.RemoveUnusedRegions |
        MailMergeCleanupOptions.RemoveContainingFields |
        MailMergeCleanupOptions.RemoveUnusedFields;
    doc.MailMerge.FieldMergingCallback = handler; // including custom callback

    // Merge the FINAL regionless table
    doc.MailMerge.Execute(regionlessTable.Last());

    // Finally call our custom removal method as well
    // BUG workaround #3 [Remove blank lines with RemoveEmptyParagraphs- or RemoveEmptyRegions- or?](https://forum.aspose.com/t/52705)
    handler.RemoveEmptyParagraphs();
}

private class CustomMergeHandler : IFieldMergingCallback
{
    private readonly string[] htmlFields = new[]
    {
        "Instructions"
    };
    // Paragraphs touched during the merge, checked for emptiness afterwards
    private readonly List<Paragraph> paragraphs = new List<Paragraph>();

    #region IFieldMergingCallback Members

    public void FieldMerging(FieldMergingArgs args)
    {
        #region BUG workarounds
        try
        {
            // BUG workaround [A bit of trouble with RemoveEmptyParagraphs and DeleteFields](https://forum.aspose.com/t/121068)
            if (null == args.FieldValue)
            {
                args.Text = "";
            }

            // BUG workaround #2 [MailMerge.setRemoveEmptyParagraphs() does function if there are more than one field per line](https://forum.aspose.com/t/100873)
            // Removed parentNodes throw exceptions during the normal remove logic
            // so this workaround can't be used
            // if (args.FieldValue == null || args.FieldValue.ToString() == "")
            // {
            // var parentNode = (CompositeNode) args.Field.End.GetAncestor(NodeType.Paragraph);
            // if (null != parentNode)
            // {
            //// If there are no more fields in the paragraph
            // if (parentNode.GetChildNodes(NodeType.FieldStart, true).Count == 1)
            // {
            // parentNode.Remove();
            // }
            // }
            // }

            // BUG workaround #3 [Remove blank lines with RemoveEmptyParagraphs- or RemoveEmptyRegions- or?](https://forum.aspose.com/t/52705)
            // Add parent paragraph into the collection.
            Paragraph parent = args.Field.Start.ParentParagraph;
            if (!paragraphs.Contains(parent))
            {
                paragraphs.Add(parent);
            }
        }
        catch
        {
            // If any of the workarounds errors, just continue the merge as normal
        }

        #endregion

        #region Handle inserting HTML in fields

        // Check if the field name is in the list of HTML capable merge fields
        if (htmlFields.Contains(args.DocumentFieldName) && null != args.FieldValue && args.FieldValue.ToString() != "")
        {
            // Insert text for this field as HTML data using DocumentBuilder
            var builder = new DocumentBuilder(args.Document);
            builder.MoveToMergeField(args.DocumentFieldName);

            // Make sure we preserve line breaks
            builder.InsertHtml(args.FieldValue.ToString().Replace("\n", "<br>"));

            // The HTML text itself should not be inserted.
            // We already inserted it as processed HTML
            args.Text = "";
        }

        #endregion
    }

    public void ImageFieldMerging(ImageFieldMergingArgs args)
    {
        // Do nothing
    }

    #endregion

    // BUG workaround #3 [Remove blank lines with RemoveEmptyParagraphs- or RemoveEmptyRegions- or?](https://forum.aspose.com/t/52705)
    public void RemoveEmptyParagraphs()
    {
        foreach (Paragraph paragraph in paragraphs)
        {
            string contents = paragraph.ToTxt().Trim();
            if (string.IsNullOrEmpty(contents) && paragraph.ParentNode != null)
            {
                paragraph.Remove();
            }
        }

        paragraphs.Clear();
    }
}

Hi Michael, Niel,

Thanks for supplying your information here and my apologies for the delay.

I have taken a close look into both of your issues. Please see my replies below.

Niel:

It is unfortunate that your fields/regions were being removed prematurely. The MailMergeCleanupOptions do indeed need to be used with care to make sure regions are not removed before they are used. This is mentioned in the documentation; however, I will look into making this clearer in the API docs as well.

Michael:

The issue you were experiencing is not related to Niel's; it is due to how Aspose.Words renders documents. This is what happens:

  1. Sometimes when you call simple mail merge (Execute) the mail merge engine encounters a page layout related field (such as PAGE) that it needs to update. This is the case in your template.
  2. In order to update this type of field, it builds the document layout in memory and caches the layout for later in case you want to render to PDF, XPS, etc. This way the document layout does not need to be rebuilt.
  3. Simple mail merge works correctly. When you execute mail merge with regions the data is also replaced correctly.
  4. When you finally save to PDF, all the data is in the document, but unfortunately it is the outdated page layout from the call to Execute that is used when saving to PDF. This means the PDF output appears incorrect, as if the regions had not been merged. If you call ExecuteWithRegions first, then the page layout is not built beforehand and both sets of merges appear correctly in the output.

Aspose.Words behaves this way to ensure efficiency when rendering documents. You would find if you converted the document to DOCX using the code as it is right now you would have the correct output.
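One way to check this claim, sketched under assumptions: save the same merged document to both DOCX and PDF and compare the two outputs. The class name, parameter names, and file names below are hypothetical. The comments restate the behavior described in this reply rather than anything verified independently.

```csharp
using System.Data;
using Aspose.Words;

public static class LayoutCacheCheck
{
    // templatePath, simpleData and regionData are hypothetical inputs.
    public static void Run(string templatePath, DataTable simpleData, DataSet regionData)
    {
        var doc = new Document(templatePath);

        // Simple merge first; per the explanation above, this may build and
        // cache the page layout if the template contains a field such as PAGE.
        doc.MailMerge.Execute(simpleData);
        doc.MailMerge.ExecuteWithRegions(regionData);

        // DOCX is written from the document content, so it is expected
        // to show both the simple and the region data.
        doc.Save("Check.docx");

        // PDF is rendered from the page layout; per the explanation above,
        // it may reflect the stale layout cached during Execute.
        doc.Save("Check.pdf");
    }
}
```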

The simple way to avoid this issue and make sure the document layout is up-to-date before rendering is to call Document.UpdatePageLayout before saving to PDF. I will demonstrate this in code below. I will also add a note about this behavior to the mail merge section of the documentation.

Regarding your other queries, I’m afraid we have chosen not to combine merging simple merge fields and mail merge regions into one method as they are two separate operations. Combining them may result in confusion or bugs later on, therefore we keep them separate. Also RemoveUnusedRegions will remove regions anywhere in the template even if the data source does not contain the table as it’s the only universal solution to the problem. If we removed only regions that were in the data source but empty then many customers would still have other regions left in their documents that should be removed.
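Because RemoveUnusedRegions applies to every region in the template, a document merged with several region data sets would need the option switched on only for the final call. A minimal sketch of that pattern follows; the helper name `MergeRegionSets` and its parameter are hypothetical, and the namespace for MailMergeCleanupOptions is the one used earlier in this thread (other releases may place the enum in a different namespace).

```csharp
using System.Collections.Generic;
using System.Data;
using Aspose.Words;
using Aspose.Words.Reporting; // namespace as used earlier in this thread; may differ by release

public static class RegionMergeSketch
{
    // Hypothetical helper: merges each DataSet in turn and enables
    // RemoveUnusedRegions only for the last call, so that regions needed
    // by later data sets are not removed early.
    public static void MergeRegionSets(Document doc, IList<DataSet> regionSets)
    {
        doc.MailMerge.CleanupOptions = MailMergeCleanupOptions.None;

        for (int i = 0; i < regionSets.Count; i++)
        {
            bool isLast = i == regionSets.Count - 1;
            if (isLast)
            {
                doc.MailMerge.CleanupOptions = MailMergeCleanupOptions.RemoveUnusedRegions;
            }

            doc.MailMerge.ExecuteWithRegions(regionSets[i]);
        }
    }
}
```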

The full code you have pasted is unnecessarily complex. You have workarounds for bugs that were fixed long ago, and you reset the CleanupOptions many times. Please try using code like below instead. I have commented each line to show what is happening. If anything is not working as expected, please attach your input data and template here for testing and I will look into it.

public static void MergeDataIntoDocument(Document doc, DataSet regionData, params DataTable[] regionlessTable)
{
    var handler = new CustomMergeHandler();
    // Make sure not to remove unused regions during this call to MailMerge.Execute, as we will be merging the regions afterward.
    doc.MailMerge.CleanupOptions = MailMergeCleanupOptions.RemoveEmptyParagraphs |
        MailMergeCleanupOptions.RemoveContainingFields |
        MailMergeCleanupOptions.RemoveUnusedFields;
    // Set the callback only to handle HTML insertion.
    doc.MailMerge.FieldMergingCallback = handler;
    // Merge all simple data.
    foreach (DataTable dataTable in regionlessTable)
        doc.MailMerge.Execute(dataTable);
    // Let's add removing unused regions now as we expect all regions to be merged with a single call to the mail merge engine.
    doc.MailMerge.CleanupOptions |= MailMergeCleanupOptions.RemoveUnusedRegions;
    // Merge the region based data
    doc.MailMerge.ExecuteWithRegions(regionData);
    // Sometimes simple mail merge can update and also cache the page layout. In that situation any later document modifications don't show
    // in the output PDF. Call UpdatePageLayout before saving to PDF, just in case, to avoid this happening.
    doc.UpdatePageLayout();
    doc.Save("Document Out.pdf");
}
private class CustomMergeHandler : IFieldMergingCallback
{
    private readonly string[] htmlFields = new[]
    {
        "Instructions"
    };
    public void FieldMerging(FieldMergingArgs args)
    {
        #region Handle inserting HTML in fields
        // Check if the field name is in the list of HTML capable merge fields
        if (htmlFields.Contains(args.DocumentFieldName) && null != args.FieldValue && args.FieldValue.ToString() != "")
        {
            // Insert text for this field as HTML data using DocumentBuilder
            var builder = new DocumentBuilder(args.Document);
            builder.MoveToMergeField(args.DocumentFieldName);
            // Make sure we preserve line breaks
            builder.InsertHtml(args.FieldValue.ToString().Replace("\n", "<br>"));
            // The HTML text itself should not be inserted.
            // We already inserted it as processed HTML.
            args.Text = "";
        }
        #endregion
    }
    public void ImageFieldMerging(ImageFieldMergingArgs args)
    { }
}

Thanks,