Mailmerge: Remove table-header if dataset is empty

M.Heinz · October 1, 2021, 9:20am

Dear sir or madam,

I’m trying to work with datasets and MailMerge in Aspose.Words.

Facts:

The datasets contain variable amounts of data-objects >= 0.
The template Word document contains a table with a static header-row, followed by a row describing the dynamically created content (using the corresponding MergeFields with TableStart/TableEnd-tags and property-tags).
If the dataset contains at least 1 object, we want to display the table-header followed by the data-rows. (Works fine.)

Issue / request / question:
If the dataset is empty though, we don’t want to display the table at all (Neither the header-row nor an empty data-row).

The latter can easily be achieved by setting the proper value for document.MailMerge.CleanupOptions prior to executing the MailMerge.
Unfortunately, I could not manage to remove the header-row. Things I’ve already tried:

Setting different combinations of the document.MailMerge.CleanupOptions [including setting all = 63].
In the word-template-file, setting the “repeat table-header-row on each page” for the header-row of the table to either option checked/unchecked.
Combinations of the above…

Is there any way to not show the table-header-row, if the dataset is empty?

Kind regards,
Matthias Heinz

Please find attached:

Aspose_MergeFieldTest.cs: 2 test-cases performing the MailMerge on the template.docx-file; one having a non-empty dataset and one having an empty dataset.
template.docx: The docx-file on which the MailMerge is being performed.
expected_result_with_data.docx, expected_result_without_data.docx: The expected output for both test-cases; “expected_result_without_data.docx” is relevant to the issue at hand.

Aspose_MergeFields_Table.zip (29.9 KB)

awais.hafeez · October 1, 2021, 3:32pm

@M.Heinz,

You can build logic on the following code to get the desired results:

private static void PerformMailMerge(string inputFileName, string outputFileName, int sizeOfDataset)
{
    // set licence
    var lic = new Aspose.Words.License();
    lic.SetLicense("Aspose.Total.lic");

    // read input-file
    Document document;
    using (var memoryStream = new MemoryStream(File.ReadAllBytes(inputFileName)))
    {
        document = new Document(memoryStream);
    }

    DocumentBuilder builder = new DocumentBuilder(document);
    builder.MoveTo(document.FirstSection.Body.Tables[0].FirstRow.FirstCell.FirstParagraph);
    BookmarkStart bookmark = builder.StartBookmark("temp");
    builder.EndBookmark("temp");

    // get sample data for mail-merge
    var dataSet = GetSampleDataSet(sizeOfDataset);

    // perform mail-merge
    document.MailMerge.CleanupOptions = (Aspose.Words.MailMerging.MailMergeCleanupOptions)63;
    document.MailMerge.ExecuteWithRegions(dataSet);

    Table table = (Table)bookmark.GetAncestor(NodeType.Table);
    if (table.Rows.Count == 1)
        table.Remove();
    else
        bookmark.Bookmark.Remove();

    // write result-file to disk
    using (var ms = new MemoryStream())
    {
        document.Clone().Save(ms, SaveFormat.Docx);
        File.WriteAllBytes(Path.Combine(outputFileName), ms.ToArray());
    }

    // Open output-file:
    //Process.Start(Path.Combine(filePath, outputFileName));
}
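
The magic number 63 in the cast above is simply every cleanup flag OR-ed together. A minimal sketch of that arithmetic, using a local stand-in enum so it runs without Aspose.Words: `CleanupFlags` and `CleanupFlagsDemo` are hypothetical names, and the numeric values are assumed to mirror `MailMergeCleanupOptions`, so please check them against the API reference for your version.

```csharp
using System;

// Hypothetical stand-in for Aspose.Words.MailMerging.MailMergeCleanupOptions.
// The numeric values are an assumption; verify them against the API reference.
[Flags]
public enum CleanupFlags
{
    None = 0,
    RemoveEmptyParagraphs = 1,
    RemoveUnusedRegions = 2,
    RemoveUnusedFields = 4,
    RemoveContainingFields = 8,
    RemoveStaticFields = 16,
    RemoveEmptyTableRows = 32
}

public static class CleanupFlagsDemo
{
    // All six flags combined, spelled out so the intent is readable.
    public const CleanupFlags All =
        CleanupFlags.RemoveEmptyParagraphs |
        CleanupFlags.RemoveUnusedRegions |
        CleanupFlags.RemoveUnusedFields |
        CleanupFlags.RemoveContainingFields |
        CleanupFlags.RemoveStaticFields |
        CleanupFlags.RemoveEmptyTableRows;

    // Returns the numeric value of the combined mask.
    public static int AllAsInt() => (int)All;
}
```

Under these assumed values, `CleanupFlagsDemo.AllAsInt()` returns 63. In real code, combining the named `MailMergeCleanupOptions` members is probably easier to read and review than casting a literal.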

M.Heinz · October 4, 2021, 7:05am

Hi awais.hafeez,

thanks for your reply. Looks like a decent solution for simple cases.

But as you might have guessed, this was a very simplified example of our real-world scenario. In reality we’ve got a dynamic amount of table-objects in our word template and many, but not all, of them contain different MailMerge/MergeField-placeholders for datasets of different data-types each (e.g. “MyObject1”, “MyObject2”, … using corresponding TableStart/TableEnd-tags).

As a consequence we would effectively have to (book)mark each table, that contains a TableStart-tag and do the clean-up after the MailMerge.
We would have to be careful about nested tables (e.g. if an inner table contains the TableStart-tag, we must not remove the outer tables…).
The static table-header is often just a single row, but it might span multiple rows as well (making it more difficult to search for the TableStart-tag in a given table).

Therefore applying the idea of your solution might get messy very quickly, when you need to consider all of these edge-cases. I presume, there’s no easier way to tackle the issue at hand?

Kind regards,
Matthias Heinz

awais.hafeez · October 5, 2021, 6:02am

@M.Heinz,

We are checking this scenario and will get back to you soon.

M.Heinz · October 7, 2021, 6:25am

Thanks a lot for investigating this issue; I’ll be awaiting your findings.

awais.hafeez · October 7, 2021, 8:35am

@M.Heinz,

We can think of two possible solutions to this scenario:

Attachments: DOCX files and DataSet.zip (43.9 KB)

With template modification:

The template document can be modified so that each table containing a row mail-merge region is wrapped in an IF field of the form { IF { MERGEFIELD Table-<REGION_NAME>-Count } <> 0 … }, where REGION_NAME is the name of the row mail-merge region. The table nested in the IF field is then present in the result only if the IF condition is met, i.e. the mail-merge region has records. The corresponding MERGEFIELD values can be added to the dataset at runtime:

public void TestIF()
{
    var document = new Document("template.IF.docx");

    var dataSet = GetSampleDataSet();
    var topLevelFields = EnhanceDataSet(dataSet, document);

    document.MailMerge.CleanupOptions =
        MailMergeCleanupOptions.RemoveEmptyTableRows |
        MailMergeCleanupOptions.RemoveContainingFields |
        MailMergeCleanupOptions.RemoveStaticFields |
        MailMergeCleanupOptions.RemoveUnusedFields |
        MailMergeCleanupOptions.RemoveEmptyParagraphs;

    document.MailMerge.ExecuteWithRegions(dataSet);
    document.MailMerge.Execute(topLevelFields.fields, topLevelFields.values);

    document.Save("out.IF.docx");
}

private static DataSet GetSampleDataSet()
{
    var dataSet = new DataSet();
    dataSet.ReadXml("ds.xml");
    return dataSet;
}

private static (string[] fields, object[] values) EnhanceDataSet(DataSet dataset, Document document)
{
    foreach (DataTable table in dataset.Tables)
    {
        var fieldName = $"Table-{table.TableName}-Count";

        foreach (DataRelation relation in table.ParentRelations)
        {
            var parentTable = relation.ParentTable;
            var column = parentTable.Columns.Add(fieldName, typeof(int));
            foreach (DataRow parentRow in parentTable.Rows)
                parentRow[column] = parentRow.GetChildRows(relation).Length;
        }
    }

    var topmostRegions = document.MailMerge.GetRegionsHierarchy().Regions.Select(p => p.Name).ToList();

    return (
        topmostRegions.Select(p => $"Table-{p}-Count").ToArray(),
        topmostRegions.Select(p => dataset.Tables[p]?.Rows.Count ?? 0).Cast<object>().ToArray()
    );
}
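
To make the counting step concrete, here is a small sketch that uses only System.Data and has no Aspose.Words dependency. The table names `Orders` and `Items`, the relation `OrderItems`, and the helpers `AddChildCounts` and `BuildSample` are made up for illustration; only the `Table-<REGION_NAME>-Count` column naming is taken from the code above.

```csharp
using System;
using System.Data;

public static class ChildCountDemo
{
    // Adds a "Table-<child>-Count" column to every parent table and fills it
    // with the number of child rows that belong to each parent row.
    public static void AddChildCounts(DataSet dataSet)
    {
        foreach (DataTable table in dataSet.Tables)
        {
            var fieldName = $"Table-{table.TableName}-Count";
            foreach (DataRelation relation in table.ParentRelations)
            {
                var parent = relation.ParentTable;
                var column = parent.Columns.Add(fieldName, typeof(int));
                foreach (DataRow row in parent.Rows)
                    row[column] = row.GetChildRows(relation).Length;
            }
        }
    }

    // Hypothetical sample data: two orders, only the first one has items.
    public static DataSet BuildSample()
    {
        var ds = new DataSet();
        var orders = ds.Tables.Add("Orders");
        orders.Columns.Add("Id", typeof(int));
        var items = ds.Tables.Add("Items");
        items.Columns.Add("OrderId", typeof(int));
        items.Columns.Add("Name", typeof(string));
        ds.Relations.Add("OrderItems", orders.Columns["Id"], items.Columns["OrderId"]);

        orders.Rows.Add(1);
        orders.Rows.Add(2);
        items.Rows.Add(1, "Pen");
        items.Rows.Add(1, "Ink");
        return ds;
    }
}
```

After `AddChildCounts(BuildSample())`, the `Orders` table has a `Table-Items-Count` column holding 2 for the first order and 0 for the second, so an IF field testing `<> 0` would keep the nested table for the first order only.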

Code only:

NOTE: This solution requires WORDSNET-22717, which is to be integrated in the upcoming 21.10 version of Aspose.Words.

This solution supports tables with multiple row mailmerge regions in the same table.

public void Test()
{
    var document = new Document("template.docx");

    MarkupTables(document.MailMerge.GetRegionsHierarchy().Regions);

    var dataSet = GetSampleDataSet();

    document.MailMerge.CleanupOptions =
        MailMergeCleanupOptions.RemoveEmptyTableRows |
        MailMergeCleanupOptions.RemoveContainingFields |
        MailMergeCleanupOptions.RemoveStaticFields |
        MailMergeCleanupOptions.RemoveUnusedFields |
        MailMergeCleanupOptions.RemoveEmptyParagraphs |
        MailMergeCleanupOptions.RemoveUnusedRegions;

    document.MailMerge.ExecuteWithRegions(dataSet);

    CleanupTables(document);

    document.Save("out.docx");
}

private static DataSet GetSampleDataSet()
{
    var dataSet = new DataSet();
    dataSet.ReadXml("ds.xml");
    return dataSet;
}

private static void MarkupTables(IEnumerable<MailMergeRegionInfo> regions)
{
    foreach (var region in regions)
    {
        MarkupTables(region);
        MarkupTables(region.Regions);
    }
}

private static void MarkupTables(MailMergeRegionInfo region)
{
    var startCell = region.StartField.Start.ParentParagraph.ParentNode as Cell;
    if (startCell == null)
        return;

    var endCell = region.EndField.End.ParentParagraph.ParentNode as Cell;
    if (endCell == null)
        return;

    if (startCell.ParentRow != endCell.ParentRow)
        return;

    var regionRow = startCell.ParentRow;
    var headerRow = regionRow.ParentTable.FirstRow;
    if (headerRow == regionRow)
        return;

    InsertSmartTag(regionRow, TableRegionRowSmartTagProperty).Value = region.Name;
    InsertSmartTag(headerRow, TableHeaderRowSmartTagProperty).Value = region.Name;
}

private static CustomXmlProperty InsertSmartTag(Row row, string key)
{
    var tag = new SmartTag(row.Document);
    row.FirstCell.FirstParagraph.InsertAfter(tag, null);
    var property = new CustomXmlProperty(key, string.Empty, string.Empty);
    tag.Properties.Add(property);
    return property;
}

private static void CleanupTables(Document document)
{
    foreach (Table table in document.GetChildNodes(NodeType.Table, true))
    {
        var regionNames = GetRowSmartTags(table.FirstRow, TableHeaderRowSmartTagProperty).ToList();

        if (!regionNames.Any())
            continue;

        if (regionNames.Any(p => IsRegionRowSmartTagPresence(table, p)))
            continue;

        table.Remove();
    }

    document.RemoveSmartTags();
}

private static bool IsRegionRowSmartTagPresence(Table table, string regionName)
{
    foreach (Row row in table.Rows.Skip(1))
    {
        if (GetRowSmartTags(row, TableRegionRowSmartTagProperty).Any(p => p == regionName))
            return true;
    }

    return false;
}

private static IEnumerable<string> GetRowSmartTags(Row row, string key)
{
    return row.FirstCell.FirstParagraph?.GetChildNodes(NodeType.SmartTag, false)
        .Cast<SmartTag>()
        .Select(p => p.Properties[key])
        .Where(p => p != null)
        .Select(p => p.Value) ?? Enumerable.Empty<string>();
}

private const string TableRegionRowSmartTagProperty = "row-id";
private const string TableHeaderRowSmartTagProperty = "header-id";
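
The removal rule in `CleanupTables` can be stated independently of the document model: a table is dropped when its first row carries at least one header tag and none of those region names still appears on a later row. A minimal sketch of that rule over plain collections follows; `CleanupRule` and `ShouldRemove` are hypothetical names and not Aspose.Words types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CleanupRule
{
    // headerTags: region names tagged on the first row ("header-id").
    // rowTags:    region names tagged on each remaining row ("row-id").
    public static bool ShouldRemove(
        IReadOnlyCollection<string> headerTags,
        IEnumerable<IEnumerable<string>> rowTags)
    {
        // Untagged tables are not mail-merge tables; leave them alone.
        if (headerTags.Count == 0)
            return false;

        // Keep the table if any tagged region still has a surviving row.
        bool anyRegionSurvived = headerTags.Any(region =>
            rowTags.Any(tags => tags.Contains(region)));

        return !anyRegionSurvived;
    }
}
```

With a header tagged `MyObject1` and no remaining tagged rows, `ShouldRemove` returns true. As soon as one later row still carries `MyObject1`, it returns false, which is why a table with several regions survives if at least one of them produced data.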

M.Heinz · October 7, 2021, 11:15am

Hi awais.hafeez,

although I’ve managed to do so, I don’t think many of our customers would be able to place a table inside the content-area of an IF-block in Word, so unfortunately option #1 won’t be my go to solution.

The second idea, “code only”, sounds much more promising. I’ll happily evaluate this option once nuget offers a download for a new version of Aspose-Words. I’d assume, that’s gonna be the next version after 21.10.0, right?
Edit: And is there already a fixed release-date for the version in question?

Thanks again for your fast replies and the enumeration of possible options.

Kind regards,
Matthias Heinz

awais.hafeez · October 7, 2021, 2:01pm

@M.Heinz,

Please check, we have now published the new 21.10 version of Aspose.Words for .NET.

M.Heinz · October 7, 2021, 3:21pm

Hi awais.hafeez,

great; I did not expect 21.10.0 to already contain this improvement. I’ll be evaluating the second option in the near future then.

Thanks a lot for the quick response!

Kind regards,
Matthias Heinz

M.Heinz · April 7, 2022, 1:20pm

@awais.hafeez,
Sorry it took me this long, but I finally got around to testing the “2. code only” solution that you provided previously.

I’ve managed to get a proof of concept working in some basic cases (based on your code) and I’ve tried to extend your solution to suit our daily requirements, but unfortunately to no avail just yet.

Let’s say you’re working with multiple datasets and the additional requirement of having to use different custom IFieldMergingCallback handlers for each dataset. Then you’re bound to call document.MailMerge.ExecuteWithRegions multiple times, in order to use the proper IFieldMergingCallback handler for each call to ExecuteWithRegions. But calling ExecuteWithRegions multiple times fails when MailMergeCleanupOptions.RemoveUnusedRegions is set, because this removes all remaining region markers during the first call to ExecuteWithRegions. And if I do not specify MailMergeCleanupOptions.RemoveUnusedRegions, the MailMerge does not appear to perform the MailMergeCleanupOptions.RemoveEmptyTableRows cleanup for the unused regions, which in turn breaks the “2. code only” solution from earlier.

public void Test()
{
    var document = new Document("template.docx");

    MarkupTables(document.MailMerge.GetRegionsHierarchy().Regions);

    document.MailMerge.CleanupOptions =
        MailMergeCleanupOptions.RemoveEmptyTableRows |
        MailMergeCleanupOptions.RemoveContainingFields |
        MailMergeCleanupOptions.RemoveStaticFields |
        MailMergeCleanupOptions.RemoveUnusedFields |
        MailMergeCleanupOptions.RemoveUnusedRegions | // <- dangerous! Removes all unused regions after the first call to ExecuteWithRegions; but there are additional calls to ExecuteWithRegions pending...
        MailMergeCleanupOptions.RemoveEmptyParagraphs;

    // MailMerge dataset #1 using the appropriate IFieldMergingCallback object:
    var dataSet1 = GetSampleDataSet1();
    document.MailMerge.FieldMergingCallback = new FieldMergingCallbackHandler1(new DocumentBuilder(document));
    document.MailMerge.ExecuteWithRegions(dataSet1);

    // Reset the IFieldMergingCallback object to use the default handler for the next dataset:
    document.MailMerge.FieldMergingCallback = null;

    // MailMerge dataset #2 using the default handler:
    var dataSet2 = GetSampleDataSet2();
    document.MailMerge.ExecuteWithRegions(dataSet2);

    // MailMerge the dataset #3 using yet an other handler:
    var dataSet3 = GetSampleDataSet3();
    document.MailMerge.FieldMergingCallback = new FieldMergingCallbackHandler3(new DocumentBuilder(document));
    document.MailMerge.ExecuteWithRegions(dataSet3);

    //document.MailMerge.DeleteFields(); // FYI, normally we would use `document.MailMerge.DeleteFields();` to clean up remaining MergeField markers - since we can't use `MailMergeCleanupOptions.RemoveUnusedRegions` because of the previously mentioned reasons.
    CleanupTables(document);

    document.Save("out.docx");
}

// Example implementation for FieldMergingCallbackHandler1/FieldMergingCallbackHandler3
class FieldMergingCallbackHandler1 : IFieldMergingCallback
{
    private readonly DocumentBuilder documentBuilder;

    public FieldMergingCallbackHandler1 (DocumentBuilder documentBuilder)
    {
        this.documentBuilder = documentBuilder;
    }

    public void FieldMerging(FieldMergingArgs args)
    {
        documentBuilder.MoveToField(args.Field, true);

        // Modify the document here using the documentBuilder; e.g.:
        documentBuilder.Write("Hello World");

        // Remove the merge field itself once its replacement text is written.
        args.Field.Remove();
    }

    public void ImageFieldMerging(ImageFieldMergingArgs args)
    {
    }
}

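// Please note, this handler would normally be totally independent from FieldMergingCallbackHandler1 and this example via inheritance is just for an easier demo.
class FieldMergingCallbackHandler3 : FieldMergingCallbackHandler1
{
    // The base class has no parameterless constructor, so forward the builder.
    public FieldMergingCallbackHandler3(DocumentBuilder documentBuilder) : base(documentBuilder) { }
}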

Is there any way to benefit from both worlds: 1. Being able to Merge different datasets using their own IFieldMergingCallback handlers and 2. being able to use the “2. code only” solution to remove mergeField tables, that don’t get populated?

Kind regards,
Matthias Heinz

alexey.noskov · April 7, 2022, 5:07pm

@M.Heinz You can implement your own method to remove unused regions and run it after executing all of your mail merge with regions operations. For example, the following code removes rows with remaining regions:

private static void RemoveUnusedRegions(Document doc)
{
    foreach (MailMergeRegionInfo info in doc.MailMerge.GetRegionsHierarchy().Regions)
    {
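        Row firstRow = (Row)info.StartField.Start.GetAncestor(NodeType.Row);
        Row lastRow = (Row)info.EndField.Start.GetAncestor(NodeType.Row);

        // Skip regions that do not start and end in rows of the same table.
        if (firstRow == null || lastRow == null || firstRow.ParentTable != lastRow.ParentTable)
            continue;

        if (firstRow != lastRow)
        {
            // Remove the rows between the region start and end, then the end row.
            while (firstRow.NextSibling != lastRow)
                firstRow.NextSibling.Remove();
            lastRow.Remove();
        }
        firstRow.Remove();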
    }
}

You can run this method before removing remaining mergefields:

RemoveUnusedRegions(document);
document.MailMerge.DeleteFields();
CleanupTables(document);

In this case you do not need to use MailMergeCleanupOptions.RemoveUnusedRegions.

M.Heinz · April 8, 2022, 8:27am

@awais.hafeez, @alexey.noskov,

thank you both so much! Using both of your suggestions (“2. code only” combined with the function “RemoveUnusedRegions”), my initial tests have been very successful and I’m very pleased with the results!

Thanks again very much for your time and effort; really appreciate it!

Kind regards,
Matthias Heinz