Merge field names being split on period "."

We are using the IMailMergeDataSource interface in Java. When Aspose calls the getValue() method on this interface and the merge field name in the Word document contains a period character “.”, for example “Merge.This”, Aspose splits the one merge field into two and calls getValue() twice, once for each segment: one call for field name “Merge” and another for field name “This”. How do we prevent this? We need periods in some of our merge field names, and we need the complete merge field name passed to getValue(). We have a Word plugin that generates the merge fields via a UI in Word, and some of them have periods in the name. Is there a setting in Aspose to prevent it from splitting the merge field name on periods?

Hi Raul,

Thanks for your inquiry. Please share the following details so that we can investigate:

  • Your input Word document.
  • A simple, standalone, runnable Java application that demonstrates the Aspose.Words code you used to produce this issue.

Unfortunately, it is difficult to say what the problem is without the document(s) and a simplified application; we need both to reproduce the problem. As soon as you send them, we will start investigating the issue.

Attached are a Word document and a simple Java program that reproduce the problem.

The expected output is:

Running testMailMerge

Merge.This

The actual output is:

Running testMailMerge

This

The merge field Merge.This was split on the period and only the second part was processed (my original post was incorrect on this point: only the second part was processed, not both).



BTW your attachment function on this forum should allow file extensions of “.java” so we can attach java source files without renaming them first.

After I uploaded the java code, I tried to click on the attachment but got a 404 error, so here is the java code just in case:

/*
 * Copyright © 2004-2010, Property of Columbia Ultimate, Inc., Vancouver, Washington. All rights reserved.
 * All software source code contained herein, and any computer executable code resulting from the use or compilation
 * thereof, are the exclusive and proprietary materials constituting the intellectual property of Columbia Ultimate,
 * Inc. Dissemination, disclosure, reproduction, copying, or any other form of communication by or to any parties
 * outside of Columbia Ultimate, Inc., is strictly forbidden without the express prior written permission of an
 * authorized Officer of Columbia Ultimate, Inc.
 * Alteration, modification, or the creation of derivative works of any type from the software source code contained
 * herein is strictly prohibited. Decompilation, disassembly, or otherwise reverse engineering any part of the Software,
 * or engagement in any other activities to obtain underlying information in or about the Software that is not visible
 * to the user in connection with normal use of the Software, is forbidden.
 */
package test.cu.ajent.business.letters;

import com.aspose.words.Document;
import com.aspose.words.IMailMergeDataSource;
import com.aspose.words.IMailMergeDataSourceRoot;
import com.aspose.words.MailMerge;

/**
 * @author RaulC
 * @since Nov 5, 2014
 */
public class AsposeWordsTest implements IMailMergeDataSourceRoot, IMailMergeDataSource {

    private boolean rootHasNext = true;

    private String tableName;

    public AsposeWordsTest() {}

    public AsposeWordsTest(String tableName) {
        this.tableName = tableName;
    }

    public static void main(String[] args) {
        try {
            AsposeWordsTest awt = new AsposeWordsTest();
            awt.testMailMerge();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void testMailMerge() throws Exception {
        System.out.println("Running testMailMerge");
        // Backslashes must be escaped in a Java string literal.
        Document document = new Document("c:\\tmp\\test period in merge name.docx");
        MailMerge mailMerge = document.getMailMerge();
        mailMerge.executeWithRegions((IMailMergeDataSourceRoot) this);
        mailMerge.execute(this);
    }

    /**
     * @see com.aspose.words.IMailMergeDataSourceRoot#getDataSource(java.lang.String)
     */
    public IMailMergeDataSource getDataSource(String fieldName) throws Exception {
        return getChildDataSource(fieldName);
    }

    /**
     * @see com.aspose.words.IMailMergeDataSource#getChildDataSource(java.lang.String)
     */
    public IMailMergeDataSource getChildDataSource(String fieldName) throws Exception {
        return new AsposeWordsTest(fieldName);
    }

    /**
     * @see com.aspose.words.IMailMergeDataSource#getTableName()
     */
    public String getTableName() throws Exception {
        return tableName;
    }

    /**
     * @see com.aspose.words.IMailMergeDataSource#getValue(java.lang.String, java.lang.Object[])
     */
    public boolean getValue(String fieldName, Object[] fieldValue) throws Exception {
        // Print the field name exactly as Aspose.Words passes it in.
        System.out.println(fieldName);
        fieldValue[0] = "test";
        return true;
    }

    /**
     * @see com.aspose.words.IMailMergeDataSource#moveNext()
     */
    public boolean moveNext() throws Exception {
        // this is to simulate only one record in the data set
        boolean result = rootHasNext;
        rootHasNext = false;
        return result;
    }
}

OK, copy paste from Eclipse didn’t work so well (why?), so attached is a Zip file with the document and java source, hopefully that will work (this should be easier).

Running your sample code above without changes works correctly and produces the following output to the console:



table name Merge

Merge.This

table name Merge

Merge.This



However, simply change your getChildDataSource() method implementation to return a child data source instead of null as follows and you can reproduce the issue with your code also:





public IMailMergeDataSource getChildDataSource(String tableName) throws Exception {
    System.out.println("table name " + tableName);

    CustomerList relatedCustomers = new CustomerList();
    relatedCustomers.add(new Customer("John Doe"));
    relatedCustomers.add(new Customer("Jane Doe"));
    return new CustomerMailMergeDataSource(relatedCustomers);
}





Produces this output to console (note the incorrect merge field name):



table name Merge

This

table name Merge

This





Even the following simpler change to the same method will reproduce the issue (though granted you would not want to do this in a production system, would probably cause a loop, but it does also reproduce the problem):





public IMailMergeDataSource getChildDataSource(String tableName) throws Exception {
    System.out.println("table name " + tableName);
    return this;
}





Produces the output to console:



table name Merge

This



So the issue seems to be related to having BOTH a merge field with periods in the merge field name AND providing a non-null child data source via the IMailMergeDataSource interface. Attached is the code I ran to produce the above output.
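The console output above is consistent with a simple resolution rule, which can be written down as a toy model. This is only a sketch inferred from the observed output, not Aspose.Words code: `FieldSource`, `RecordingSource`, and `DottedFieldModel` are hypothetical names, and `FieldSource` is a local stand-in for the two IMailMergeDataSource methods involved so that the snippet compiles without the Aspose jar.

```java
import java.util.ArrayList;
import java.util.List;

/** Local stand-in for the two IMailMergeDataSource methods involved; NOT the Aspose type. */
interface FieldSource {
    FieldSource getChildDataSource(String tableName);
    boolean getValue(String fieldName, Object[] fieldValue);
}

/** Records every field name passed to getValue, like the println in the sample code. */
class RecordingSource implements FieldSource {
    final List<String> requested = new ArrayList<>();
    private final boolean returnChild;

    RecordingSource(boolean returnChild) {
        this.returnChild = returnChild;
    }

    public FieldSource getChildDataSource(String tableName) {
        // "return this" mirrors the simpler reproduction shown above.
        return returnChild ? this : null;
    }

    public boolean getValue(String fieldName, Object[] fieldValue) {
        requested.add(fieldName);
        fieldValue[0] = "test";
        return true;
    }
}

class DottedFieldModel {
    /** Toy rule inferred from the observed output; an assumption, not the real engine. */
    static Object resolve(FieldSource source, String mergeFieldName) {
        Object[] value = new Object[1];
        int dot = mergeFieldName.indexOf('.');
        if (dot > 0) {
            FieldSource child = source.getChildDataSource(mergeFieldName.substring(0, dot));
            if (child != null) {
                // Prefix treated as a related object: only the remainder reaches getValue.
                child.getValue(mergeFieldName.substring(dot + 1), value);
                return value[0];
            }
        }
        // No period, or no child data source: the full name reaches getValue.
        source.getValue(mergeFieldName, value);
        return value[0];
    }

    public static void main(String[] args) {
        RecordingSource nullChild = new RecordingSource(false);
        resolve(nullChild, "Merge.This");
        System.out.println(nullChild.requested); // [Merge.This]

        RecordingSource withChild = new RecordingSource(true);
        resolve(withChild, "Merge.This");
        System.out.println(withChild.requested); // [This]
    }
}
```

In this model a field name without a period never triggers getChildDataSource at all, which also matches the MergeThis document.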

Also why is the getChildDataSource() method being called at all? It should not be, there is no table region in the source document at all. In fact if you remove the period from the merge field name, then the getChildDataSource() method is NOT called. See attached example document in which I removed the period from the merge field name, which produces the following output (using your original sample code):





MergeThis

MergeThis





Note the absence of the “table name Merge” lines in the output, which shows that getChildDataSource() was not called in this case. Why was it called before, when the only difference is that the merge field name has a period in it? My understanding from reading the API documentation is that getChildDataSource() is called only when a “TableStart:” merge field is encountered. Is that not correct? There is no “TableStart:” merge field in the sample document. Why is a merge field with a period in the name being processed as a table region?

Hi Raul,

Thanks for your inquiry.

RaulC:

Also why is the getChildDataSource() method being called at all? It should not be, there is no table region in the source document at all. In fact if you remove the period from the merge field name, then the getChildDataSource() method is NOT called. See attached example document in which I removed the period from the merge field name, which produces the following output (using your original sample code):

MergeThis MergeThis

Note the absence of the “table name Merge” lines in the output, which shows that getChildDataSource() was not called in this case. Why was it called before, when the only difference is that the merge field name has a period in it? My understanding from reading the API documentation is that getChildDataSource() is called only when a “TableStart:” merge field is encountered. Is that not correct? There is no “TableStart:” merge field in the sample document. Why is a merge field with a period in the name being processed as a table region?

In this case, your implementation of getChildDataSource should return null.

RaulC:

However, simply change your getChildDataSource() method implementation to return a child data source instead of null as follows and you can reproduce the issue with your code also:

Please note that the Aspose.Words mail merge engine invokes the getChildDataSource method when it encounters the beginning of a nested mail merge region. When the engine is populating a mail merge region with data and encounters the beginning of a nested region in the form of MERGEFIELD TableStart:TableName, it invokes getChildDataSource on the current data source object. Your implementation needs to return a new data source object that provides access to the child records of the current parent record. Aspose.Words will use the returned data source to populate the nested mail merge region.

Below are the rules that an implementation of getChildDataSource must follow.

  • If the table represented by this data source object has a related child (detail) table with the specified name, your implementation needs to return a new IMailMergeDataSource object that provides access to the child records of the current record. An example of this is an Orders / OrderDetails relationship. Assume that the current IMailMergeDataSource object represents the Orders table and is positioned on an order record. Aspose.Words then encounters “MERGEFIELD TableStart:OrderDetails” in the document and invokes getChildDataSource. You need to create and return an IMailMergeDataSource object that allows Aspose.Words to access the OrderDetails records for the current order.
  • If this data source object does not have a relation to the table with the specified name, you need to return an IMailMergeDataSource object that provides access to all records of the specified table.
  • If a table with the specified name does not exist, your implementation should return null.
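The three rules above can be sketched with in-memory tables. This is a minimal illustration under stated assumptions, not the documented Aspose sample: `RegionSource` is a local stand-in shaped like IMailMergeDataSource as it is used in this thread (so the snippet runs without the Aspose jar), and `RowSource`, `OrdersSource`, `OrdersDemo`, the `Id`/`OrderId` keys, and the `Customers` table are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Local stand-in shaped like IMailMergeDataSource as used in this thread; NOT the Aspose type. */
interface RegionSource {
    String getTableName();
    boolean moveNext();
    boolean getValue(String fieldName, Object[] fieldValue);
    RegionSource getChildDataSource(String tableName);
}

/** Leaf data source over in-memory rows (each row maps a field name to a value). */
class RowSource implements RegionSource {
    private final String tableName;
    protected final List<Map<String, Object>> rows;
    protected int index = -1; // positioned before the first record

    RowSource(String tableName, List<Map<String, Object>> rows) {
        this.tableName = tableName;
        this.rows = rows;
    }

    public String getTableName() { return tableName; }

    public boolean moveNext() { return ++index < rows.size(); }

    public boolean getValue(String fieldName, Object[] fieldValue) {
        Map<String, Object> row = rows.get(index);
        if (!row.containsKey(fieldName)) return false;
        fieldValue[0] = row.get(fieldName);
        return true;
    }

    public RegionSource getChildDataSource(String childName) { return null; } // no child tables
}

/** Orders table; applies the three rules in getChildDataSource. */
class OrdersSource extends RowSource {
    private final Map<String, List<Map<String, Object>>> tables;

    OrdersSource(Map<String, List<Map<String, Object>>> tables) {
        super("Orders", tables.get("Orders"));
        this.tables = tables;
    }

    @Override
    public RegionSource getChildDataSource(String childName) {
        List<Map<String, Object>> all = tables.get(childName);
        if (all == null) return null; // rule 3: no table with that name
        if (!childName.equals("OrderDetails")) {
            return new RowSource(childName, all); // rule 2: unrelated table, all records
        }
        // Rule 1: related child table, only the records of the current order.
        Object orderId = rows.get(index).get("Id");
        List<Map<String, Object>> mine = new ArrayList<>();
        for (Map<String, Object> row : all) {
            if (orderId.equals(row.get("OrderId"))) mine.add(row);
        }
        return new RowSource(childName, mine);
    }
}

class OrdersDemo {
    static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) r.put((String) keyValues[i], keyValues[i + 1]);
        return r;
    }

    static Map<String, List<Map<String, Object>>> sampleTables() {
        Map<String, List<Map<String, Object>>> t = new HashMap<>();
        t.put("Orders", Arrays.asList(row("Id", 1), row("Id", 2)));
        t.put("OrderDetails", Arrays.asList(row("OrderId", 1, "Price", 10),
                row("OrderId", 1, "Price", 20), row("OrderId", 2, "Price", 30)));
        t.put("Customers", Arrays.asList(row("Name", "John Doe"), row("Name", "Jane Doe")));
        return t;
    }

    static int count(RegionSource source) {
        int n = 0;
        while (source.moveNext()) n++;
        return n;
    }

    public static void main(String[] args) {
        OrdersSource orders = new OrdersSource(sampleTables());
        orders.moveNext(); // current record: order 1
        System.out.println(count(orders.getChildDataSource("OrderDetails"))); // 2
        System.out.println(count(orders.getChildDataSource("Customers")));    // 2
        System.out.println(orders.getChildDataSource("Invoices"));            // null
    }
}
```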

Can I have this issue escalated, please? You are not reading my posts accurately. There is no nested mail merge region in the document (see the same attachment again; there is only a single merge field in the entire document), so Aspose.Words should not call getChildDataSource at all. It is being called (incorrectly) only when the merge field has a period in the name, even though there are no nested mail merge regions in the document. If the implementation of getChildDataSource then happens to return a non-null value, Aspose.Words splits the merge field name on periods and processes each part after the first period separately. If instead the implementation happens to return null, the merge field is not split on periods. Either way there ARE NO NESTED MAIL MERGE REGIONS in the document, so getChildDataSource should not have been called in the first place. Again, this occurs only if there is a period in the merge field name.

Hi Raul,

Thanks for your inquiry. I had already noticed that there is no nested mail merge region in your document. Because your code implements getChildDataSource and returns a non-null value, I shared the details of this method in my previous post. Please check the first answer in that post.

In your case, your implementation of getChildDataSource should return null.

Please note that if your template has a mail merge field such as Item.Name, Aspose.Words cannot determine, at the time it merges the field, whether “Item.Name” is a single field name or a related object “Item” with a property “Name”.

Please do not use a dot (.) in mail merge field names in your template document (e.g. a Word document) if you are using a custom data source, such as a list of objects. If you are implementing the IMailMergeDataSource interface, a dot (.) is not recommended in mail merge field names or in the data source object.

I hope this answers your queries. Please let us know if you have any more.

Are you saying our implementation of getChildDataSource should ALWAYS return null? If so, how do we support table regions? We do not know ahead of time whether the user is going to insert table regions in their document.



In our actual implementation of the getChildDataSource method (not the sample code):



1. If we recognize the table name we will return the child data source (in case the user is using table regions in their document).



2. If we do not recognize the table name then we throw an exception because the user is likely to have entered in an incorrect table name in the document (similarly for standard merge fields, if we don’t recognize the field name we throw an exception).



If you are saying in the second case only we should return null, then how do we error check the document? We do not use dot notation in our field names but our field names can have a period in the name.



So to reiterate my understanding of what you are saying is that when Aspose.Words encounters a field name with a period such as this:



«Item.Name»



It is actually processed the same way as this sequence of field names (this appears to be what is happening):



«TableStart:Item»«Name»«TableEnd:Item»



Is that correct? Is this “feature” documented in the Aspose API documentation? How do we turn this “feature” off?



BTW our product is over a decade old and we have many installed sites/clients with thousands of documents total across them. We just recently adopted Aspose.Words to support Word documents (previously we were using transforms and Apache FOP). Refactoring our merge field names to eliminate periods is not an option due to the large installed base. Everything was working great with our switch to Aspose.Words until this issue came up. We were blindsided by it because it is not documented (as far as we can see).

Hi Raul,

Thanks for your inquiry. Let me explain the two scenarios you described in this thread.

1) Simple mail merge (no TableStart and TableEnd fields in the template document)
With a simple mail merge, suppose your document contains a mail merge field such as Address.City. In this case getChildDataSource is invoked; your implementation should return null, and getValue then receives the field name Address.City.

2) Mail merge with regions
Consider the following mail merge fields with regions. Address.Street and Address.City are mail merge field names.

«TableStart:Order»
«Number»
«Address.Street»
«Address.City»

«TableStart:OrderDetail»
«Price»
«TableEnd:OrderDetail»

«TableEnd:Order»


In this scenario, your implementation of getChildDataSource should be as shown below:


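public IMailMergeDataSource getChildDataSource(String tableName) throws Exception {
    if (tableName.equals("OrderDetail")) {
        // Your code: build the data source object that holds the child
        // records of the current Order, then wrap it.
        return new CustomerMailMergeDataSource(/* data source object */);
    } else if (tableName.equals("Address")) {
        // "Address" is not a table, only the prefix of the field names
        // Address.Street and Address.City.
        return null;
    }
    return null;
}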


When there is a dot (.) in a mail merge field name, the getChildDataSource method is invoked. With the implementation above, getValue receives the mail merge field names as Address.Street and Address.City.

If your template has a mail merge field such as Item.Name, Aspose.Words cannot determine, at the time it merges the field, whether “Item.Name” is a single field name or a related object “Item” with a property “Name”.
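One possible way to keep error checking while returning null for dotted field names is to classify the name passed to getChildDataSource before deciding what to do. The following is only a sketch and relies on the behaviour described in this thread (getChildDataSource receives the text before the period, and returning null causes the full name to be passed to getValue). `MergeNameCatalog` and its methods are hypothetical helpers, not part of Aspose.Words, and the table and field names are taken from the Order example above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; none of these names come from Aspose.Words. */
class MergeNameCatalog {
    enum Kind { TABLE, FIELD_PREFIX, UNKNOWN }

    private final Set<String> tables;
    private final Set<String> fields;

    MergeNameCatalog(Set<String> tables, Set<String> fields) {
        this.tables = tables;
        this.fields = fields;
    }

    /** Decides what a name passed to getChildDataSource refers to. */
    Kind classify(String name) {
        if (tables.contains(name)) return Kind.TABLE; // a table wins if the name is ambiguous
        String prefix = name + ".";
        for (String field : fields) {
            if (field.startsWith(prefix)) return Kind.FIELD_PREFIX;
        }
        return Kind.UNKNOWN;
    }

    /** True: build and return a child data source. False: return null. Unknown names throw. */
    boolean shouldReturnChild(String name) {
        switch (classify(name)) {
            case TABLE:
                return true;
            case FIELD_PREFIX:
                return false;
            default:
                throw new IllegalArgumentException("Unrecognized table name: " + name);
        }
    }

    public static void main(String[] args) {
        MergeNameCatalog catalog = new MergeNameCatalog(
                new HashSet<>(Arrays.asList("Order", "OrderDetail")),
                new HashSet<>(Arrays.asList("Number", "Address.Street", "Address.City", "Price")));
        System.out.println(catalog.classify("OrderDetail")); // TABLE
        System.out.println(catalog.classify("Address"));     // FIELD_PREFIX
        System.out.println(catalog.classify("Adress"));      // UNKNOWN
    }
}
```

A getChildDataSource implementation could call shouldReturnChild(tableName) first and return null when it yields false. A mistyped table name or field prefix still raises an exception, and a mistyped remainder (for example Address.Cty) can still be rejected in getValue, which receives the full name once null has been returned.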


For reference, please see the code example at the following documentation link:
http://www.aspose.com/docs/display/wordsjava/How+to++Mail+Merge+from+XML+using+IMailMergeDataSource

RaulC:

So to reiterate my understanding of what you are saying is that when Aspose.Words encounters a field name with a period such as this:

«Item.Name»

It is actually processed the same way as this sequence of field names (this appears to be what is happening):

«TableStart:Item»«Name»«TableEnd:Item»

Yes, your understanding is correct. Please check the above Order and OrderDetail example.

Hope this answers your queries. Please let us know if you have any more queries.