MailMerge - IMailMergeDataSource not getting all child data sources

Using Aspose Words for Java 14.3
I have been using the IMailMergeDataSource interface to merge XML data into a Word template to ultimately create a PDF. This is using the mail merge with regions. I started with the example from https://docs.aspose.com/words/java/mail-merge-with-xml-data-source/. For the most part, my implementation mirrors the example with the exception of just a couple of items. For most simple XML data, I have no problem merging the data into the template document, however, I have run into some problems when attempting to get child data elements (child data sources).
From the extensive testing I’ve done with multiple XML documents, I’ve found that the mail merge engine does not appear to treat XML child elements as child records if they contain only a text node. Let me try to describe this. In the XML fragment below, taken from the middle of the XML document, para_line_data contains a repeating group of text data in para_line elements. The template document is configured with a TableStart:para_line_data / TableEnd:para_line_data region around a para_line merge field. When generating the PDF, only the first para_line is printed.

<para_line_data>
  <para_line>Line data</para_line>
  <para_line>additional data</para_line>
</para_line_data>

The only way I could get a child element to be considered a child element to output multiple lines is to wrap the real content in another unnecessary element. Adding the para_line_row to the XML and adding a TableStart/End for para_line_row around para_line in the template document, the data actually printed as expected.

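<para_line_data>
  <para_line_row>
    <para_line>Line data</para_line>
  </para_line_row>
  <para_line_row>
    <para_line>additional data</para_line>
  </para_line_row>
</para_line_data>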

While I can alter XML in some cases, there are situations where I will not be allowed to. As a result, I am looking for ways to define the IMailMergeDataSource implementation so that it supports both XML configurations (with or without the unnecessary para_line_row element). I obtained the org.w3c.dom.Node and interrogated it when the IMailMergeDataSource implementation was created. I found the proper count of child elements at that point, but subsequent records were never processed. It seems as if the IMailMergeDataSource interface should have additional methods to use as callbacks to help determine whether a Node or a NodeList is being returned. Or something needs to be defined differently in the moveNext method to get the next element. Can anyone help with this situation?
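The DOM interrogation described above can be sketched roughly as follows. This is a minimal illustration using only the JDK's org.w3c.dom API; the class name ChildCount and the helpers countChildElements and parse are hypothetical names invented for this sketch and are not part of Aspose.Words.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ChildCount {
    /** Counts the direct child elements of parent whose tag name equals name. */
    static int countChildElements(Element parent, String name) {
        int count = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Skip text, comment, and whitespace nodes; only elements count as records.
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                count++;
            }
        }
        return count;
    }

    /** Parses an XML string and returns its document element (hypothetical helper). */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<para_line_data>"
                + "<para_line>Line data</para_line>"
                + "<para_line>additional data</para_line>"
                + "</para_line_data>";
        // The DOM sees both para_line children, even though the merge prints only one.
        System.out.println(countChildElements(parse(xml), "para_line"));
    }
}
```

The count confirms that the data is present in the DOM; it says nothing about how many records the mail merge engine will request, which is driven by the data source's moveNext results.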
I have included examples of both implementations as well as output from both to help describe the situation. The AsposeMergeTest class only needs to have configurations set for location, license file, and the input and output files. See both PDFs for the described behavior.
I have searched all over the Aspose site for additional details on how the mail merge engine works (especially with XML), but have found very very little. It’s very hard to decipher when the callbacks to the IMailMergeDataSource occur and for what condition in the XML. There are no API documents since the classes are encrypted. It’s all a black box, which doesn’t have enough documentation.

Hi James,

Thanks for your inquiry. We are working on your query and will get back to you soon.

Best regards,

Hi James,

Thanks for being patient. I tested the scenario and managed to reproduce the same problem on my end. I have logged your requirement in our issue tracking system as WORDSNET-10840. Our development team will look further into the details of this problem, and we will keep you updated on the status of this issue. We apologize for any inconvenience.

Best regards,

Hi James,

Regarding WORDSNET-10840, our development team has completed its analysis of your issue and has concluded that they won’t be able to implement a fix. Your issue (WORDSNET-10840) is now closed with a ‘Won’t Fix’ resolution. Here are possible workarounds:

The XML file contains the following part:

... Instructions for setup: . Put the memory inside the CPU . Connect the modem to the CPU . Connect the CPU to a power source . Turn on the CPU The items detailed above must be done in the proper order to make sure the system runs properly. If you have problems with the hardware, definitely contact us so that we can help. If you have questions about the instructions, local resources are available to help you. ...

The invoice element contains a single para_line_data element. The para_line_data element contains multiple para_line elements.

The sample document contains the following merge schema:

TableStart:invoice

TableStart:para_line_data
para_line
TableEnd:para_line_data

TableEnd:invoice

You expect an output document with the following content:


Instructions for setup:
. Put the memory inside the CPU
. Connect the modem to the CPU
. Connect the CPU to a power source
. Turn on the CPU

The items detailed above must be done in the proper order to make sure the system runs properly. If you have problems
with the hardware, definitely contact us so that we can help.

If you have questions about the instructions, local resources are available to help you.

But Aspose.Words does not support repetition of a merge field. For such cases you should use a merge region. So, the first solution is to modify the XML file:

... Instructions for setup: . Put the memory inside the CPU .... If you have questions about the instructions, local resources are available to help you. ...

If you cannot modify the XML file, you should implement specific behaviour that separates the single para_line_data element into multiple merge records. For example, with the following code snippet (.NET):
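// Requires "using System.Xml;" plus the Aspose.Words namespace that declares IMailMergeDataSource.
class DataSource : IMailMergeDataSource
{
    private readonly string tableName;
    private readonly XmlNodeList elements;
    // True when every child of para_line_data is treated as its own merge record.
    private readonly bool elementAsRow;
    private int index = -1;
    public DataSource(XmlElement element, string tableName)
    {
        elementAsRow = tableName == "para_line_data";
        elements = element.GetElementsByTagName(tableName);
        if (elementAsRow && elements.Count > 0)
        {
            // Replace the single para_line_data element with its children (the para_line elements).
            elements = elements[0].ChildNodes;
        }
        this.tableName = tableName;
    }
    public string TableName
    {
        get { return tableName; }
    }
    public bool MoveNext()
    {
        // Advance to the next record; false once every record has been consumed.
        index++;
        return index < elements.Count;
    }
    public bool GetValue(string fieldName, out object fieldValue)
    {
        if (elementAsRow)
            // The record is the para_line element itself, so its text is the field value.
            fieldValue = elements[index].InnerText;
        else
            fieldValue = ((XmlElement)elements[index]).GetElementsByTagName(fieldName)[0].InnerText;
        return true;
    }
    public IMailMergeDataSource GetChildDataSource(string tableName)
    {
        // Nested regions are resolved relative to the current record.
        return new DataSource((XmlElement)elements[index], tableName);
    }
}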
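Since the original question concerns Aspose.Words for Java, the same idea can be sketched in Java with the JDK's DOM API. This is an illustration under stated assumptions, not Aspose code: RowSource is a hypothetical stand-in interface invented here (it is not IMailMergeDataSource and does not match its signatures), the root element name in the sample XML is assumed, and only direct children are searched, which is a simplification compared with GetElementsByTagName in the .NET snippet.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ParaLineDemo {
    /** Hypothetical stand-in for a mail-merge data source; this is NOT Aspose's interface. */
    interface RowSource {
        boolean moveNext();
        String value(String fieldName);
        RowSource child(String tableName);
    }

    static final class XmlRowSource implements RowSource {
        private final List<Element> rows;
        private final boolean elementAsRow;
        private int index = -1;

        XmlRowSource(Element parent, String tableName) {
            // Same special case as the .NET snippet: para_line_data is unpacked into records.
            elementAsRow = tableName.equals("para_line_data");
            List<Element> found = childElements(parent, tableName);
            rows = (elementAsRow && !found.isEmpty())
                    ? childElements(found.get(0), null) // every child element becomes a record
                    : found;
        }

        public boolean moveNext() {
            return ++index < rows.size();
        }

        public String value(String fieldName) {
            if (elementAsRow) {
                return rows.get(index).getTextContent();
            }
            List<Element> match = childElements(rows.get(index), fieldName);
            return match.isEmpty() ? null : match.get(0).getTextContent();
        }

        public RowSource child(String tableName) {
            return new XmlRowSource(rows.get(index), tableName);
        }
    }

    /** Direct child elements of parent; a null name matches every element. */
    static List<Element> childElements(Element parent, String name) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && (name == null || name.equals(n.getNodeName()))) {
                out.add((Element) n);
            }
        }
        return out;
    }

    /** Walks invoice -> para_line_data and returns one string per merged record. */
    static List<String> mergeLines(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<String> lines = new ArrayList<>();
            RowSource invoices = new XmlRowSource(root, "invoice");
            while (invoices.moveNext()) {
                RowSource paraLines = invoices.child("para_line_data");
                while (paraLines.moveNext()) {
                    lines.add(paraLines.value("para_line"));
                }
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><invoice><para_line_data>"
                + "<para_line>Instructions for setup:</para_line>"
                + "<para_line>. Put the memory inside the CPU</para_line>"
                + "</para_line_data></invoice></root>";
        mergeLines(xml).forEach(System.out::println);
    }
}
```

To use this with the real library, the three methods would need to be mapped onto the IMailMergeDataSource members of the Aspose.Words for Java version in use; those signatures should be taken from that version's API reference rather than from this sketch.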

James:
It seems as if the IMailMergeDataSource interface should have additional methods to use as callbacks to help determine whether a Node or a NodeList is being returned. Or something needs to be defined differently in the moveNext method to get the next element.

Unfortunately, Aspose.Words cannot provide such functionality, because it depends on the user’s data structure and IMailMergeDataSource implementation. So, it is entirely the user’s responsibility.

Best regards,

I completely understand the fact that functionality depends on user data structure. I have exactly the same problem. I have created an engine that processes multiple XML document types, each defined with unique XSDs and template documents that are completely unknown to me at development time. XSDs and templates are only presented at runtime. As a result, I cannot use the option you have given, because I don’t know anything about the element data or the XSD during development. I have a runtime process to store the XSD and template before I handle any XML associated with the XSD. This allows me to validate XML based on the XSD. In other words, I don’t know that I’m getting multiple para_line elements within a para_line_data element, and I don’t know that the template has that definition either. My template creator and XSD creator coordinate their structure without me. In fact, they have documents that are defined by outside entities, which would not allow a change.
This is the reason I asked for more metadata methods that would allow me to interrogate the data. As I mentioned, while stepping through the process, I was able to interrogate the node from within the context of the callback objects (i.e. IMailMergeDataSourceRoot, IMailMergeDataSource, and IFieldMergingCallback). I found the proper count of child elements (via DOM), which tells me the data is available and additional properties could be set within those callback objects. However, some condition in the Aspose libraries is not considering this node to have more than one child element. If there were a way to set a flag within the callback objects (possibly inside IMailMergeDataSource.getChildDataSource) to process the node as a table (or something having multiple nodes) when there are additional child nodes, that would be helpful: some way to prevent Aspose from treating it as if there were only one element. On the surface, the addition of some helper methods, properties, or metadata methods seems to be a very minor enhancement to the Aspose library.
By having the ability to force the library to process the node differently because of additional elements, I know I can solve my problem. Are you sure there is no way to add metadata functionality when we know the data is there?
XML and XSD can be defined in an infinite number of ways. Many of those are out of my control. As such, I cannot have special conditions in code because special conditions would require me to deploy new code for each unique XSD which defeats the purpose of my engine processing the XML, XSD, and template generically.
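One conceivable way to avoid per-XSD special cases is to decide from the DOM shape alone whether a region's children should be unpacked into records. The sketch below is only a heuristic offered for illustration, not something Aspose provides or recommends in this thread: isFlatRepeatingGroup, hasElementChild, parse, and the class name RegionShape are all hypothetical names, and the rule (two or more same-named child elements that hold no nested elements) is an assumption that may not suit every schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RegionShape {
    /**
     * Heuristic: true when region has two or more child elements that all share
     * one tag name and none of them contains a nested element.
     */
    static boolean isFlatRepeatingGroup(Element region) {
        String sharedName = null;
        int count = 0;
        for (Node n = region.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // ignore text, comments, and whitespace between elements
            }
            if (hasElementChild(n)) {
                return false; // a child with nested elements is an ordinary record
            }
            if (sharedName == null) {
                sharedName = n.getNodeName();
            } else if (!sharedName.equals(n.getNodeName())) {
                return false; // mixed names look like fields of a single record
            }
            count++;
        }
        return count >= 2;
    }

    static boolean hasElementChild(Node node) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    /** Parses an XML string and returns its document element (hypothetical helper). */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element flat = parse("<para_line_data><para_line>Line data</para_line>"
                + "<para_line>additional data</para_line></para_line_data>");
        Element wrapped = parse("<para_line_data>"
                + "<para_line_row><para_line>Line data</para_line></para_line_row>"
                + "<para_line_row><para_line>additional data</para_line></para_line_row>"
                + "</para_line_data>");
        System.out.println(isFlatRepeatingGroup(flat));    // true: text-only repeats
        System.out.println(isFlatRepeatingGroup(wrapped)); // false: rows wrap the fields
    }
}
```

A data source could call such a check where the .NET snippet compares the table name against "para_line_data". A region with a single text-only child is reported as not repeating, which should be harmless because one record is produced either way.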
You indicate that Aspose does not support repetition for a merge field; however, I originally thought I was defining a merge region by placing TableStart:para_line_data in the template, with the corresponding XML defined. I guess your documentation does not clearly define merge regions for me. I was under the impression that the TableStart would indicate the repeating group and process the XML accordingly, regardless of what was contained in the child elements. In my original example, I had the text node inside the para_line node. It seems unnecessarily complex to force an XML document to include additional elements that make up a special Aspose structure for merge regions when the DOM Node is present for interrogation and a property could be set. As I mentioned, asking the XSD creator to change their XSD is not always an option.
If you look at the following link (http://www.w3schools.com/dom/dom_nodes.asp), the books.xml is just one example of how XML may be defined. In this case, based on your definition, I cannot have authors defined to be in a repeating group. I would only get the first instance of an author within a book. Personally, I would have defined an element that contained many elements, however, even that would not allow me to show a repeating group within Aspose, I would have to create a element within the element contained within an element. This behavior severely limits my ability to use XML in a generic way which, in my mind, is the purpose of using XSDs.
I really believe your addition of XML as a source is a great option. It helped me solve many other problems I was facing with using an attribute value list. I just hope you would continue expanding the usage of merging with XML as a source so that there are fewer limitations.

Hi James,

Thanks for the additional information. We have forwarded these details to our development team. Soon you will be updated with the required information.

Best regards,

Hi James,

Since the classic mail merge functionality cannot meet your requirements and we do not plan to support XML data sources natively, we have decided to include this functionality in our upcoming new reporting engine, which should be able to satisfy your requirements. So, we recommend that you wait for the new reporting engine, which will be publicly released some time in the future. We apologize for any inconvenience.

Best regards,