Mail Merge Engine Generates Different Output When the Root Tag of the XML Has Attributes, Using Java | DataSet.ReadXml

p.binnig · April 1, 2021, 10:13am

Hello,
I have the following behaviour and can’t explain it (maybe I am missing something).
Consider the two data sources
1:xml

<?xml version="1.0" encoding="UTF-8"?>
<root a="a">
  <bar>
    <Name>Text1</Name>
  </bar>
  <foo>
    <bar>
      <Name>Text2</Name>
    </bar>
  </foo>
</root>

2:xml

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <bar>
    <Name>Text1</Name>
  </bar>
  <foo>
    <bar>
      <Name>Text2</Name>
    </bar>
  </foo>
</root>

and two template files
1:docx

«TableStart:bar»
«Name»
«TableEnd:bar»

2:docx

«TableStart:root»
«TableStart:bar»
«Name»
«TableEnd:bar»
«TableEnd:root»

If I run those through a very simple program

DataSet data = new DataSet();
data.readXml(data_path);
Document doc = new Document(new FileInputStream(template_path));
MailMerge mailMerge = doc.getMailMerge();
mailMerge.executeWithRegions(data);
doc.save(output_path);

I get:
for 1:xml and 1:docx, both words ‘Text1’ and ‘Text2’
for 1:xml and 2:docx, only the expected ‘Text1’
for 2:xml and 1:docx, no replacements
and for 2:xml and 2:docx, again both words ‘Text1’ and ‘Text2’

I don’t understand why the example with 2:xml and 1:docx fails; maybe I am missing something.
I am using Aspose.Words 20.11 and I have added a zip file containing all files.
Best
Peter Binnig

temp.zip (10.5 KB)

tahir.manzoor · April 1, 2021, 4:59pm

@p.binnig

We have tested the scenario using the latest version of Aspose.Words for Java 21.3 with the following code example, and we have not found any issue with the output documents. Please check the attached documents: output.zip (34.8 KB)

create_doc(dataDir + "test_data1.xml", dataDir + "temp1.docx", dataDir + "result1.docx");
create_doc(dataDir + "test_data1.xml", dataDir + "temp2.docx", dataDir + "result2.docx");
create_doc(dataDir + "test_data2.xml", dataDir + "temp1.docx", dataDir + "result3.docx");
create_doc(dataDir + "test_data2.xml", dataDir + "temp2.docx", dataDir + "result4.docx");

Please check the data visualizer output of 1.xml: 1.xml.png (6.8 KB)

Please check the data visualizer output of 2.xml: 2.xml.png (5.2 KB)

In 1.xml, bar.Name is under both foo and root. In 2.xml, bar.Name is under foo only. So the output matches the template documents. I hope this answers your query.

p.binnig · April 6, 2021, 6:08am

Yes, I understand that the Table ‘bar’ cannot be found under ‘root’ in 2.xml.
What I don’t understand is why. The only difference is an additional attribute in ‘root’.
So as far as my thinking goes, your result4.docx should be the same as the result2.docx.
Where am I wrong here?

tahir.manzoor · April 6, 2021, 4:50pm

@p.binnig

Please note that the com.aspose.words.net.System.Data.DataSet.readXml method mimics the .NET DataSet.ReadXml method. This method reads the XML and converts it into a DataSet that contains DataTables.

You can check the output of your xml files in the attached images.
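For readers who want to see why the root attribute matters: the .NET DataSet.ReadXml documentation describes schema inference rules under which the document element is treated as the DataSet itself, rather than as a table, when it has no attributes and no child elements that would be inferred as columns. The following is only a rough sketch of that one rule using the JDK's DOM parser. It does not call Aspose.Words, the class and method names (RootInference, looksLikeTable, rootBecomesTable) are made up for illustration, and it ignores details such as repeated elements and namespace declarations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RootInference {

    // Simplified rule: an element with attributes or child elements is inferred as a table.
    public static boolean looksLikeTable(Element e) {
        if (e.getAttributes().getLength() > 0) {
            return true;
        }
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // Simplified rule: the document element becomes a table only if it has attributes
    // or at least one child element that would be inferred as a column.
    public static boolean rootBecomesTable(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        if (root.getAttributes().getLength() > 0) {
            return true;
        }
        NodeList kids = root.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && !looksLikeTable((Element) n)) {
                return true; // this child would be a column of the root table
            }
        }
        return false; // root is treated as the DataSet, not as a table
    }

    public static void main(String[] args) throws Exception {
        String body = "<bar><Name>Text1</Name></bar><foo><bar><Name>Text2</Name></bar></foo>";
        System.out.println(rootBecomesTable("<root a=\"a\">" + body + "</root>")); // true
        System.out.println(rootBecomesTable("<root>" + body + "</root>"));         // false
    }
}
```

Under this simplified rule, 1.xml yields a root table and 2.xml does not, which matches the difference visible in the two visualizer images.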

p.binnig · April 7, 2021, 8:51am

Thank you for your help so far. But I think I still don’t understand this completely.
I extended my small example program:

import com.aspose.words.*;
import com.aspose.words.net.System.Data.*;

import java.io.FileInputStream;

public class test {

    private static void print_values(DataSet dataSet, String label) {
        System.out.println("\n" + label);
        for (DataTable table : dataSet.getTables()) {
            System.out.print(table.getTableName() + ":\nChildTables: ");
            for (DataRelation dr : table.getChildRelations()) {
                System.out.print(dr.getChildTable().getTableName() + " ");
            }
            System.out.print("\nColumns: ");
            for (DataColumn column : table.getColumns()) {
                System.out.print(column.getColumnName() + " ");
            }
            int row_num = 0;
            for (DataRow row : table.getRows()) {
                System.out.print("\nRow " + row_num + ": ");
                for (DataColumn column : table.getColumns()) {
                    System.out.print(row.get(column) + " ");
                }
            }
            System.out.println("\n");
        }
    }

    static void create_doc(String data_path, String path_schema, String template_path, String output_path) {
        try {
            DataSet data = new DataSet();
            data.readXmlSchema(path_schema);
            data.readXml(data_path, XmlReadMode.READ_SCHEMA);
            print_values(data, data_path);
            Document doc = new Document(new FileInputStream(template_path));
            MailMerge mailMerge = doc.getMailMerge();
            mailMerge.executeWithRegions(data);
            doc.save(output_path);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {

        String dataDir = "C:\\Temp\\";
        create_doc(dataDir + "test_data2.xml", dataDir + "schema.xsd", dataDir + "temp2.docx", dataDir + "result4.pdf");

    }
}

and created a small schema for my xml.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
	<xs:element name="root">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="bar"/>
				<xs:element ref="foo"/>
			</xs:sequence>
			<xs:attribute name="a" use="optional" type="xs:string"/>
		</xs:complexType>
	</xs:element>
	<xs:element name="foo">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="bar"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>
	<xs:element name="bar">
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="Name"/>
			</xs:sequence>
		</xs:complexType>
	</xs:element>
	<xs:element name="Name" type="xs:string"/>
</xs:schema>

This gives me the following output:

C:\Temp\test_data2.xml
root:
ChildTables: bar foo 
Columns: a root_Id 

foo:
ChildTables: bar 
Columns: foo_Id root_Id 
Row 0: 0  

bar:
ChildTables: 
Columns: Name root_Id foo_Id 
Row 0: Text1   
Row 0: Text2  0

As you can see, the root table has two child tables, bar and foo, and bar has two rows, ‘Text1’ and ‘Text2’,
but I still get no replacements in my document.

«TableStart:root»
«TableStart:bar»
«Name»
«TableEnd:bar»
«TableEnd:root»

Why is ‘Text1’ not in root? (In fact it seems to only be in bar)
If I run this with an attribute in root, ‘Text1’ is in root.

tahir.manzoor · April 7, 2021, 4:49pm

@p.binnig

In this XML, there are three tables.

root with column ‘a’.
bar with column ‘Name’
foo with no column.

If you add an attribute, e.g. col1, as shown below:

<bar col1 = "Text0">
    <Name>Text1</Name>
</bar>

then the bar table has two columns, col1 and Name.

The root table has no column in the XML; Text1 is in the bar table.

Text1 is not under the root table. The only column the root table has is the attribute.

p.binnig · April 8, 2021, 6:48am

Absolutely correct. I did not explain myself properly. I did not mean that ‘Text1’ is under ‘root’. I meant that since ‘bar’ is a child table of ‘root’ and ‘Text1’ is in ‘bar’, the mergefield ‘Name’ should be replaced with ‘Text1’ in

«TableStart:root»
«TableStart:bar»
«Name»
«TableEnd:bar»
«TableEnd:root»

I understand by now that if my root element is empty apart from its child elements (no attributes) and I read the XML with the AUTO option (the default), I don’t get the root element as a table.
But if I have the schema from above and this xml

<?xml version="1.0" encoding="UTF-8"?>
<root a="a">
  <bar>
    <Name>Text1</Name>
  </bar>
  <foo>
    <bar>
      <Name>Text2</Name>
    </bar>
  </foo>
</root>

I get ‘bar’ and ‘foo’ as child table of ‘root’ and ‘bar’ as child table of ‘foo’.

root:
ChildTables: bar foo 
Columns: a root_Id 
Row 0: a 0 

foo:
ChildTables: bar 
Columns: foo_Id root_Id 
Row 0: 0 0 

bar:
ChildTables: 
Columns: Name root_Id foo_Id 
Row 0: Text1 0  
Row 0: Text2  0

If I run

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <bar>
    <Name>Text1</Name>
  </bar>
  <foo>
    <bar>
      <Name>Text2</Name>
    </bar>
  </foo>
</root>

with the schema, I get nearly the same, but not quite:

root:
ChildTables: bar foo 
Columns: a root_Id 

foo:
ChildTables: bar 
Columns: foo_Id root_Id 
Row 0: 0  

bar:
ChildTables: 
Columns: Name root_Id foo_Id 
Row 0: Text1   
Row 0: Text2  0

and I get no replacements for my mergefields.
Shouldn’t those produce exactly the same DataSet (without the row for the value of a) and have the same behaviour when merging with a document like

«TableStart:root»
«TableStart:bar»
«Name»
«TableEnd:bar»
«TableEnd:root»

?
Why am I getting

Text1

with the first XML and no replacement with the second?

tahir.manzoor · April 8, 2021, 7:11pm

@p.binnig

This is the expected behavior of Aspose.Words.

Please execute the following code snippet.

DataSet data = new DataSet();
//data.readXmlSchema(path_schema);
//data.readXml(data_path, XmlReadMode.READ_SCHEMA);
data.readXml(data_path); // the xml is without root attribute.
print_values(data, data_path);

The output will be:

bar:
ChildTables: 
Columns: Name foo_Id 
Row 0: Text1  
Row 0: Text2 0 

foo:
ChildTables: bar 
Columns: foo_Id 
Row 0: 0

There is no root table in the DataSet. Moreover, the bar table has the following two rows. The foo_Id column has no value for Text1.

Row 0: Text1  
Row 0: Text2 0

The answer to this query is no. As shared in my earlier post, the com.aspose.words.net.System.Data.DataSet.readXml method mimics the .NET DataSet.ReadXml method. Please see the screenshot of the output of DataSet.ReadXml (.NET method).

image.png (6.1 KB)

There is no root table in the DataSet for this case.
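One possible workaround, not suggested anywhere in this thread and offered only as an untested assumption, follows from the observation that 1.xml (root with an attribute) does produce a root table: add a placeholder attribute to the document element before loading the XML. The sketch below does this with the JDK's DOM and Transformer APIs only. The class name RootAttributePatch, the method withRootAttribute, and the attribute name and value are hypothetical, and the patched XML would still have to be saved and passed to readXml as in the programs above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RootAttributePatch {

    // Returns a copy of the XML with one extra attribute set on the document element.
    // Assumption: an attribute on the root makes readXml infer a root table,
    // as observed in this thread for 1.xml.
    public static String withRootAttribute(String xml, String name, String value) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        dom.getDocumentElement().setAttribute(name, value);

        // Serialize the modified DOM back to a string, without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><bar><Name>Text1</Name></bar>"
                + "<foo><bar><Name>Text2</Name></bar></foo></root>";
        // "a" / "a" mirrors the attribute used in 1.xml; any name and value would do.
        System.out.println(withRootAttribute(xml, "a", "a"));
    }
}
```

Whether the extra column this introduces in the root table is acceptable depends on the template; it is harmless as long as no merge field refers to it.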