The attached document, test1.docx (11.6 KB), contains two tables. However, when it is diffed against itself, the two tables are merged into one table in the result. This was run on Aspose 22.4.
This is causing downstream issues for us. We expect the diffing behavior to preserve the two tables. Please let us know why this occurs.
Tracked internally by us as 11227.
@epaulet-society This is expected behavior. In MS Word, tables must be separated by a paragraph. If there is no paragraph between two tables, they are merged into one table. MS Word behaves the same way.
@alexey.noskov The two-table structure in the document attached is legal XML and MS Word doesn’t auto-merge the tables after opening the document either. Can you explain what you mean by ‘MS Word behaves the same.’?
We created this document in XML so we expect the structure to be preserved upon diffing, especially since MS Word itself preserves the structure.
@epaulet-society Yes, the XML is valid, but in MS Word documents tables must be separated by a paragraph, and the last node of the body must be a paragraph. This is required so that the editor can place the cursor at any position. If there is no paragraph between two tables, you cannot insert content between them in MS Word, so they are effectively treated as a single table.
When you open your document in MS Word it considers the tables as a single table:
Click the cross at the top-left corner and the whole table will be selected. Also, after the document is saved in MS Word, the tables are represented as a single table in the XML.
Here is the input:
<w:body>
<w:tbl>
..........
</w:tbl>
<w:tbl>
..........
</w:tbl>
<w:p/>
..........
</w:body>
And here is the output (after saving the document in MS Word): ms.docx (12.6 KB)
<w:body>
<w:tbl>
..........
</w:tbl>
<w:p w14:paraId="78CF3DB3" w14:textId="77777777" w:rsidR="00232FE6" w:rsidRDefault="00232FE6"/>
..........
</w:body>
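The difference between the two listings can also be checked mechanically before a document is handed to a comparison. The following is a minimal Python sketch and not part of Aspose.Words: it assumes the main document part (word/document.xml) is already available as a string, and the function name `count_adjacent_table_pairs` and the two sample strings are made up for illustration. It only counts `w:tbl` elements that are direct neighbours inside `w:body`.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in the listings above).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_adjacent_table_pairs(document_xml: str) -> int:
    """Count pairs of w:tbl elements that are direct neighbours in w:body.

    Hypothetical helper for illustration; it does not look inside
    table cells, headers, or footers.
    """
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W_NS}}}body")
    if body is None:
        return 0
    tbl_tag = f"{{{W_NS}}}tbl"
    children = list(body)
    # Compare every child of the body with the child that follows it.
    return sum(
        1
        for current, following in zip(children, children[1:])
        if current.tag == tbl_tag and following.tag == tbl_tag
    )


# Made-up minimal documents mirroring the two shapes discussed above.
ADJACENT_TABLES_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>"
    "<w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>"
    "<w:p/>"
    "</w:body></w:document>"
)
SEPARATED_TABLES_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>"
    "<w:p/>"
    "<w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>"
    "<w:p/>"
    "</w:body></w:document>"
)

if __name__ == "__main__":
    print(count_adjacent_table_pairs(ADJACENT_TABLES_XML))
    print(count_adjacent_table_pairs(SEPARATED_TABLES_XML))
```

A non-zero count suggests the document contains tables that MS Word would treat as one, as described above.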
Hi @alexey.noskov, I see what you are saying now and I am able to replicate this behavior in MS Word. Does the Aspose diff API support any type of normalization option like this performed beforehand on two documents we wish to compare? Or is saving and loading the documents the best route?
@epaulet-society Aspose.Words normalizes tables both on load and on save. So once you load your document into Aspose.Words.Document, the adjacent tables are concatenated and represented as a single table:
using System;
using Aspose.Words;
Document doc = new Document(@"C:\Temp\in.docx");
Console.WriteLine(doc.GetChildNodes(NodeType.Table, true).Count); // Prints 1
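If the two tables need to stay separate, one option that follows from the rule that tables must be separated by a paragraph is to put an empty paragraph between them before the document is loaded. The following Python sketch shows that idea at the XML level. It is an assumption on my part, not an Aspose.Words feature: the function name `separate_adjacent_tables` and the sample string are hypothetical, and ElementTree may rewrite namespace prefixes in a real word/document.xml, so treat it as an illustration rather than a production tool.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Keep the conventional "w:" prefix when serializing.
ET.register_namespace("w", W_NS)


def separate_adjacent_tables(document_xml: str) -> str:
    """Insert an empty w:p between neighbouring w:tbl elements in w:body.

    Hypothetical helper for illustration; only direct children of the
    body are examined.
    """
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W_NS}}}body")
    if body is None:
        return document_xml
    tbl_tag = f"{{{W_NS}}}tbl"
    index = 0
    # len(body) is re-read on every pass because the loop inserts nodes.
    while index < len(body) - 1:
        if body[index].tag == tbl_tag and body[index + 1].tag == tbl_tag:
            body.insert(index + 1, ET.Element(f"{{{W_NS}}}p"))
        index += 1
    return ET.tostring(root, encoding="unicode")


# Made-up minimal document with two tables and no paragraph between them.
TWO_TABLES_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>"
    "<w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>"
    "<w:p/>"
    "</w:body></w:document>"
)

if __name__ == "__main__":
    print(separate_adjacent_tables(TWO_TABLES_XML))
```

Note that the extra paragraph becomes part of the document content, so both documents being compared would need the same treatment to avoid a spurious difference.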