Compare Docs - weird Revisions

Hi,

We are getting weird revisions after using the doc.Compare method. Please find the details below.

Aspose Words version: 22.3.0

Original part of the paragraph:

Updated part of the paragraph:

As you can see, 2017 was changed to 2020.

In total we are getting 3 revisions:

Revision Details:

1:

2:

3:

Just to summarize:

revision #1: deletion of 201
revision #2: deletion of 7
revision #3: insertion of 2020

What we expect:

revision #1: deletion of 2017
revision #2: insertion of 2020

Our goal: understand which parts of the text have been removed, added or updated. In the case of an update, we want to know which value was replaced by which, so that we can run a post-analysis. Our further business logic depends on this information.

@dkorolev Revisions in MS Word documents are applied to nodes. Most likely the removed text in your document is represented by two Run nodes, which is why you are getting two delete revisions. To treat such revisions as a single change, the way the MS Word UI shows them, there is the RevisionGroup class. So in your case you should check whether the revisions belong to the same RevisionGroup and, if so, treat that group as a single revision.

Hi @alexey.noskov, thanks for the answer. The problem with using groups is that there are no identifiers or other information besides the text to verify that a Run really belongs to the group. A simple comparison of text snippets doesn’t look very reliable. So maybe you can give me some recommendations or code examples on how to check whether revisions belong to the same RevisionGroup. Thanks.

@dkorolev You can use code like the following to process groups of revisions:

using System;
using System.Collections.Generic;
using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.docx");

Dictionary<RevisionGroup, List<Revision>> groups = new Dictionary<RevisionGroup, List<Revision>>();
List<Revision> individualRevisions = new List<Revision>();

foreach (Revision rev in doc.Revisions)
{
    if (rev.Group == null)
    {
        individualRevisions.Add(rev);
        continue;
    }

    if (!groups.ContainsKey(rev.Group))
    {
        groups.Add(rev.Group, new List<Revision>());
    }

    groups[rev.Group].Add(rev);
}

Console.WriteLine("There are {0} revision groups in the document.", groups.Count);
Console.WriteLine("There are {0} individual revisions in the document.", individualRevisions.Count);
foreach (RevisionGroup g in groups.Keys)
{
    Console.WriteLine("==============================================");
    Console.WriteLine("Revision group type : {0}", g.RevisionType);
    Console.WriteLine("Revision group author : {0}", g.Author);
    Console.WriteLine("Revision group modified text : \"{0}\"", g.Text);

    // Print information about each individual revision within the group.
    foreach (Revision r in groups[g])
        PrintRevisionInfo(r);
}

// Print the revisions that do not belong to any group.
if (individualRevisions.Count > 0)
{
    Console.WriteLine("Individual revisions.");
    foreach (Revision r in individualRevisions)
        PrintRevisionInfo(r);
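}

// Helper that prints the details of a single revision.
// Written as a static local function so the snippet works inside a method body;
// inside a class it can equally be declared as a private static method.
static void PrintRevisionInfo(Revision r)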
{
    Console.WriteLine("\t--------------------------------------");
    Console.WriteLine("\tDate: {0}", r.DateTime);
    Console.WriteLine("\tAuthor: {0}", r.Author);
    Console.WriteLine("\tRevisionType: {0}", r.RevisionType);
    // Note: ParentNode is not available for StyleDefinitionChange revisions.
    Console.WriteLine("\tRevision is applied to text: \"{0}\"", r.ParentNode.ToString(SaveFormat.Text));
}
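
For the post-analysis goal described earlier in the thread (knowing which value was replaced by which), the grouped revisions still have to be paired up. The following is only a minimal sketch of one way to do that, not an Aspose.Words feature. `Change` and `RevisionPairing` are hypothetical names introduced here; each `Change` is a stand-in that could be filled from a revision's `RevisionType` and text, or from the groups in `doc.Revisions.Groups`. The sketch assumes that neighbouring entries in the list are also neighbours in the document, which has to be verified for real documents.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one revision: its kind ("Deletion" or "Insertion") and its text.
public sealed class Change
{
    public string Kind { get; }
    public string Text { get; }

    public Change(string kind, string text)
    {
        Kind = kind;
        Text = text;
    }
}

public static class RevisionPairing
{
    // Merges consecutive changes of the same kind into one change.
    // Assumption: consecutive entries are contiguous in the document text.
    public static List<Change> MergeAdjacent(IEnumerable<Change> changes)
    {
        var merged = new List<Change>();
        foreach (Change c in changes)
        {
            int last = merged.Count - 1;
            if (last >= 0 && merged[last].Kind == c.Kind)
                merged[last] = new Change(c.Kind, merged[last].Text + c.Text);
            else
                merged.Add(c);
        }
        return merged;
    }

    // Describes the changes, reporting a deletion that is immediately
    // followed by an insertion as a single replacement.
    public static List<string> Describe(IEnumerable<Change> changes)
    {
        List<Change> merged = MergeAdjacent(changes);
        var result = new List<string>();
        for (int i = 0; i < merged.Count; i++)
        {
            bool isReplacement = merged[i].Kind == "Deletion"
                && i + 1 < merged.Count
                && merged[i + 1].Kind == "Insertion";
            if (isReplacement)
            {
                result.Add("replaced \"" + merged[i].Text + "\" with \"" + merged[i + 1].Text + "\"");
                i++; // Skip the insertion that was just consumed.
            }
            else
            {
                result.Add(merged[i].Kind.ToLowerInvariant() + " of \"" + merged[i].Text + "\"");
            }
        }
        return result;
    }
}
```

With the three revisions from the question, that is a deletion of "201", a deletion of "7" and an insertion of "2020", `RevisionPairing.Describe` returns a single entry: `replaced "2017" with "2020"`. Two deletions separated by unchanged text would be merged incorrectly by this sketch, so in practice the merge step is better driven by `RevisionGroup` membership than by list order alone.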