Compare document summary

Hi Team,
I am comparing two Word documents using Aspose.Words for C#.
I want a summary of the changes found by the comparison.
Is there any way to show a summary of changes at the top of the document?

sourceContent and destContent hold the bytes of my Word files.

byte[] compareContent = DocComparer.DocumentCompare.CompareDocument(sourceContent, destContent, "docx", true);

@Jayshiv Aspose.Words comparison works the same way as MS Word’s comparison, i.e. detected differences are shown in the output document as revisions. Please see our documentation for more information:
https://docs.aspose.com/words/net/compare-documents/

To build a comparison summary, read the revisions from the document after the comparison. See the Document.Revisions property.
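As a minimal sketch of that idea, the snippet below compares two documents and prints one line per revision. The file paths and the author name are placeholders, and it assumes neither document contains revisions before the comparison. Revisions of type StyleDefinitionChange are skipped here because they have no parent node.

```csharp
using System;
using Aspose.Words;

class RevisionSummarySketch
{
    static void Main()
    {
        // Placeholder paths; replace with your own files.
        Document original = new Document(@"C:\Temp\file1.docx");
        Document edited = new Document(@"C:\Temp\file2.docx");

        // After Compare, the differences are stored in 'original' as revisions.
        original.Compare(edited, "author", DateTime.Now);

        foreach (Revision revision in original.Revisions)
        {
            // Style definition changes are attached to a style, not to a node.
            if (revision.RevisionType == RevisionType.StyleDefinitionChange)
                continue;

            string text = revision.ParentNode.ToString(SaveFormat.Text).Trim();
            Console.WriteLine($"{revision.RevisionType}; {revision.Author}; '{text}'");
        }
    }
}
```

The same loop can feed a table instead of the console if the summary should live inside the document.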

@alexey.noskov
I compared two documents and displayed the revision summary in a table.
I want to navigate to a specific change, or at least get the page number from the revision object (I can’t find any property for that).

Here is my code snippet:

static void Main(string[] args)
{
    Aspose.Words.License licWord = new Aspose.Words.License();
    string strLicenscePath = "D:\\Word DLL and Lic\\Aspose.Words_New.lic";
    licWord.SetLicense(strLicenscePath);

    Document doc1 = new Document("C:\\Users\\ABC\\Desktop\\file1.docx");
    Document doc2 = new Document("C:\\Users\\ABC\\Desktop\\file2.docx");

    doc1.Compare(doc2, "author", DateTime.Now);
    DocumentBuilder builder = new DocumentBuilder(doc1);
    Table table = new Table(doc1);

    // Insert the summary table at the very beginning of the document body.
    Body body = doc1.FirstSection.Body;
    body.InsertBefore(table, body.FirstChild);

    Row row = new Row(doc1);
    row.RowFormat.AllowBreakAcrossPages = true;
    table.AppendChild(row);
    table.AutoFit(AutoFitBehavior.AutoFitToWindow);

    Cell cell = new Cell(doc1);
    cell.CellFormat.HorizontalMerge = CellMerge.First;
    cell.AppendChild(new Paragraph(doc1));
    cell.FirstParagraph.AppendChild(new Run(doc1, "Summary of Changes"));
    row.AppendChild(cell);

    row.AppendChild(cell.Clone(false));
    row.LastCell.AppendChild(new Paragraph(doc1));
    row.LastCell.FirstParagraph.AppendChild(new Run(doc1, ""));

    row.AppendChild(cell.Clone(false));
    row.LastCell.AppendChild(new Paragraph(doc1));
    row.LastCell.FirstParagraph.AppendChild(new Run(doc1, ""));
    if (table.Rows.Count > 0)
    {
        Row firstRow = table.FirstRow;
        firstRow.Cells[0].CellFormat.HorizontalMerge = CellMerge.First;
        for (int i = 1; i < firstRow.Cells.Count; i++)
        {
            firstRow.Cells[i].CellFormat.HorizontalMerge = CellMerge.Previous;
        }
        firstRow.Cells[0].Paragraphs[0].ParagraphFormat.Alignment = ParagraphAlignment.Center;
        firstRow.Cells[0].Paragraphs[0].Runs[0].Font.Bold = true;
    }

    Row rows = new Row(doc1);
    rows.RowFormat.AllowBreakAcrossPages = true;
    table.AppendChild(rows);
    table.AutoFit(AutoFitBehavior.AutoFitToWindow);

    Cell cells = new Cell(doc1);
    cells.AppendChild(new Paragraph(doc1));

    cells.FirstParagraph.AppendChild(new Run(doc1, "Type"));
    rows.AppendChild(cells);

    rows.AppendChild(cells.Clone(false));
    rows.LastCell.AppendChild(new Paragraph(doc1));
    rows.LastCell.FirstParagraph.AppendChild(new Run(doc1, "Author"));

    rows.AppendChild(cells.Clone(false));
    rows.LastCell.AppendChild(new Paragraph(doc1));
    rows.LastCell.FirstParagraph.AppendChild(new Run(doc1, "Text"));

    Row secondRow = doc1.FirstSection.Body.Tables[0].Rows[1];
    foreach (Cell c in secondRow.Cells)
    {
        foreach (Run run in c.Paragraphs[0].Runs)
        {
            run.Font.Bold = true;
        }
    }

    RevisionCollection revisions = doc1.Revisions;
    foreach (Revision revision in revisions)
    {
        Row rowData = new Row(doc1);
        rowData.RowFormat.AllowBreakAcrossPages = true;
        table.AppendChild(rowData);
        table.AutoFit(AutoFitBehavior.AutoFitToWindow);

        Cell celldata = new Cell(doc1);
        celldata.AppendChild(new Paragraph(doc1));
        celldata.FirstParagraph.AppendChild(new Run(doc1, revision.RevisionType.ToString()));
        rowData.AppendChild(celldata);

        rowData.AppendChild(celldata.Clone(false));
        rowData.LastCell.AppendChild(new Paragraph(doc1));
        rowData.LastCell.FirstParagraph.AppendChild(new Run(doc1, revision.Author));

        rowData.AppendChild(celldata.Clone(false));
        rowData.LastCell.AppendChild(new Paragraph(doc1));
        rowData.LastCell.FirstParagraph.AppendChild(new Run(doc1, revision.ParentNode.GetText().Trim()));
    }
    AddLineBreakAfterTable(doc1, table);
    doc1.Save("C:\\Users\\ABC\\Desktop\\ComparedDoc.docx");
    Console.ReadLine();
}

ComparedDoc.docx (166.5 KB)
file1.docx (336.4 KB)
file2.docx (336.4 KB)

@Jayshiv You can use LayoutCollector to get the page number on which a particular node is located. For example, see the following code:

Document doc1 = new Document(@"C:\Temp\file1.docx");
Document doc2 = new Document(@"C:\Temp\file2.docx");

doc1.Compare(doc2, "test", DateTime.Now);

// Create LayoutCollector and get page numbers where the revisions are located.
LayoutCollector collector = new LayoutCollector(doc1);
foreach (Revision r in doc1.Revisions)
{
    int page = collector.GetStartPageIndex(r.ParentNode);
    Console.WriteLine($"Page: {page}; Type: {r.RevisionType}; Author: {r.Author}; Text: '{r.ParentNode.ToString(SaveFormat.Text).Trim()}'");
}

Thanks for your response.
How can we navigate to that page when clicking the page number in the revision summary table?

@Jayshiv You can use a PAGEREF field to achieve this. For example, see the following code:

DataTable revisionStats = new DataTable();
revisionStats.Columns.Add("Type");
revisionStats.Columns.Add("Author");
revisionStats.Columns.Add("Text");
revisionStats.Columns.Add("Page");

Document doc1 = new Document(@"C:\Temp\file1.docx");
Document doc2 = new Document(@"C:\Temp\file2.docx");

doc1.Compare(doc2, "test", DateTime.Now);
DocumentBuilder builder = new DocumentBuilder(doc1);

int revisionIndex = 0;
foreach (Revision r in doc1.Revisions)
{
    Node revisionParent = r.ParentNode;
    Node firstInline = revisionParent;
    while (firstInline.IsComposite && firstInline.NodeType != NodeType.Paragraph)
        firstInline = firstInline.NextPreOrder(revisionParent);

    // Wrap inline node into bookmark;
    string bkName = $"_revision{revisionIndex++}";
    firstInline.ParentNode.InsertBefore(new BookmarkStart(doc1, bkName), firstInline);
    firstInline.ParentNode.InsertAfter(new BookmarkEnd(doc1, bkName), firstInline);

    revisionStats.Rows.Add(r.RevisionType, r.Author, r.ParentNode.ToString(SaveFormat.Text).Trim(), bkName);
}

// Build the stats table at the beginning of the document.
builder.StartTable();
foreach (DataColumn col in revisionStats.Columns)
{
    builder.InsertCell();
    builder.Font.Bold = true;
    builder.ParagraphFormat.Alignment = ParagraphAlignment.Center;
    builder.Write(col.ColumnName);
}
builder.EndRow();
builder.Font.ClearFormatting();
builder.ParagraphFormat.ClearFormatting();
foreach (DataRow row in revisionStats.Rows)
{
    foreach (DataColumn col in revisionStats.Columns)
    {
        builder.InsertCell();
        string cellVal = row[col].ToString();
        if (col.ColumnName == "Page")
        {
            builder.InsertField($"PAGEREF {cellVal} \\h");
        }
        else
        {
            builder.Write(cellVal);
        }
    }
    builder.EndRow();
}
builder.EndTable();

// update fields to update the inserted PAGEREF fields.
doc1.UpdateFields();
doc1.Save(@"C:\Temp\out.docx");

Thanks @alexey.noskov for your response.
I’m testing this solution with different documents and added some null checks accordingly, but I observed that the while loop runs forever.

foreach (Revision r in doc1.Revisions)
{
    if (r.Group != null)
    {
        revisionIndex++;
        page = collector.GetStartPageIndex(r.ParentNode);
        Node revisionParent = r.ParentNode;
        Node firstInline = revisionParent;

        while (firstInline != null && firstInline.IsComposite)
            firstInline = revisionParent.NextPreOrder(revisionParent);

        if (firstInline != null)
        {
            // Wrap inline node into bookmark;
            //string bkName = $"_revision{revisionIndex++}";
            string bkName = r.ParentNode.ToString(Aspose.Words.SaveFormat.Text).Trim().Replace(" ", "_");
            firstInline.ParentNode.InsertBefore(new BookmarkStart(doc1, bkName), firstInline);
            firstInline.ParentNode.InsertAfter(new BookmarkEnd(doc1, bkName), firstInline);

            revisionStats.Rows.Add(r.RevisionType, bkName);
        }
    }
}

Here are my documents.
2.docx (142.0 KB)

1.docx (140.7 KB)

@Jayshiv Yes, there is a mistake in the while loop. I have corrected the code in my previous answer. The loop should be modified like this:

while (firstInline.IsComposite && firstInline.NodeType != NodeType.Paragraph)
    firstInline = firstInline.NextPreOrder(revisionParent);

Hi @alexey.noskov,
Thanks for your quick response.
But there is one more scenario.
When I delete a paragraph or a page of data (multiple paragraphs), I get multiple revisions for one paragraph. In that case the inserted bookmark is placed at the start of the document, so the link navigates there instead of to the deleted paragraph. This seems to happen because of the NodeType.Paragraph condition in the while loop.

1.docx (140.7 KB)

2 1.docx (142.4 KB)

@Jayshiv When several nodes are affected by one action, each of the changed nodes has its own revision. MS Word usually groups such revisions. You can modify your code to process revision groups. Please see the following modified code:

DataTable revisionStats = new DataTable();
revisionStats.Columns.Add("Type");
revisionStats.Columns.Add("Author");
revisionStats.Columns.Add("Text");
revisionStats.Columns.Add("Page");

Document doc1 = new Document(@"C:\Temp\1.docx");
Document doc2 = new Document(@"C:\Temp\2.docx");

doc1.Compare(doc2, "test", DateTime.Now);
DocumentBuilder builder = new DocumentBuilder(doc1);

int revisionIndex = 0;
List<RevisionGroup> processedGroups = new List<RevisionGroup>();
foreach (Revision r in doc1.Revisions.ToList())
{
    // The revision group is already processed.
    if (r.Group != null && processedGroups.Contains(r.Group))
        continue;

    string bkName = $"_revision{revisionIndex++}";
    string revText = "";
    Node revisionParent = r.ParentNode;
    // Individual revision.
    if (r.Group == null)
    {
        revText = r.ParentNode.ToString(SaveFormat.Text).Trim();
    }
    // Process as group
    else if (!processedGroups.Contains(r.Group))
    {
        List<Revision> groupRevisions = doc1.Revisions.Where(rev => rev.Group == r.Group).ToList();
        // A group might contain multiple revisions, so there is no single node.
        // Insert the bookmark at the end of the revised content.
        revisionParent = groupRevisions.Last().ParentNode;
        revText = (r.RevisionType != RevisionType.FormatChange)
            ? r.Group.Text
            : string.Join("", groupRevisions.Select(rev => rev.ParentNode.ToString(SaveFormat.Text).Trim()));

        processedGroups.Add(r.Group);
    }
    // Get node to wrap with bookmark.
    Node firstInline = revisionParent;
    while (firstInline.IsComposite && firstInline.NodeType != NodeType.Paragraph)
        firstInline = firstInline.NextPreOrder(revisionParent);

    // Wrap node into bookmark;
    firstInline.ParentNode.InsertBefore(new BookmarkStart(doc1, bkName), firstInline);
    firstInline.ParentNode.InsertAfter(new BookmarkEnd(doc1, bkName), firstInline);

    revisionStats.Rows.Add(r.RevisionType, r.Author, revText, bkName);
}

// Build the stats table at the beginning of the document.
builder.StartTable();
foreach (DataColumn col in revisionStats.Columns)
{
    builder.InsertCell();
    builder.Font.Bold = true;
    builder.ParagraphFormat.Alignment = ParagraphAlignment.Center;
    builder.Write(col.ColumnName);
}
builder.EndRow();
builder.Font.ClearFormatting();
builder.ParagraphFormat.ClearFormatting();
foreach (DataRow row in revisionStats.Rows)
{
    foreach (DataColumn col in revisionStats.Columns)
    {
        builder.InsertCell();
        string cellVal = row[col].ToString();
        if (col.ColumnName == "Page")
        {
            builder.InsertField($"PAGEREF {cellVal} \\h");
        }
        else
        {
            builder.Write(cellVal);
        }
    }
    builder.EndRow();
}
builder.EndTable();

// update fields to update the inserted PAGEREF fields.
doc1.UpdateFields();
doc1.Save(@"C:\Temp\out2.docx");

Hi @alexey.noskov,
Thank you so much for the quick answers.
But there is still an issue with paragraphs, and I am also facing an issue with tables where multiple values are inserted: only the first value of the table is added to the summary of changes table and used for navigation, because of the processedGroups.Contains(r.Group) check. In the paragraph case, the link still navigates to the first paragraph of the document. I modified the code, and here it is:

Document doc1 = new Document("C:\\Users\\ABC\\Desktop\\1.docx");
Document doc2 = new Document("C:\\Users\\ABC\\Desktop\\2.docx");

doc1.Compare(doc2, "author", DateTime.Now);

DataTable revisionStats = new DataTable();
revisionStats.Columns.Add("Type");
revisionStats.Columns.Add("Text");
revisionStats.Columns.Add("Page");

DocumentBuilder builder = new DocumentBuilder(doc1);
LayoutCollector collector = new LayoutCollector(doc1);

int revisionIndex = 0;
int page = 1;
List<RevisionGroup> processedGroups = new List<RevisionGroup>();
foreach (Revision r in doc1.Revisions.ToList())
{
    // The revision group is already processed.
    if (r.Group != null && processedGroups.Contains(r.Group))
        continue;

    //string bkName = $"_revision{revisionIndex++}";
    var bkName = r.ParentNode.ToString(SaveFormat.Text).Trim().Replace(" ", "_").Contains("\r") ? r.Group.Text : r.ParentNode.ToString(SaveFormat.Text).Trim().Replace(" ", "_");
    if (bkName == "")
    {
        bkName = "Blank";
    }
    string revText = "";
    Node revisionParent = r.ParentNode;
    // Individual revision.
    if (r.Group == null)
    {
        revText = r.ParentNode.ToString(SaveFormat.Text).Trim();
    }
    // Process as group
    else if (!processedGroups.Contains(r.Group))
    {
        List<Revision> groupRevisions = doc1.Revisions.Where(rev => rev.Group == r.Group).ToList();
        // A group might contain multiple revisions, so there is no single node.
        // Insert the bookmark at the end of the revised content.
        revisionParent = groupRevisions.Last().ParentNode;
        revText = (r.RevisionType != RevisionType.FormatChange)
            ? r.Group.Text
            : string.Join("", groupRevisions.Select(rev => rev.ParentNode.ToString(SaveFormat.Text).Trim()));

        processedGroups.Add(r.Group);
    }
    // Get node to wrap with bookmark.
    Node firstInline = revisionParent;
    while (firstInline.IsComposite && firstInline.NodeType != NodeType.Paragraph)
        firstInline = firstInline.NextPreOrder(revisionParent);

    // Wrap node into bookmark;
    firstInline.ParentNode.InsertBefore(new BookmarkStart(doc1, bkName), firstInline);
    firstInline.ParentNode.InsertAfter(new BookmarkEnd(doc1, bkName), firstInline);

    revisionStats.Rows.Add(r.RevisionType, revText, bkName);
}
builder.StartTable();

foreach (DataColumn col in revisionStats.Columns)
{
    builder.InsertCell();
    builder.Font.Bold = true;
    builder.ParagraphFormat.Alignment = ParagraphAlignment.Center;
    builder.Write(col.ColumnName);
}
builder.EndRow();
builder.Font.ClearFormatting();
builder.ParagraphFormat.ClearFormatting();
foreach (DataRow row in revisionStats.Rows)
{
    foreach (DataColumn col in revisionStats.Columns)
    {
        builder.InsertCell();
        string cellVal = row[col].ToString();
        if (col.ColumnName == "Page")
        {
            builder.Font.Underline = Aspose.Words.Underline.Single;
            builder.Font.Color = Color.Blue;
            builder.InsertField($"PAGEREF {cellVal} \\h", page.ToString());
        }
        else
        {
            builder.Font.Underline = Aspose.Words.Underline.None;
            builder.Font.Color = Color.Black;
            builder.Write(cellVal);
        }
    }
    builder.EndRow();
}
builder.EndTable();
string text = "Version 1 edited by ABC on Dec 5, 2023 4:09 PM" + ControlChar.ParagraphBreak + "Version 2 edited by XYZ on Dec 7, 2023 3:32 PM" + ControlChar.ParagraphBreak + ControlChar.ParagraphBreak + "Summary of Changes:";
Paragraph paragraph = new Paragraph(doc1);
paragraph.AppendChild(new Run(doc1, text));

// Add the paragraph to the document
doc1.FirstSection.Body.InsertBefore(paragraph, doc1.FirstSection.Body.FirstChild);
Paragraph targetParagraph = (Paragraph)doc1.FirstSection.Body.FirstChild;
foreach (Run run in targetParagraph.Runs)
{
    run.Font.Size = 10;  // Set the font size to 10 points
    run.Font.Color = Color.Orange; // Set the font color to orange
}

builder.InsertBreak(BreakType.PageBreak);
// update fields to update the inserted PAGEREF fields.
doc1.UpdateFields();
doc1.Save("C:\\Users\\ABC\\Desktop\\ComparedDoc.docx");

Here is the ComparedDoc :
ComparedDoc.docx (139.6 KB)

Source files :
1.docx (140.7 KB)
2 1.docx (142.4 KB)

@RajChauhan @Jayshiv The first problem, where the link moves to the beginning of the document, is not reproducible on my side. Here is the output produced on my side:
out.docx (175.8 KB)

Regarding the text in the table: a revision group might contain many revisions. To get the full text of the group, you should get the text of each individual revision. Try changing the following code:

revText = (r.RevisionType != RevisionType.FormatChange)
     ? r.Group.Text
     : string.Join("", groupRevisions.Select(rev => rev.ParentNode.ToString(SaveFormat.Text).Trim()));

to

revText = string.Join("", groupRevisions.Select(rev => rev.ParentNode.ToString(SaveFormat.Text).Trim()));

Hi @alexey.noskov ,
Thanks for your quick response.
But I’m still facing that issue with my code. I gave you my code in the last reply, in which I inserted a paragraph before the summary table.

Can you provide something that lets me navigate to the page instead of to the text (for paragraphs and tables), to work around the multiple group revisions and the bookmark issue?

If there are multiple revisions on a single page, I also want to navigate to that page rather than to the revision group that contains multiple revisions.

I only want the revision type and the page number in the summary table. The page number is used for navigation to that page.

@RajChauhan You should note that MS Word documents are flow documents by nature and there is no “page” concept. The consumer application reflows the document content into pages on the fly. Also, the only way to navigate within the document is a bookmark, i.e. you need to insert a bookmark at the position you want to navigate to and then use either a HYPERLINK or a PAGEREF field to navigate to the bookmark.
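A minimal sketch of both navigation options, using a new blank document instead of a comparison result. The bookmark name _revision0, the link text, and the output path are placeholders chosen only for illustration.

```csharp
using Aspose.Words;

class BookmarkNavigationSketch
{
    static void Main()
    {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Option 1: a HYPERLINK field that jumps to the bookmark.
        // The last argument tells the builder the target is a bookmark, not a URL.
        builder.InsertHyperlink("Go to change", "_revision0", true);

        // Option 2: a PAGEREF field that shows the bookmark's page number;
        // the \h switch makes the number clickable.
        builder.Write(" (page ");
        builder.InsertField("PAGEREF _revision0 \\h");
        builder.Writeln(")");

        // Put the navigation target on the next page.
        builder.InsertBreak(BreakType.PageBreak);
        builder.StartBookmark("_revision0");
        builder.Writeln("Changed text");
        builder.EndBookmark("_revision0");

        // Recalculate the PAGEREF result before saving.
        doc.UpdateFields();
        doc.Save(@"C:\Temp\links.docx");
    }
}
```

In both cases the target is the bookmark, not a page, so several revisions on the same page still need at least one bookmark on that page.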

As I mentioned, I used the same code as yours on my side and the problem is not reproducible. See the output document attached to my previous post. As far as I can see, you are using an old 21.8 version of Aspose.Words; I used the latest 23.12 version for testing. So please try using the latest version and let us know if the problem still persists.

If with the latest version the problem is still there, please create a simple console application (including all required resources) that will allow us to replicate the behavior on our side.

Hi @alexey.noskov,
Just wanted to say thank you for the solutions you provided. Your help is greatly appreciated!
Yes, I was facing that issue because of the license I was using.
Thank you for the quick response.
