Read all tracked changes (Insertion, Deletion) with Before and After Text from a paragraph in Word documents - C#

Hi Team,

Our requirement is to extract the Before Text and After Text from the tracked changes in a paragraph of a Word document.

Example :

  1. The proverb has deep meaning, which is always useful important for a successful life. It conveys the idea that we should always think and then act accordingly act with 55%. Impulsive actions may lead us to embarrassing and odd situations. As we should always think before they respond, the way we speak, in the same way we should think before we act.

The above paragraph is a sample in which we need to track changes (Before Text and After Text) as follows.
Case 1: In the first sentence, "which is always useful" is deleted and "important" is inserted.
Output should be:
Before Text: The proverb has deep meaning, which is always useful for a successful life.
After Text: The proverb has deep meaning, important for a successful life.
We should track changes sentence by sentence, from full stop to full stop.

Case 2: In the second sentence, "think and then act accordingly" is deleted and "act with 55%" is inserted.
Output should be:
Before Text: It conveys the idea that we should always think and then act accordingly.
After Text: It conveys the idea that we should always act with 55%.
We should track changes sentence by sentence, from full stop to full stop.

Case 3: In the last sentence, "we speak, in the same way" is deleted and "they respond, the way" is inserted.
Output should be:
Before Text: As we should always think before we speak, in the same way we should think before we act.
After Text: As we should always think before they respond, the way we should think before we act.
We should track changes sentence by sentence, from full stop to full stop.
Also, please note that in the third case I added the inserted text on the left-hand side of the deleted text, instead of on the right-hand side as in Cases 1 and 2. Will that make any difference in getting the Before and After Text?

Along with the Before and After Text, we need to read the whole paragraph for all 3 cases.
Please provide us with a solution to achieve this.

Thanks in advance.

@sureshkap Let’s clarify some points.
What is the source data? The document that includes all 6 revisions, two revisions (deleted/added) for each of the three cases you described?
What should the code provide on output? Each of the three sentences in two versions (before/after)? Paragraph text before and after?
TestDoc.docx (13.1 KB)

The document that includes all 6 revisions, two revisions (deleted/added) for each of the three cases you described? - Yes, the document contains all 6 revisions as described.

Output should be like this for all three sentences:
Output1:
Change Type: Insertion
BeforeText: The proverb has deep meaning, which is always useful for a successful life.
After Text: The proverb has deep meaning, important for a successful life.
Paragraph: The proverb has deep meaning, which is always useful important for a successful life. It conveys the idea that we should always think and then act accordinglyact with 55%. Impulsive actions may lead us to embarrassing and odd situations. As we should always think before they respond, the waywe speak, in the same way we should think before we act.

Output2:
Change Type: Deletion
BeforeText: The proverb has deep meaning, which is always useful for a successful life.
After Text: The proverb has deep meaning, important for a successful life.
Paragraph: The proverb has deep meaning, which is always useful important for a successful life. It conveys the idea that we should always think and then act accordinglyact with 55%. Impulsive actions may lead us to embarrassing and odd situations. As we should always think before they respond, the waywe speak, in the same way we should think before we act.

The paragraph contains both the old and the newly changed data.
Alternatively, is there a way to show only the newly added data, or only the data before modification, for the paragraph?

The output should contain 3 insertions and 3 deletions, as mentioned above.
What is the source data? - I didn't get the question. If you are asking about the source data, it is a Word document.

@sureshkap Sorry, I still don’t fully understand the task. You give different change types for Output1 and Output2, but the contents of the two outputs are identical.
Each revision is a change at a certain point in the document, and each revision can be accepted or rejected. Please consider the following code:

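// Load the document that contains the tracked changes.
Document doc = new Document("TestDoc.docx");

// Reject every revision on a copy to get the original paragraph text.
Document docBefore = doc.Clone(true) as Document;
docBefore.Revisions.RejectAll();
Console.WriteLine("Paragraph before all changes");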
Console.WriteLine(docBefore.FirstSection.Body.FirstParagraph.GetText().Trim());
Console.WriteLine();

for (int i = 0; i < doc.Revisions.Count; i++)
{
    Document docAfter = doc.Clone(true) as Document;
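    // This loop relies on an accepted revision leaving the collection,
    // so the next revision to accept is always at index 0.
    for (int j = 0; j <= i; j++)
        docAfter.Revisions[0].Accept();

    // Reject the remaining revisions so that only the first i + 1 are applied.
    docAfter.Revisions.RejectAll();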
    Console.WriteLine("Next up is revision {0} {1}", i + 1, doc.Revisions[i].RevisionType.ToString());
    Console.WriteLine(docAfter.FirstSection.Body.FirstParagraph.GetText().Trim());
    Console.WriteLine();
}

docBefore = doc.Clone(true) as Document;
docBefore.Revisions.AcceptAll();
Console.WriteLine("Paragraph after all changes");
Console.WriteLine(docBefore.FirstSection.Body.FirstParagraph.GetText().Trim());
Console.WriteLine();

For the document attached above, the code shows the paragraph text after each revision is applied in turn, i.e. it shows how the paragraph text changed step by step.

Leaving the duplicate values aside, let me ask in a simpler way. The paragraph part is clear to me.
Ex:
The world is now affected with Corona and people are suffering severely.

The above sentence was changed like this: The world is now affected with Corona and COVID where crores of people are suffering severely.

Output would be like:
Before Text: The world is now affected with Corona and people are suffering severely.
After Text: The world is now affected with COVID where crores of people are suffering severely.

@sureshkap
The MS Word change-tracking mechanism has no revision type that represents an edit (there is no RevisionType.Editing). A text replacement is tracked as a RevisionType.Deletion plus a RevisionType.Insertion; this is how MS Word tracks changes within a document. Aspose.Words only reads this revision data from the document and also allows you to accept or reject it.
The answer to your previous post may well be the code that I already posted above.

Document docBefore = doc.Clone(true) as Document;
docBefore.Revisions.RejectAll();
Console.WriteLine("Paragraph before all changes");
Console.WriteLine(docBefore.FirstSection.Body.FirstParagraph.GetText().Trim());
Console.WriteLine();

docBefore = doc.Clone(true) as Document;
docBefore.Revisions.AcceptAll();
Console.WriteLine("Paragraph after all changes");
Console.WriteLine(docBefore.FirstSection.Body.FirstParagraph.GetText().Trim());
Console.WriteLine();

As a result, you will get the text before and the text after. However, as you understand, this is not a universal answer and it will not work in many cases: for example, if there are more than two revisions, if some other change is made to the text between the deletion and the insertion, or if text is deleted and inserted multiple times. Unfortunately, we cannot give you a universal answer for finding the before text and the after text, since the solution depends directly on your business logic, that is, on what exactly you consider to be the before text and the after text. Once you have defined clear criteria, the solution comes down to determining which groups of the basic revisions that MS Word provides are atomic for your business logic. This functionality lies outside the scope of Aspose.Words and has to be implemented through deeper analysis of the text and of the basic revisions that MS Word provides.

Is it not possible to get the Before Text and After Text for a single sentence in a paragraph with multiple revisions, such as insertions and deletions?
The above code shows the whole paragraph, but I want the solution per sentence: if a single sentence has 2 insertions and 2 deletions, the Before Text should be the sentence as it was before the insertions and deletions.
The After Text should be the new sentence, as it reads after the deletions and insertions.

Example sentence: It conveys the idea that we should always think and then act accordingly.
If "think and then act accordingly" was deleted, then the Before Text would be:
Before Text: It conveys the idea that we should always think and then act accordingly.
If "act with 55%" are the inserted words, then the After Text would be:
After Text: It conveys the idea that we should always act with 55%.
Is it possible to track this with revisions?

@sureshkap

There is no structural node for a sentence in either MS Word or Aspose.Words. There is a Paragraph, and it consists of Runs. One Run may consist of a single character, or it may contain several sentences. You can easily correlate a revision with a Paragraph. To determine the sentences contained in a paragraph, you need to parse the paragraph text into sentences. However, this is not always possible, since a sentence can start in one Paragraph and end in another. It is also not always possible to correlate a revision with a sentence: for example, two sentences can be deleted in one revision, and three sentences can be added in another.
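
As a rough illustration of parsing paragraph text into sentences, here is a minimal sketch that uses only the .NET Regex class and no Aspose.Words calls. The names SentenceSplitDemo and SplitSentences are hypothetical, and the rule used (split on whitespace that follows '.', '!' or '?') is an assumption; it will split incorrectly after abbreviations such as "e.g." and cannot handle a sentence that spans two paragraphs.

```csharp
using System;
using System.Text.RegularExpressions;

class SentenceSplitDemo
{
    // Hypothetical helper: splits on whitespace that directly follows
    // a '.', '!' or '?'. The punctuation stays with its sentence because
    // the lookbehind matches it without consuming it.
    static string[] SplitSentences(string paragraphText)
    {
        return Regex.Split(paragraphText.Trim(), @"(?<=[.!?])\s+");
    }

    static void Main()
    {
        string text =
            "It conveys the idea that we should always act with 55%. " +
            "Impulsive actions may lead us to embarrassing and odd situations.";

        // Prints each sentence on its own line.
        foreach (string sentence in SplitSentences(text))
            Console.WriteLine(sentence);
    }
}
```

With the sample text above, the split yields two sentences, the first ending in "55%." and the second ending in "situations.".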

To make the conversation more substantive, could you please create a document in MS Word containing the revisions you describe and attach it here in the topic? It will serve as the source data for the code under discussion.

Hi Team,

Please find the sample.docx attached as the source data for the above discussion.
Sample.docx (14.3 KB)

@sureshkap Please consider the following code:

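using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Aspose.Words;

Document doc = new Document("Sample.docx");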
// All revision groups in order of their occurrence.
List<RevisionGroup> allGroups = new List<RevisionGroup>();
// Revision groups that we accept.
List<RevisionGroup> acceptGroups = new List<RevisionGroup>();
// Document Before.
Document docBefore = doc.Clone(true) as Document;
docBefore.Revisions.RejectAll();
string textBefore = docBefore.FirstSection.Body.FirstParagraph.GetText().Trim();

// Arrange revision groups in order of their occurrence.
foreach (Revision revision in doc.Revisions)
{
    // Skip revisions that do not belong to any group (Group is null).
    if (revision.Group != null && !allGroups.Contains(revision.Group))
        allGroups.Add(revision.Group);
}

foreach (RevisionGroup group in allGroups)
{
    acceptGroups.Add(group);
    Document docAfter = doc.Clone(true) as Document;
    // Getting the next After document.
    for (int j = 0; j <= doc.Revisions.Count - 1; j++)
        if (acceptGroups.Contains(doc.Revisions[j].Group))
            docAfter.Revisions[0].Accept();
    docAfter.Revisions.RejectAll();
    // Splitting into sentences.
    string textAfter = docAfter.FirstSection.Body.FirstParagraph.GetText().Trim();
    string[] sentencesAfter = Regex.Split(textAfter, @"(?<=[\.])\s+");
    string[] sentencesBefore = Regex.Split(textBefore, @"(?<=[\.])\s+");
    string[] checkTarget = (group.RevisionType == RevisionType.Deletion)
        ? sentencesBefore
        : sentencesAfter;
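    // Identify and display the sentence that has been changed. The two arrays can
    // differ in length, so stop at the shorter one to avoid an out-of-range index.
    int sentenceCount = Math.Min(sentencesBefore.Length, sentencesAfter.Length);
    for (int j = 0; j < sentenceCount; j++)
    {
        if (checkTarget[j].Contains(group.Text))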
        {
            Console.WriteLine("Revision: " + group.RevisionType);
            Console.WriteLine("Before: " + sentencesBefore[j]);
            Console.WriteLine("After: " + sentencesAfter[j]);
            Console.WriteLine();
            break;
        }
    }

    textBefore = textAfter;
}

Thanks for your quick reply. It is working fine for one paragraph in the Word document. What if the Word document has more than one paragraph, with and without revisions?
I tried looping over the paragraph collection, but how can we get the text of the second or third paragraph?
Sample_Updated.docx (15.6 KB)

Attached sample_updated.docx for reference.

@sureshkap

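You can get the Paragraph from a Revision, or address a paragraph directly by its zero-based index:

// The parent node of a revision is usually a Run; it may also be the Paragraph itself.
Run run = revision.ParentNode as Run;
Paragraph paragraph = run != null ? run.ParentParagraph : revision.ParentNode as Paragraph;

// Text of the second and third paragraphs of the first section.
string secondParagraphText = doc.FirstSection.Body.Paragraphs[1].GetText();
string thirdParagraphText = doc.FirstSection.Body.Paragraphs[2].GetText();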
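
For a document with several paragraphs, one possible approach is to rebuild the before and after text of each paragraph from its runs. This is a sketch under assumptions, not the method used earlier in this thread. It assumes that the IsInsertRevision and IsDeleteRevision flags on each Run are enough to classify the text, and it does not handle move revisions or changes that span paragraph marks. The class name ParagraphBeforeAfter is hypothetical; the file name is the one attached above.

```csharp
using System;
using System.Text;
using Aspose.Words;

class ParagraphBeforeAfter
{
    static void Main()
    {
        Document doc = new Document("Sample_Updated.docx");

        // Visit every paragraph in the document, with or without revisions.
        foreach (Paragraph paragraph in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            StringBuilder before = new StringBuilder();
            StringBuilder after = new StringBuilder();
            bool hasRevisions = false;

            foreach (Run run in paragraph.GetChildNodes(NodeType.Run, true))
            {
                if (run.IsInsertRevision || run.IsDeleteRevision)
                    hasRevisions = true;

                // Inserted text did not exist before the changes.
                if (!run.IsInsertRevision)
                    before.Append(run.Text);

                // Deleted text does not exist after the changes.
                if (!run.IsDeleteRevision)
                    after.Append(run.Text);
            }

            // Paragraphs without tracked insertions or deletions are skipped.
            if (!hasRevisions)
                continue;

            Console.WriteLine("Before: " + before.ToString().Trim());
            Console.WriteLine("After: " + after.ToString().Trim());
            Console.WriteLine();
        }
    }
}
```

The resulting before and after strings could then be split into sentences and compared pairwise, in the same way as in the single-paragraph code above.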