[Words Java] Tracking changes: marking up a piece of text rather than replacing it

We have the following code snippet that normalizes a search string, finds it in the document and then replaces it. Track changes is enabled at the start of the loop and stopped at the end of it. However, what ends up happening is that the block is replaced entirely: because the replacement text also contains the full original text, it appears as though the whole block/paragraph was replaced, rather than a portion being added to or removed from the original block.

How do we address this? We need the document to be marked up with the changes rather than replaced by them.

public String processSteps(List<Step> steps, Metadata metadata) throws Exception {
    String s3Url = metadata.getS3BinaryUrl();


    // Download from S3 and load the document into Aspose.Words
    Document document = s3Service.downloadAndPrepareDocument(s3Url);

    // Get paragraphs as a parameterized collection
    NodeCollection<Paragraph> paragraphs = document.getChildNodes(NodeType.PARAGRAPH, true);

    // Iterate through each step and replace text in paragraphs
    for (Step step : steps) {
      if (step.getLocations() != null && step.getOriginalProvision() != null
          && step.getMarkedUpProvision() != null) {
        logger.info("Processing step: {}", step.getProvisionName());
        String searchText = step.getOriginalProvision();
        if (searchText == null || searchText.isEmpty()) {
          logger.warn("Skipping step '{}' with empty search text.", step.getProvisionName());
          continue; // Skip if search text is empty
        }
        String replacementText = step.getMarkedUpProvision();
        // String regex = buildFlexibleRegex(searchText);
        String normalizedSearch = normalizeText(searchText);
        for (Paragraph para : paragraphs) {
          String paraText = para.getText().trim();
          String normalizedParaText = normalizeText(paraText);
          // Check if the normalized paragraph text contains the normalized search text
          if (normalizedParaText.contains(normalizedSearch)) {
            FindReplaceOptions options = new FindReplaceOptions();
            options.setMatchCase(false);
            options.setUseSubstitutions(true);
            logger.info("Text to be replaced: {}", searchText);
            logger.info("Para before replacement: {}", para.getText());
            // Replace the text in the paragraph
            para.getRange().replace(searchText, replacementText, options);
            logger.info("Para after replacement: {}", para.getText());
          }
        }
      }
    }
    // ... (rest of the method not shown)
}

Servient developer on behalf of Ian.
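One detail worth checking in the loop above: the contains check runs on normalized text, but replace is then called with the raw searchText. If normalization changed anything (whitespace, quote style), a paragraph can pass the check and still produce no match. The body of normalizeText is not shown in the post, so the sketch below is a hypothetical stand-in, assuming it does something like whitespace collapsing, quote folding and case folding:

```java
import java.text.Normalizer;
import java.util.Locale;

public class TextNormalizer {

    // Hypothetical stand-in for the post's normalizeText helper (its real body is not shown).
    public static String normalizeText(String s) {
        // NFKC folds compatibility characters, e.g. a non-breaking space becomes a plain space
        String n = Normalizer.normalize(s, Normalizer.Form.NFKC);
        // Map curly quotes and apostrophes to their straight ASCII forms
        n = n.replace('\u2018', '\'').replace('\u2019', '\'')
             .replace('\u201C', '"').replace('\u201D', '"');
        // Collapse whitespace runs (including a trailing paragraph break), trim, and case-fold
        return n.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String paraText = "Recipient\u2019s  Representatives"; // curly apostrophe, double space
        String searchText = "Recipient's Representatives";
        // The normalized check succeeds...
        System.out.println(normalizeText(paraText).contains(normalizeText(searchText))); // true
        // ...but the raw strings differ, so a replace using the raw searchText would not match
        System.out.println(paraText.contains(searchText)); // false
    }
}
```

If the documents routinely differ from the provisions in this way, the replace pattern would need to tolerate the same differences the normalization hides (for example a regex built from the search text, as the commented-out buildFlexibleRegex line suggests was considered).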

@ianjwilson

Cause

The issue arises because Range.replace treats each match as a single unit. With revision tracking enabled, the entire matched text is recorded as one deletion and the entire replacement text as one insertion; no word-by-word comparison is made between the two strings. Because the replacement text repeats most of the original text, the whole paragraph appears to have been replaced rather than having a few words added or removed.

Solution

To mark up the changes without replacing the entire block, you can use the DocumentBuilder class to insert or remove text at specific positions instead of calling the replace method directly. This approach lets you edit only the affected text while revisions are being tracked.
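The "specific positions" can be worked out before touching the document. A minimal, library-independent sketch (it does not call Aspose.Words; ChangedSpan, Span, words and diffSpan are hypothetical names) trims the words shared at the start and end of the original and replacement provisions, leaving the single span that would need a tracked deletion and insertion:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ChangedSpan {

    /** Word indices: original[start, endOriginal) is replaced by replacement[start, endReplacement). */
    static final class Span {
        final int start, endOriginal, endReplacement;

        Span(int start, int endOriginal, int endReplacement) {
            this.start = start;
            this.endOriginal = endOriginal;
            this.endReplacement = endReplacement;
        }
    }

    // Split on whitespace so the comparison works on whole words
    static List<String> words(String s) {
        String t = s.trim();
        return t.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(t.split("\\s+"));
    }

    // Trim the longest common leading and trailing word runs; what remains is the changed span
    static Span diffSpan(List<String> a, List<String> b) {
        int start = 0;
        int max = Math.min(a.size(), b.size());
        while (start < max && a.get(start).equals(b.get(start))) {
            start++;
        }
        int endA = a.size();
        int endB = b.size();
        // Walk backwards from the end, never crossing the shared prefix
        while (endA > start && endB > start && a.get(endA - 1).equals(b.get(endB - 1))) {
            endA--;
            endB--;
        }
        return new Span(start, endA, endB);
    }

    public static void main(String[] args) {
        List<String> original = words("For a period of fifteen (15) days from the date hereof.");
        List<String> replacement = words("For a period of six (6) months from the date hereof.");
        Span s = diffSpan(original, replacement);
        System.out.println("delete: " + String.join(" ", original.subList(s.start, s.endOriginal)));
        System.out.println("insert: " + String.join(" ", replacement.subList(s.start, s.endReplacement)));
        // delete: fifteen (15) days
        // insert: six (6) months
    }
}
```

This only finds one contiguous span. With the full provisions from this thread, the first difference ("fifteen") and the appended "Notwithstanding" sentence are far apart, and the original ends with "Transaction." while the replacement ends with "such).", so the span would cover nearly the whole paragraph. Several separate changes need a proper word-level diff.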

Code

Here’s a modified version of your code that uses DocumentBuilder to insert or remove text:

public String processSteps(List<Step> steps, Metadata metadata) throws Exception {
    String s3Url = metadata.getS3BinaryUrl();
    Document document = s3Service.downloadAndPrepareDocument(s3Url);
    NodeCollection<Paragraph> paragraphs = document.getChildNodes(NodeType.PARAGRAPH, true);

    for (Step step : steps) {
        if (step.getLocations() != null && step.getOriginalProvision() != null && step.getMarkedUpProvision() != null) {
            logger.info("Processing step: {}", step.getProvisionName());
            String searchText = step.getOriginalProvision();
            if (searchText == null || searchText.isEmpty()) {
                logger.warn("Skipping step '{}' with empty search text.", step.getProvisionName());
                continue;
            }
            String replacementText = step.getMarkedUpProvision();
            String normalizedSearch = normalizeText(searchText);

            for (Paragraph para : paragraphs) {
                String paraText = para.getText().trim();
                String normalizedParaText = normalizeText(paraText);
                if (normalizedParaText.contains(normalizedSearch)) {
                    logger.info("Text to be modified: {}", searchText);
                    logger.info("Para before modification: {}", para.getText());

                    // Use DocumentBuilder to insert or remove text
                    DocumentBuilder builder = new DocumentBuilder(document);
                    builder.moveTo(para);
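                    // Note: for a Paragraph node, moveTo() places the cursor at the end of that paragraph
                    int index = paraText.indexOf(searchText);
                    if (index != -1) {
                        // 'index' is only used as a presence check: the replacement text is inserted
                        // at the cursor and the original text is not removed
                        builder.insertHtml(replacementText, true);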
                        logger.info("Para after modification: {}", para.getText());
                    }
                }
            }
        }
    }
    return document.toString(SaveFormat.TEXT);
}

Next Steps

  1. Test the modified code to ensure it marks up changes correctly without replacing entire blocks.
  2. Adjust the logic for inserting or removing text based on your specific requirements for markup.
  3. Review the Aspose.Words documentation for further customization options related to tracking changes.

Please implement these changes and let me know if you need any further assistance!

// Use DocumentBuilder to insert or remove text
DocumentBuilder builder = new DocumentBuilder(document);
builder.moveTo(para);

This creates a new builder on every iteration of the loop. Is that performant?

@ianjwilson Could you please attach your sample input, current and expected output documents here for our reference? We will check your documents and provide you more information.

Look at point 7 in the attached input document.
Sample input for the steps:

{
    "steps": [
        {
            "provisionName": "non_contact",
            "locations": [],
            "originalProvision": "For a period of fifteen (15) days from the date hereof. Recipient may contact any person know by Recipient to be current employees, directors, officers, equity holders, agents, advisors, suppliers, customers, distributors or licensors in connection with the Potential Transaction.",
            "markedUpProvision": "For a period of six (6) months from the date hereof. Recipient may contact any person know by Recipient to be current employees, directors, officers, equity holders, agents, advisors, suppliers, customers, distributors or licensors in connection with the Potential Transaction. Notwithstanding the foregoing, Recipient may: (i) make contact and communications in the contact made in the ordinary course of business without use of the Confidential Information; (ii) conduct commercial or market due diligence in connection with the Potential Transaction on a no-names basis; or (iii) contact with Recipient's Representatives (to the extent acting in their capacity as such).",
            "explanation_of_changes": "The following changes were made to comply with the markup instructions:\nINSTRUCTION 01: Added \"for a period of six (6) months from the date hereof\" to establish a time limitation of six months.\nINSTRUCTION 02: Added \"any person know by Recipient to be\" before the reference to employees, directors, officers, equity holders, agents, advisors, suppliers, customers, distributors or licensors to include a knowledge qualifier.\nINSTRUCTION 04: Added \"current\" at the beginning of the reference to employees, directors, officers, equity holders, agents, advisors, suppliers, customers, distributors or licensors.\nINSTRUCTION 03: Added \"in connection with the Potential Transaction\" to the provision.\nINSTRUCTION 07: Added the standard carve-outs to allow for ordinary course of business contact, no-names due diligence, and contact with Recipient's Representatives."
        }]
}

The input and output documents are attached below. The output document does not show tracked changes the way Word would; instead, it contains the paragraph as it would read after the change. Ideally, this change should be reflected as tracked-change markup, where only the words that were changed are marked up (red/green text, strikethroughs and so on).
input.docx (6.9 KB)

output.docx (7.1 KB)

-Ashwin

@ianjwilson You can use the Document.startTrackRevisions and Document.stopTrackRevisions methods to get the expected output:

String searchText = "For a period of fifteen (15) days from the date hereof. Recipient may contact any person know by Recipient to be current employees, directors, officers, equity holders, agents, advisors, suppliers, customers, distributors or licensors in connection with the Potential Transaction.";
String replacement = "For a period of six (6) months from the date hereof. Recipient may contact any person know by Recipient to be current employees, directors, officers, equity holders, agents, advisors, suppliers, customers, distributors or licensors in connection with the Potential Transaction. Notwithstanding the foregoing, Recipient may: (i) make contact and communications in the contact made in the ordinary course of business without use of the Confidential Information; (ii) conduct commercial or market due diligence in connection with the Potential Transaction on a no-names basis; or (iii) contact with Recipient's Representatives (to the extent acting in their capacity as such).";
        
Document doc = new Document("C:\\Temp\\in.docx");
        
doc.startTrackRevisions("AW");
doc.getRange().replace(searchText, replacement);
doc.stopTrackRevisions();
        
doc.save("C:\\Temp\\out.docx");

out.docx (10.8 KB)

Yes; however, if you look closely, the entire paragraph is now marked as changed. To the reviewer it looks as though the whole paragraph was changed, leaving it to them to work out that most of it was already there and only part of it actually changed. This adds cognitive load for users who review hundreds or thousands of documents every week: they would have to read the whole document anyway, which defeats the purpose of the software.
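The markup being asked for amounts to a word-level diff: words present in both versions stay as they are, and only deleted or inserted words are marked. As a rough, library-independent illustration, here is a plain longest-common-subsequence diff over whitespace-separated words. It is not the algorithm Aspose.Words uses internally, and WordDiff and its [-deleted-] / {+inserted+} notation are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WordDiff {

    static List<String> words(String s) {
        String t = s.trim();
        return t.isEmpty() ? Collections.<String>emptyList() : Arrays.asList(t.split("\\s+"));
    }

    /** Marks deleted word runs as [-...-] and inserted runs as {+...+}; shared words stay plain. */
    static String diff(String original, String revised) {
        List<String> a = words(original);
        List<String> b = words(revised);
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table, recording one operation per word: '=' keep, '-' delete, '+' insert
        List<Character> ops = new ArrayList<>();
        List<String> toks = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && a.get(i).equals(b.get(j))) {
                ops.add('=');
                toks.add(a.get(i));
                i++;
                j++;
            } else if (j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
                ops.add('-');
                toks.add(a.get(i));
                i++;
            } else {
                ops.add('+');
                toks.add(b.get(j));
                j++;
            }
        }
        // Merge consecutive operations of the same kind into one marked-up run
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < ops.size(); ) {
            char op = ops.get(k);
            int e = k;
            while (e < ops.size() && ops.get(e) == op) e++;
            String run = String.join(" ", toks.subList(k, e));
            if (out.length() > 0) out.append(' ');
            out.append(op == '-' ? "[-" + run + "-]" : op == '+' ? "{+" + run + "+}" : run);
            k = e;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(diff(
                "For a period of fifteen (15) days from the date hereof.",
                "For a period of six (6) months from the date hereof."));
        // For a period of [-fifteen (15) days-] {+six (6) months+} from the date hereof.
    }
}
```

Unlike a single find-and-replace, a diff like this can report several separate changes in one paragraph (a substitution near the start and an appended sentence at the end), which is the behaviour the Document.compare approach gives as real Word revisions.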

We went with the compare approach, which I had initially overlooked, thinking it simply compared the documents and returned the differences as a class object rather than letting you save the compared document.

So something like this works for us:

package com.local.abc;

import com.aspose.words.*;
import java.util.Date;

public class Compare {

    public static void main(String[] args) throws Exception {
        // Load Aspose.Words license
        License license = new License();
        license.setLicense("");

        // Load the original and edited documents
        Document docOriginal = new Document("input.docx");
        Document docEdited = new Document("output.docx");

        // Compare the documents
        docOriginal.compare(docEdited, "abc", new Date());

        // Save the result; docOriginal now contains tracked changes
        docOriginal.save("compared_with_revisions.docx");
    }
}

I appreciate your quick help. Thanks a lot.
Ashwin
compared_with_revisions.docx (10.9 KB)

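@ianjwilson Yes, to get the expected output you can use the document comparison functionality. Clone the original document, apply the replacement to the clone, then compare the original with the clone; Document.compare adds revisions to the original document for the differences it finds: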

String searchText = "For a period of fifteen (15) days from the date hereof. Recipient may contact any person know by Recipient to be current employees, directors, officers, equity holders, agents, advisors, suppliers, customers, distributors or licensors in connection with the Potential Transaction.";
String replacement = "For a period of six (6) months from the date hereof. Recipient may contact any person know by Recipient to be current employees, directors, officers, equity holders, agents, advisors, suppliers, customers, distributors or licensors in connection with the Potential Transaction. Notwithstanding the foregoing, Recipient may: (i) make contact and communications in the contact made in the ordinary course of business without use of the Confidential Information; (ii) conduct commercial or market due diligence in connection with the Potential Transaction on a no-names basis; or (iii) contact with Recipient's Representatives (to the extent acting in their capacity as such).";
    
Document original = new Document("C:\\Temp\\in.docx");
Document doc = (Document)original.deepClone(true);
doc.getRange().replace(searchText, replacement);
original.compare(doc, "AW", new Date());
original.save("C:\\Temp\\out.docx");