Problem with range replace with document that has revisions

princepushparaj · January 9, 2012, 5:08am

Hi,

I’m trying to find a word and highlight it, but the document has tracked changes so the replace does not work properly. For example I am trying to find “the” and in the document there is the text “thhe”. The extra “h” is the tracked change and this means that I cannot find and highlight this text.

Also I cannot accept these revisions so I can’t use doc.AcceptAllRevisions.

Thanks,

This message was posted using Email2Forum by aske012.

adam.skelton · January 9, 2012, 5:27am

Hi there,

Thanks for your inquiry.

I understand what you mean, but I’m afraid this is a little hard to achieve since you do not want to accept these revisions. That means the revisions are still “a part” of the document. You can, however, try the following workaround code to run a range replace in this type of situation.

// Create a dummy document that we can accept revisions in and execute the range replace on.
Document cloneDoc = doc.Clone();
Dictionary<Node, Node> nodeLookup = new Dictionary<Node, Node>();
NodeCollection allDocNodes = doc.GetChildNodes(NodeType.Any, true);
NodeCollection allCloneNodes = cloneDoc.GetChildNodes(NodeType.Any, true);
// We create a look up between the two documents so we can find the same nodes in the original document.
for (int i = 0; i < allDocNodes.Count; i++)
    nodeLookup.Add(allCloneNodes[i], allDocNodes[i]);
cloneDoc.AcceptAllRevisions();
// Run the replace on the clone document but use the lookup to apply the highlighting to the runs in original document with
// the revisions still there.
Regex regex = new Regex("the", RegexOptions.IgnoreCase);
cloneDoc.Range.Replace(regex, new ReplaceEvaluatorFindAndHighlight(nodeLookup), true);

public class ReplaceEvaluatorFindAndHighlight : IReplacingCallback
{
    public ReplaceEvaluatorFindAndHighlight(Dictionary<Node, Node> lookup)
    {
        mNodeLookup = lookup;
    }

    public ReplaceAction Replacing(ReplacingArgs args)
    {
        // .... all code up to here as usual.
        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }
        // Look up the real start and end node to be highlighted from the clone document node.
        Run startRun = (Run)mNodeLookup[(Node)runs[0]];
        Run endRun = (Run)mNodeLookup[(Node)runs[0]];
        currentNode = startRun;
        // Now highlight all runs in the sequence.
        while (currentNode != endRun)
        {
            if (currentNode.NodeType == NodeType.Run)
                ((Run)currentNode).Font.HighlightColor = Color.Yellow;
            currentNode = currentNode.NextSibling;
        }
        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    Dictionary<Node, Node> mNodeLookup;
}
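
The index-based pairing used to build the lookup can be illustrated without Aspose.Words. This is only a minimal sketch under stated assumptions: `TreeNode` and `CloneLookup` are hypothetical stand-ins, not library types. It shows that pairing by position is only valid while both trees have the same shape, so the lookup has to be built before the clone is changed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal tree node; it stands in for an Aspose.Words Node.
public sealed class TreeNode
{
    public string Name;
    public List<TreeNode> Children = new List<TreeNode>();
    public TreeNode(string name) { Name = name; }

    // Deep copy that preserves child order.
    public TreeNode DeepClone()
    {
        TreeNode copy = new TreeNode(Name);
        foreach (TreeNode child in Children)
            copy.Children.Add(child.DeepClone());
        return copy;
    }

    // Pre-order (document order) list of all descendants.
    public List<TreeNode> Descendants()
    {
        List<TreeNode> result = new List<TreeNode>();
        foreach (TreeNode child in Children)
        {
            result.Add(child);
            result.AddRange(child.Descendants());
        }
        return result;
    }
}

public static class CloneLookup
{
    // Pairs each clone node with its original by position in document order.
    // Must be called before the clone is modified in any way.
    public static Dictionary<TreeNode, TreeNode> Build(TreeNode original, TreeNode clone)
    {
        List<TreeNode> originals = original.Descendants();
        List<TreeNode> clones = clone.Descendants();
        if (originals.Count != clones.Count)
            throw new InvalidOperationException("Trees differ; index pairing is not valid.");

        var lookup = new Dictionary<TreeNode, TreeNode>();
        for (int i = 0; i < clones.Count; i++)
            lookup.Add(clones[i], originals[i]);
        return lookup;
    }
}
```

Any node added to the clone after `Build` has run has no entry in the lookup, which is worth keeping in mind when the callback splits runs.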

Please let me know how this goes for you.

Thanks,

princepushparaj · January 24, 2012, 10:58pm

Thanks for your suggestion.

But I could not get this to work. It throws a “key not present” error in the highlighted part.

Run startRun = (Run)mNodeLookup[(Node)runs[0]];
Run endRun = (Run)mNodeLookup[(Node)runs[0]];
currentNode = startRun;
// Now highlight all runs in the sequence.
while (currentNode != endRun)
{
    if (currentNode.NodeType == NodeType.Run)
        ((Run)currentNode).Font.HighlightColor = Color.Yellow;
    currentNode = currentNode.NextSibling;
}

Please could you help me out.

Thanks

Prince A

adam.skelton · January 26, 2012, 7:14am

Hi there,

Thanks for your inquiry and sorry for the delay.

I’m afraid I wasn’t able to reproduce that error on my side. Please make sure you added the for loop which adds the nodes from both documents to the lookup.

I also found a few bugs in the code I posted above. I have highlighted the fixes in the new code below.

for (int i = 0; i < allDocNodes.Count; i++)
    nodeLookup.Add(allCloneNodes[i], allDocNodes[i]);
Node[] runs = cloneDoc.GetChildNodes(NodeType.Run, true).ToArray();
foreach (Run run in runs)
{
    if (run.IsInsertRevision || run.IsDeleteRevision)
        run.Remove();
}
Regex regex = new Regex("SomeText", RegexOptions.IgnoreCase);
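// The following lines belong inside ReplaceEvaluatorFindAndHighlight.Replacing,
// where runs is the ArrayList of matched runs (not the Node[] array above).
Run startRun = (Run)mNodeLookup[(Node)runs[0]];
Run endRun = (Run)mNodeLookup[(Node)runs[runs.Count - 1]];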
currentNode = startRun;
// Now highlight all runs in the sequence.
while (currentNode != endRun)
{
    if (currentNode.NodeType == NodeType.Run)
        ((Run)currentNode).Font.HighlightColor = Color.Yellow;
    currentNode = currentNode.NextSibling;
}
endRun.Font.HighlightColor = Color.Yellow;

If you are still running into this issue then can you please attach your input document and the string you are replacing here for further testing?

Thanks,

princepushparaj · January 27, 2012, 6:22am

Hi. Thanks for your reply.

The problem still exists. I am not able to find the pattern “(\bthe\b|\bsignals\b|conclusions)” in the attached document.

Please help me on this.

Thanks,

Prince

princepushparaj · January 31, 2012, 12:01am

Thanks for your support.

I have found a way around this issue. Please have a look at the highlighted part and let me know if I am wrong.

private class ReplaceEvaluatorFindAndHighlight_For_Revisions : IReplacingCallback
{
    public ReplaceEvaluatorFindAndHighlight_For_Revisions(Dictionary<Node, Node> lookup, Document OrginalDoc)
    {
        mNodeLookup = lookup;
        OrgDoc = OrginalDoc;
    }

    Document OrgDoc = null;
    Dictionary<Node, Node> mNodeLookup;
            
    /// <summary>
    /// This method is called by the Aspose.Words find and replace engine for each match.
    /// This method highlights the match string, even if it spans multiple runs.
    /// </summary>
    Node MatchedNode = null; int OffsetAdjustment = 0;
    Node tmpMatchNode = null;

    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = null;
        if (MatchedNode != e.MatchNode || tmpMatchNode.NodeType != NodeType.Run)
        {
            MatchedNode = e.MatchNode;
            OffsetAdjustment = 0;
            tmpMatchNode = mNodeLookup[e.MatchNode];
        }
        int MatchedOffset = 0;
        MatchedOffset = e.MatchOffset - OffsetAdjustment;
        NodeCollection OrgDocNodes = OrgDoc.GetChildNodes(NodeType.Any, true, true);
        int index = OrgDocNodes.IndexOf(tmpMatchNode);
        currentNode = OrgDocNodes[index];// e.MatchNode;
        if (MatchedOffset > 0)
            currentNode = SplitRun((Run)currentNode, MatchedOffset, ref OffsetAdjustment);

        ArrayList runs = new ArrayList();
        int remainingLength = e.Match.Value.Length;
        while (
        (remainingLength > 0) &&
        (currentNode != null) &&
        (currentNode.GetText().Length <= remainingLength))
        {
            if (!((Run)currentNode).IsDeleteRevision)
            {
                runs.Add(currentNode);
                remainingLength = remainingLength - currentNode.GetText().Length;
            }
            do
            {
                currentNode = currentNode.NextSibling; OffsetAdjustment = 0;
                var querynode = from nodKys in mNodeLookup where nodKys.Value == currentNode select nodKys.Key;
                if (querynode.LongCount() > 0)
                    MatchedNode = querynode.First();
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }
        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength, ref OffsetAdjustment);
            runs.Add(currentNode);
        }
        Run startRun = null; Run endRun = null;
        // Look up the real start and end node to be highlighted from the clone document node.
        if (mNodeLookup.ContainsKey((Node)runs[0]))
        {
            startRun = (Run)mNodeLookup[(Node)runs[0]];
            endRun = (Run)mNodeLookup[(Node)runs[runs.Count - 1]];
        }
        else
        {
            startRun = (Run)runs[0]; endRun = (Run)runs[runs.Count - 1];
        }
        currentNode = startRun;
        // Now highlight all runs in the sequence.
        while (currentNode != endRun.NextSibling)
        {
            if (currentNode.NodeType == NodeType.Run)
            {
                if (!((Run)currentNode).IsDeleteRevision)
                {
                    ((Run)currentNode).Font.HighlightColor = Color.Yellow;
                }
            }
            if (currentNode.NextSibling != null)
            {
                currentNode = currentNode.NextSibling;
            }
            else
                break;
        }
        tmpMatchNode = currentNode;
        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }
}

/// <summary>
/// Splits text of the specified run into two runs.
/// Inserts the new run just after the specified run.
/// </summary>
public static Run SplitRun(Run run, int position, ref int adjustedIndex)
{
    Run afterRun = (Run)run.Clone(true);
    afterRun.Text = run.Text.Substring(position);
    run.Text = run.Text.Substring(0, position);
    adjustedIndex += run.Text.Length;
    run.ParentNode.InsertAfter(afterRun, run);
    return afterRun;
}

adam.skelton · January 31, 2012, 5:43pm

Hi there,

Thanks for this additional information.

It appears your code might be doing the right thing, however since I was unable to reproduce any issue with my test document I cannot be sure what the problem is. If you are still having any problems could you please attach a template which demonstrates the issue you are having here?

Thanks,

iEditor · June 5, 2013, 6:02am

I’d also like to use find and replace on documents that have revisions. I’ve tried everything above. However, I’m getting the same error about the key not being present in the dictionary. Could it be to do with using the evaluation version of Aspose while I test feasibility? Attached is the sample document and here’s the code.

Thanks,

Daniel

public void CheckRevisions()
{
    // Create a dummy document that we can accept revisions in and execute the range replace on.
    Document cloneDoc = doc.Clone();
    Dictionary<Node, Node> nodeLookup = new Dictionary<Node, Node>();
    NodeCollection allDocNodes = doc.GetChildNodes(NodeType.Any, true);
    NodeCollection allCloneNodes = cloneDoc.GetChildNodes(NodeType.Any, true);

    // We create a look up between the two documents so we can find the same nodes in the original document.
    for (int i = 0; i < allDocNodes.Count; i++)
        nodeLookup.Add(allCloneNodes[i], allDocNodes[i]);

    //Use this instead of accept all revs
    Node[] runs = cloneDoc.GetChildNodes(NodeType.Run, true).ToArray();
    foreach (Run run in runs)
    {
        if (run.IsInsertRevision || run.IsDeleteRevision)
            run.Remove();
    }

    // Run the replace on the clone document but use the lookup to apply the highlighting to the runs in original document with
    // the revisions still there.
    Regex regex = new Regex("the", RegexOptions.IgnoreCase);
    cloneDoc.Range.Replace(regex, new ReplaceEvaluatorFindAndHighlight(nodeLookup), true);
}

public class ReplaceEvaluatorFindAndHighlight : IReplacingCallback
{

    public ReplaceEvaluatorFindAndHighlight(Dictionary<Node, Node> lookup)
    {
        mNodeLookup = lookup;
    }

    public ReplaceAction Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and may be the only) run can contain text before the match, 
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        // This array is used to store all nodes of the match for further highlighting.
        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
        (remainingLength > 0) &&
        (currentNode != null) &&
        (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node. 
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }

        // Split the last run that contains the match if there is any text left.

        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }

        // Look up the real start and end node to be highlighted from the clone document node.
        Run startRun = (Run)mNodeLookup[(Node)runs[0]]; //I GET THE ERROR HERE
        Run endRun = (Run)mNodeLookup[(Node)runs[runs.Count - 1]];

        currentNode = startRun;

        // Now highlight all runs in the sequence.
        while (currentNode != endRun)
        {
            if (currentNode.NodeType == NodeType.Run)
                ((Run)currentNode).Font.Bold = true;

            currentNode = currentNode.NextSibling;
        }

        endRun.Font.Bold = true;

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    Dictionary<Node, Node> mNodeLookup;

    /// <summary>
    /// Splits text of the specified run into two runs.
    /// Inserts the new run just after the specified run.
    /// </summary>
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}

tahir.manzoor · June 10, 2013, 6:12am

Hi Daniel,

Please accept my apologies for the late response.

Thanks for your inquiry. As per your shared code, you want to find specific text and make it bold. Please use the code shared at the following documentation link.

https://docs.aspose.com/words/net/find-and-replace/

Please replace only the following line of code with run.Font.Bold = true, as highlighted below. I hope this helps. Please let us know if you have any more queries.

run.Font.HighlightColor = Color.Yellow;

run.Font.Bold = true;

iEditor · June 12, 2013, 8:31am

Hi Tahir,

I think you’ve misunderstood my question. The issue has nothing to do with highlighting or bold (that is a leftover from a previous query which you answered really well). The issue I’m encountering here is in how to work with a document that has revisions. I’m getting an error in the lookup element.

Is there any chance you could check over my post again? It’s the one in this thread from 06-05-2013, 12:02 PM. The line where I get an error is marked ‘I GET THE ERROR HERE’. It seems like I’m encountering a similar problem to the user ‘Chennai’ in the posts above. Please do let me know if you have any ideas.

Best wishes,

Daniel

tahir.manzoor · June 12, 2013, 10:37am

Hi Daniel,

Thanks for sharing the details. Please note that all text of the document is stored in runs of text. In your document, each line of text is in a single Run node. The exception you are facing is caused by the SplitRun method, which splits that single Run node.

In the Replacing method, SplitRun is called, which splits the single Run node into multiple nodes. For example, the input document has a Run node with the text ‘This is the age of the train.’. After the Replacing method has executed, there are four Run nodes in the Aspose.Words DOM. Please see the attached DOM image for details. The node runs[0] does not exist in mNodeLookup because the split created it after the lookup was built. This is the reason for the exception.

// Look up the real start and end node to be highlighted from the clone document node.
Run startRun = (Run)mNodeLookup[(Node)runs[0]]; //I GET THE ERROR HERE
Run endRun = (Run)mNodeLookup[(Node)runs[runs.Count - 1]];
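
The failure described above can be reproduced without Aspose.Words. The following is only a minimal sketch, assuming a hypothetical `FakeRun` class in place of `Run` and a `Split` helper that mirrors the thread's SplitRun. The dictionary is keyed by object reference, so the new object produced by a split has no entry in a lookup that was built earlier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an Aspose.Words Run; only the text matters here.
public sealed class FakeRun
{
    public string Text;
    public FakeRun(string text) { Text = text; }
}

public static class SplitLookupDemo
{
    // Mirrors the thread's SplitRun: the original keeps the head,
    // and a NEW object receives the tail.
    public static FakeRun Split(FakeRun run, int position)
    {
        FakeRun after = new FakeRun(run.Text.Substring(position));
        run.Text = run.Text.Substring(0, position);
        return after;
    }

    // Returns true if the tail produced by a split is present in a lookup
    // that was built before the split happened.
    public static bool TailIsInLookup()
    {
        FakeRun cloneRun = new FakeRun("This is the age of the train.");
        FakeRun originalRun = new FakeRun("This is the age of the train.");

        var lookup = new Dictionary<FakeRun, FakeRun>();
        lookup.Add(cloneRun, originalRun);

        // Split at the first "the" (offset 8), as the callback would.
        FakeRun tail = Split(cloneRun, 8);

        // TryGetValue avoids the KeyNotFoundException that lookup[tail] would throw.
        FakeRun ignored;
        return lookup.TryGetValue(tail, out ignored);
    }
}
```

Guarding the lookup with `TryGetValue` or `ContainsKey` only avoids the exception. It does not by itself find the matching run in the original document.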

Secondly, your document does not contain any revisions. I have added insert and delete revisions to the attached document. I used the following code snippet to find the text ‘the’ and make it bold, and have not found any issue with ‘Find and Replace’.

Could you please attach your input and expected output Word document here for testing? I will investigate the issue on my side and provide you more information.

Please also share exactly what you want to achieve by using Aspose.Words. We will then provide you with more information about your query along with code.

Document doc = new Document(MyDir + "Revision.docx");
Regex regex = new Regex("the", RegexOptions.IgnoreCase);
doc.Range.Replace(regex, new ReplaceEvaluatorFindAndHighlight(), true);
doc.Save(MyDir + "Out.docx");

private class ReplaceEvaluatorFindAndHighlight : IReplacingCallback
{
    /// <summary>
    /// This method is called by the Aspose.Words find and replace engine for each match.
    /// This method highlights the match string, even if it spans multiple runs.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;
        // The first (and may be the only) run can contain text before the match, 
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);
        // This array is used to store all nodes of the match for further highlighting.
        ArrayList runs = new ArrayList();
        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
        (remainingLength > 0) &&
        (currentNode != null) &&
        (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;
            // Select the next Run node. 
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }
        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }
        // MessageBox.Show(((Run)currentNode).Text);
        // Now highlight all runs in the sequence.
        foreach (Run run in runs)
            run.Font.Bold = true;
        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }
}

iEditor · June 18, 2013, 6:58am

Hi Tahir,

Thanks for the explanation. I think we’re getting near to an answer.

Attached is a revised document to make things clearer. With the attached, I’d like to have a find function that puts the word ‘the’ in bold every time that it appears. The word I’m looking for will vary in my final function which is designed to check a variety of documents.

As you can see, in the attached document the word ‘the’ appears six times. However, a typical find function will only find five of these because of the revisions. If we accept all changes in the document first then the find will get all six. I’d like to get all six without accepting changes.

Is this possible? Is there a way of making that work using the lookup? How would I need to modify my code above?

Best wishes,

Daniel

tahir.manzoor · June 19, 2013, 4:45am

Hi Daniel,

Thanks for your inquiry. The find and replace code shared in my last post does not match the text ‘tahe’ because the following Regex matches only text that contains ‘the’.

Regex regex = new Regex("the", RegexOptions.IgnoreCase);
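
To make the mismatch concrete, here is a minimal sketch that does not use Aspose.Words. `Segment` and `RevisionText` are hypothetical names introduced only for illustration. Each segment stands in for a run, and the flag stands in for a delete revision. With the deleted ‘a’ included the text reads ‘tahe’ and the pattern finds nothing; with delete revisions skipped it reads ‘the’ and matches.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical stand-in for a run: its text plus a delete-revision flag.
public sealed class Segment
{
    public string Text;
    public bool IsDeleted;
    public Segment(string text, bool isDeleted) { Text = text; IsDeleted = isDeleted; }
}

public static class RevisionText
{
    // Concatenates segment text, optionally skipping delete revisions.
    public static string Join(IEnumerable<Segment> segments, bool skipDeleted)
    {
        StringBuilder sb = new StringBuilder();
        foreach (Segment s in segments)
        {
            if (!(skipDeleted && s.IsDeleted))
                sb.Append(s.Text);
        }
        return sb.ToString();
    }

    // Counts case-insensitive matches of "the" in the joined text.
    public static int CountMatches(IEnumerable<Segment> segments, bool skipDeleted)
    {
        Regex regex = new Regex("the", RegexOptions.IgnoreCase);
        return regex.Matches(Join(segments, skipDeleted)).Count;
    }
}
```

For the segments "t", "a" (deleted) and "he train", `CountMatches` finds no match when deleted text is kept and one match when it is skipped.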

In this case, I suggest you remove the Run nodes that have delete revisions from the document and, after ‘Find and Replace’, insert each run’s text back into the document as shown in the following code snippet. I hope this helps.

Document doc = new Document(MyDir + "Find+ofs.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int i = 0;
//Remove the Runs with IsDeleteRevision = true and add in Hashtable
Hashtable runs = new Hashtable();
foreach (Run run in doc.GetChildNodes(NodeType.Run, true).ToArray())
{
    if (run.IsDeleteRevision)
    {
        builder.MoveTo(run);
        builder.StartBookmark("bm_" + i);
        builder.EndBookmark("bm_" + i);
        runs.Add("bm_" + i, run);
        i++;
        run.Remove();
    }
}

Regex regex = new Regex("the", RegexOptions.IgnoreCase);
doc.Range.Replace(regex, new ReplaceEvaluatorFindAndHighlight(), true);
i = 0;
foreach (DictionaryEntry entry in runs)
{
    builder.MoveToBookmark(entry.Key.ToString());
    builder.Font.Color = Color.Blue;

    builder.Font.StrikeThrough = true;
    builder.Write(((Run)entry.Value).Text);
}

doc.Save(MyDir + "Out.docx");

iEditor · June 19, 2013, 5:32am

Hi Tahir,

That’s an interesting approach, thanks for the suggestion. However, adding text back in that is blue and strikethrough is not the same as having a revision. For example, you can’t get rid of it by accepting all revisions.

So two questions:
- Is there a way to change your approach so that instead of adding back the text with blue strikethrough, we add it back as a node with IsDeleteRevision set to true? If that’s possible then that would be a perfect solution.
- If that isn’t possible, is there a way to do this with the node lookup as originally described above?

Best wishes,

Daniel

tahir.manzoor · June 19, 2013, 11:51am

Hi Daniel,

Thanks for your inquiry.

iEditor:

- Is there a way to change your approach so that instead of adding back the text with blue strikethrough, we add it back as a node with IsDeleteRevision set to true? If that’s possible then that would be a perfect solution.

Unfortunately, you cannot insert/delete revisions using Aspose.Words. We have already logged this feature request as WORDSNET-754 in our issue tracking system. You will be notified via this forum thread once this feature is available. We apologize for the inconvenience.

iEditor:

- If that isn’t possible, is there a way to do this with the node lookup as originally described above?

In this case, the problem that I explained here still exists.