C# Word Document Find Text using Regex Pattern

Could you explain how Find and Replace can be used to find specific text and return it? Getting the exact location of every occurrence would be even more useful.
According to the API documentation, there is only one function, with four overloads: Document.getRange().replace(). It returns only an integer, the number of replacements made.

Please elaborate on how a find can be performed without a replacement. For instance, I have a regular expression and want to find all matching values, then replace every occurrence of each unique value with its own new value.

For example, looking for names in the string “John and Mike were there. John and Michelle have bicycles. Mike, Roger, and Stephany play basketball.”
A regular expression picks out the names. Now I have a list of names and their replacements:
John -> Oren
Mike -> Jim
Michelle -> Alice
Stephany -> Carol
Roger -> Jeff

Now I can run a replace to get “Oren and Jim were there. Oren and Alice have bicycles. Jim, Jeff, and Carol play basketball.”

With the current functionality, to achieve the same result I have to get the whole text with document.getText() and then implement the logic myself.
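
For reference, the text-only version of this mapping can be sketched with the standard .NET Regex.Replace overload that takes a MatchEvaluator. This is a minimal sketch, not part of the original post: the class name NameMapper and the pattern (any capitalized word) are assumptions made for illustration, and because it works on a plain string it loses all document formatting and positions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: maps each matched name to its replacement in plain text.
public static class NameMapper
{
    // Mapping taken from the example in the question.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>
        {
            { "John", "Oren" },
            { "Mike", "Jim" },
            { "Michelle", "Alice" },
            { "Stephany", "Carol" },
            { "Roger", "Jeff" },
        };

    // Assumed pattern: any capitalized word. A real name pattern would likely differ.
    private static readonly Regex NamePattern = new Regex(@"\b[A-Z][a-z]+\b");

    public static string Apply(string text)
    {
        // The evaluator runs once per match; unmapped words are returned unchanged.
        return NamePattern.Replace(text, match =>
            Replacements.TryGetValue(match.Value, out string replacement)
                ? replacement
                : match.Value);
    }
}
```

Calling NameMapper.Apply on the sample sentence above yields the "Oren and Jim were there. ..." text from the question.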

@lion.brotzky,

For example, the following C# code uses the Aspose.Words for .NET API with a Regex pattern to find all numbers in a Word DOCX document, print each matched value to the console, and wrap each match in a bookmark so that its location can be found later:

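using System;
using System.Collections;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

Document doc = new Document("C:\\Temp\\Word containing Digits.docx");

// Match every run of one or more digits.
Regex regExpression = new Regex(@"\d+");
FindReplaceOptions findReplaceOptions = new FindReplaceOptions();
findReplaceOptions.Direction = FindReplaceDirection.Backward;

// The callback is invoked for each match; it records and bookmarks the match without replacing it.
MyReplaceEvaluator replacer = new MyReplaceEvaluator();
findReplaceOptions.ReplacingCallback = replacer;

// The replacement string is unused here because the callback returns ReplaceAction.Skip.
doc.Range.Replace(regExpression, "", findReplaceOptions);

// Print the matched values in the order they were found.
ArrayList list = replacer.ListOfMatches;
foreach (string item in list)
    Console.WriteLine(item);

doc.Save("C:\\temp\\awnet-21.8.docx");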

internal class MyReplaceEvaluator : IReplacingCallback
{
    public ArrayList ListOfMatches = new ArrayList();
    private int index = 0;

    /// <summary>
    /// This is called during a replace operation each time a match is found.
    /// This method records the matched text, wraps the match in a bookmark, and skips the replacement.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and may be the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        // This array is used to store all nodes of the match
        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            (remainingLength > 0) &&
            (currentNode != null) &&
            (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }


        // Record the matched text so the caller can list all matches afterwards.
        ListOfMatches.Add(e.Match.Value);

        string bookmarkName = "bm_" + index + "_" + e.Match.Value;
        index++;

        DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);
        builder.MoveTo((Run)runs[0]);

        builder.StartBookmark(bookmarkName);
        BookmarkEnd bookmarkEnd = builder.EndBookmark(bookmarkName);

        Run lastRun = (Run)runs[runs.Count - 1];
        lastRun.ParentNode.InsertAfter(bookmarkEnd, lastRun);

        return ReplaceAction.Skip;
    }

    /// <summary>
    /// Splits text of the specified run into two runs.
    /// Inserts the new run just after the specified run.
    /// </summary>
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}
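
Once the matches are bookmarked, their locations can be read back from the saved document. The following is a rough sketch rather than a definitive implementation: it assumes the "bm_" name prefix used by the callback above and reports each bookmark's paragraph index as one possible notion of "location".

```csharp
using System;
using Aspose.Words;

// Reopen the document saved by the code above.
Document doc = new Document("C:\\temp\\awnet-21.8.docx");

// All paragraphs in document order, used to turn a node into a paragraph index.
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

foreach (Bookmark bookmark in doc.Range.Bookmarks)
{
    // Only consider bookmarks created by the callback ("bm_<index>_<value>").
    if (!bookmark.Name.StartsWith("bm_", StringComparison.Ordinal))
        continue;

    // The paragraph that contains the start of the match, if any.
    Paragraph paragraph = bookmark.BookmarkStart.GetAncestor(NodeType.Paragraph) as Paragraph;
    int paragraphIndex = paragraph != null ? paragraphs.IndexOf(paragraph) : -1;

    Console.WriteLine("{0}: \"{1}\" in paragraph {2}", bookmark.Name, bookmark.Text, paragraphIndex);
}
```

The paragraph index is only one choice; the BookmarkStart node itself can also be used as a position for moving a DocumentBuilder or walking to neighbouring nodes.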

Please also check the following section of documentation: Find and Replace
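
For the name-mapping case from the question, the same callback mechanism can also set a different replacement per match. The sketch below is an illustration under assumptions: the class name NameMappingCallback and the file names input.docx and output.docx are hypothetical, and the pattern is simply built from the dictionary keys.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

// Mapping taken from the example in the question.
var map = new Dictionary<string, string>
{
    { "John", "Oren" }, { "Mike", "Jim" }, { "Michelle", "Alice" },
    { "Stephany", "Carol" }, { "Roger", "Jeff" },
};

// Build a whole-word alternation such as \b(John|Mike|...)\b from the keys.
Regex names = new Regex(@"\b(" + string.Join("|", map.Keys.Select(k => Regex.Escape(k))) + @")\b");

Document doc = new Document("input.docx");   // hypothetical input path
FindReplaceOptions options = new FindReplaceOptions();
options.ReplacingCallback = new NameMappingCallback(map);

// The empty replacement string is overridden per match inside the callback.
doc.Range.Replace(names, "", options);
doc.Save("output.docx");                     // hypothetical output path

// Hypothetical callback: swaps each matched name for its mapped value.
internal class NameMappingCallback : IReplacingCallback
{
    private readonly IDictionary<string, string> map;

    public NameMappingCallback(IDictionary<string, string> map)
    {
        this.map = map;
    }

    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        string replacement;
        if (!map.TryGetValue(e.Match.Value, out replacement))
            return ReplaceAction.Skip;   // leave unmapped matches untouched

        e.Replacement = replacement;     // replacement used for this match only
        return ReplaceAction.Replace;
    }
}
```

Because the replacement happens inside the document rather than on extracted text, the surrounding content stays in place, which avoids the document.getText() workaround described in the question.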

Thank you. I had thought about implementing the above in the replacing callback, but was not sure.

@lion.brotzky,

In case you have further inquiries or may need any help in future, please let us know by posting a new thread in Aspose.Words’ forum.