Regex Pattern to Find Keywords in MS Word Document using C# .NET | Range Replace IReplacingCallback

I am evaluating Aspose products for keyword-searching capabilities. The requirement is to search for the given keyword(s) in any type of MS Office document. I saw that Aspose.Words has a Replace function; is there a way to just find without replacing? Thanks!

@g.zhu,

You can use the same Range.Replace method(s) to locate keywords in a Word document without replacing them. Please see this sample input Word document and try running the following code:

C# code

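using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

Document doc = new Document("E:\\Temp\\input.docx");

// The callback collects every match; it never changes the document.
ReplaceHandler handler = new ReplaceHandler();
FindReplaceOptions opts = new FindReplaceOptions();
// Backward direction: matches inside each paragraph are visited from last to first.
opts.Direction = FindReplaceDirection.Backward;
opts.ReplacingCallback = handler;

// Matches "[KW:", then any run of characters other than "]", then "]".
Regex searchPattern = new Regex(@"\[KW:([^\]]*)\]");
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
    para.Range.Replace(searchPattern, "", opts);

// Print the collected matches as a numbered list.
int i = 1;
foreach (string str in handler.list)
    Console.WriteLine(i++ + ". " + str);

// Declare this as a nested class inside the class that contains the code above.
private class ReplaceHandler : IReplacingCallback
{
    public List<string> list = new List<string>();
    public ReplaceAction Replacing(ReplacingArgs e)
    {
        if (e.MatchNode.ParentNode.NodeType == NodeType.Paragraph)
        {
            // Paragraph para = (Paragraph) e.MatchNode.ParentNode;
            // Groups[0] is the whole match, e.g. "[KW:example]".
            string value = e.Match.Groups[0].Value.Trim();
            list.Add(value);
        }

        // Just find; do not replace anything.
        return ReplaceAction.Skip;
    }
}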

You just need to implement the IReplacingCallback interface and return ReplaceAction.Skip from its Replacing method; this instructs Aspose.Words for .NET to skip the replacement.

The code uses a Regex to find keywords of the following format in the Word document:

  • [KW:...anything here...]
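To see what the pattern matches on its own, here is a minimal sketch that runs the same regular expression against a plain string using only System.Text.RegularExpressions, with no Aspose.Words involved. The class name KeywordPatternDemo, the helper ExtractKeywords, and the sample text are made up for illustration. Unlike the handler above, which stores the whole match, this sketch reads capture group 1 to get just the text between "[KW:" and "]".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeywordPatternDemo
{
    // Same pattern as in the code above: "[KW:", any characters except "]", then "]".
    private static readonly Regex Pattern = new Regex(@"\[KW:([^\]]*)\]");

    // Hypothetical helper: returns the trimmed text inside each [KW:...] marker.
    public static List<string> ExtractKeywords(string text)
    {
        List<string> keywords = new List<string>();
        foreach (Match m in Pattern.Matches(text))
        {
            // Groups[0] is the whole marker; Groups[1] is only the part after "KW:".
            keywords.Add(m.Groups[1].Value.Trim());
        }
        return keywords;
    }

    public static void Main()
    {
        // Made-up sample text standing in for a paragraph's contents.
        string sample = "Intro [KW:alpha] middle [KW: beta gamma ] end [KW:]";
        foreach (string keyword in ExtractKeywords(sample))
            Console.WriteLine("'" + keyword + "'");
        // Prints 'alpha', 'beta gamma' and '' on separate lines.
    }
}
```

Note that the pattern also matches an empty marker such as [KW:], so you may want to drop empty strings before using the results.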

Hope this helps you achieve what you are looking for.