Search Text in Word Document using Regex Expression & Replace with HTML String (C# .NET)

Hi,


I have a requirement to take HTML-formatted text and insert it into a Word document. The Word document already contains placeholder text, and it is this text that needs to be replaced.

E.g. the Word document might contain R31AQ and I need to replace it with something like

This text is bold and this text is underlined.

The replacement text will be fetched from a database and will be in standard HTML format, e.g.

<b>This text is bold</b> and <u>this text is underlined</u>.

Is that something that Aspose can do?

Any assistance gratefully received.

Mark

Hello

Thanks for your request. I think in your case you can try using IReplacingCallback. Please see the Aspose.Words documentation on IReplacingCallback for more information.
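
As a rough illustration of the callback mechanism, here is a minimal sketch for more recent Aspose.Words for .NET releases, where the callback is attached through FindReplaceOptions rather than passed directly to Range.Replace as in the code further down in this reply. The names HtmlPlaceholderSketch, ReplaceWithHtml and InsertHtmlCallback are made up for this example. It also assumes the placeholder starts at the beginning of a run: the HTML is inserted before the run that contains the start of the match, so any text preceding the placeholder in the same run would end up after the inserted HTML.

```csharp
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

public static class HtmlPlaceholderSketch
{
    // Hypothetical helper: replaces every occurrence of `placeholder` with `html`.
    public static void ReplaceWithHtml(Document doc, string placeholder, string html)
    {
        FindReplaceOptions options = new FindReplaceOptions();
        options.ReplacingCallback = new InsertHtmlCallback(html);

        // Regex.Escape makes sure the placeholder is matched literally.
        Regex regex = new Regex(Regex.Escape(placeholder), RegexOptions.IgnoreCase);
        doc.Range.Replace(regex, string.Empty, options);
    }

    private class InsertHtmlCallback : IReplacingCallback
    {
        private readonly string mHtml;

        public InsertHtmlCallback(string html)
        {
            mHtml = html;
        }

        public ReplaceAction Replacing(ReplacingArgs args)
        {
            // Insert the HTML just before the run where the match starts.
            DocumentBuilder builder = new DocumentBuilder((Document)args.MatchNode.Document);
            builder.MoveTo(args.MatchNode);
            builder.InsertHtml(mHtml);

            // Let the replace engine remove the placeholder text itself.
            args.Replacement = string.Empty;
            return ReplaceAction.Replace;
        }
    }
}
```

A call could look like HtmlPlaceholderSketch.ReplaceWithHtml(doc, "R31AQ", htmlFromDatabase), where htmlFromDatabase is whatever string you load for that placeholder. If the placeholder can appear in the middle of a run, the run-splitting handler shown below is the safer choice.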

Also please see the following simple code:

using System.Collections;
using System.Text.RegularExpressions;
using Aspose.Words;

// Open document.
Document doc = new Document("C:\\Temp\\in.doc");

Regex regex = new Regex("R31AQ", RegexOptions.IgnoreCase);

// Find the placeholder and replace it with the HTML fragment.
doc.Range.Replace(regex, new ReplaceHandler(@"<b>This text is bold</b> and <u>this text is underlined</u>."), false);

// Save output document.
doc.Save("C:\\Temp\\out.doc");

private class ReplaceHandler : IReplacingCallback
{
    public ReplaceHandler(string replacement)
    {
        mReplacement = replacement;
    }

    public ReplaceAction Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and possibly the only) run can contain text before the match;
        // in that case the run has to be split.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        // Create a DocumentBuilder, which will help us insert the HTML.
        DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);

        // Move builder cursor to the current node.
        builder.MoveTo(currentNode);

        // Insert new text.
        builder.InsertHtml(mReplacement);

        // This list stores all runs of the match so they can be removed afterwards.
        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while ((remainingLength > 0) &&
               (currentNode != null) &&
               (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }

        // Now remove all runs that made up the matched placeholder.
        foreach (Run run in runs)
            run.Remove();

        // Signal to the replace engine to do nothing, because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    /// <summary>
    /// Splits text of the specified run into two runs.
    /// Inserts the new run just after the specified run.
    /// </summary>
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }

    private readonly string mReplacement;
}
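
The handler above has to cope with a placeholder that Word has stored across several runs. To show just that bookkeeping without Aspose.Words, here is a small sketch that models runs as plain strings. RunSplitSketch and RemoveMatch are made-up names for this illustration and are not part of any library; unlike the handler, the sketch removes the matched text first and then inserts the replacement at the same position, which gives the same result.

```csharp
using System;
using System.Collections.Generic;

public static class RunSplitSketch
{
    // Removes `matchLength` characters that start at `matchOffset` inside
    // runs[startIndex], even if the match continues into the following runs.
    // Returns the index at which the replacement content should be inserted.
    public static int RemoveMatch(List<string> runs, int startIndex, int matchOffset, int matchLength)
    {
        int index = startIndex;

        // Split off any text before the match (the role of SplitRun in the handler).
        if (matchOffset > 0)
        {
            string head = runs[index].Substring(0, matchOffset);
            string tail = runs[index].Substring(matchOffset);
            runs[index] = head;
            runs.Insert(index + 1, tail);
            index++;
        }

        int insertAt = index;
        int remaining = matchLength;

        // Drop every run that lies entirely inside the match.
        while (remaining > 0 && index < runs.Count && runs[index].Length <= remaining)
        {
            remaining -= runs[index].Length;
            runs.RemoveAt(index);
        }

        // Trim the part of the match that spills into the last run.
        if (remaining > 0 && index < runs.Count)
            runs[index] = runs[index].Substring(remaining);

        return insertAt;
    }

    public static void Main()
    {
        // "R31AQ" is spread over three runs here.
        var runs = new List<string> { "Code R3", "1A", "Q here" };
        int insertAt = RemoveMatch(runs, 0, 5, 5);
        runs.Insert(insertAt, "[html]");
        Console.WriteLine(string.Join("|", runs)); // prints "Code |[html]| here"
    }
}
```

In the real handler, the same three steps correspond to the first SplitRun call, the while loop that collects whole runs, and the final SplitRun on the last partially matched run.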

Hope this helps.

Best regards,

Thanks very much for the reply. Looks like exactly what I require.


Will purchase Aspose.Words later today.
