Html Replace

gareth86139 · February 13, 2017, 10:09am

Hi,

I am trying to test the ability of the software to perform multiple replaces in a Word document.

Most are simple string replaces, but I am trying to work out how to do an HTML replace.

FindReplaceOptions options = new FindReplaceOptions();
options.ReplacingCallback = new ReplacerHtml();
Range.Replace(regex, htmlText, options);

Here htmlText contains HTML, but it appears as literal text in the Word document instead of being rendered.

Is there a way to set the replacement text as HTML in the ReplacingCallback.Replacing method?

Regards,
Gareth

tahir.manzoor · February 14, 2017, 2:01am

Hi Gareth,

Thanks for your inquiry. Please refer to the following article.

Find and Replace

The following code example shows how to find text and replace it with HTML. Hope this helps you.

using System.Collections;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

// Open the document. MyDir is assumed to be a string holding the path of the documents folder.
Document doc = new Document(MyDir + "in.docx");
FindReplaceOptions froption = new FindReplaceOptions();
string html = @"<P align='right'>Paragraph right</P>" +
              "<b>Implicit paragraph left</b>" +
              "<div align='center'>Div center</div>" +
              "<h1 align='left'>Heading 1 left.</h1>";
froption.ReplacingCallback = new ReplaceWithHtmlEvaluator(html);
doc.Range.Replace(new Regex(@"<replace this text with html>"), "", froption);
// Save the document.
doc.Save(MyDir + "Range.ReplaceWithInsertHtml Out.docx");

class ReplaceWithHtmlEvaluator : IReplacingCallback
{
    string newText;
    public ReplaceWithHtmlEvaluator(string passedString)
    {
        newText = passedString;
        if (newText == null)
            newText = "";
    }
    /// <summary>
    /// This method is called by the Aspose.Words find and replace engine for each match.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;
        // The first (and may be the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);
        // This array is used to store all nodes of the match for further removing.
        ArrayList runs = new ArrayList();
        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
        (remainingLength > 0) &&
        (currentNode != null) &&
        (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;
            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }
        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }
        // Create a DocumentBuilder, move it to the last run of the match, and insert the HTML there.
        DocumentBuilder builder = new DocumentBuilder(e.MatchNode.Document as Document);
        builder.MoveTo((Run)runs[runs.Count - 1]);
        builder.InsertHtml(newText);
        // Now remove all runs in the sequence.
        foreach (Run run in runs)
            run.Remove();
        // Signal to the replace engine to do nothing, because we have already made the replacement ourselves.
        return ReplaceAction.Skip;
    }
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}
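
To make the run-splitting step in the callback easier to follow, here is a minimal sketch that models it with plain strings instead of Aspose.Words Run nodes. The RunSplitSketch class and its IsolateMatch method are hypothetical names used only for illustration and are not part of the Aspose.Words API. Each string stands for the text of one run, and the offset and length play the roles of e.MatchOffset and e.Match.Value.Length.

```csharp
using System;
using System.Collections.Generic;

static class RunSplitSketch
{
    // Hypothetical helper: splits the run texts so that the match, which starts
    // `offset` characters into runs[startIndex] and is `length` characters long,
    // is covered by whole list entries. Returns where those entries sit.
    public static (int First, int Count) IsolateMatch(
        List<string> runs, int startIndex, int offset, int length)
    {
        int i = startIndex;

        // Text before the match stays in its own entry (mirrors the first SplitRun call).
        if (offset > 0)
        {
            string text = runs[i];
            runs[i] = text.Substring(0, offset);
            runs.Insert(i + 1, text.Substring(offset));
            i++;
        }

        int first = i;
        int remaining = length;

        // Consume every entry that lies entirely inside the match.
        while (remaining > 0 && i < runs.Count && runs[i].Length <= remaining)
        {
            remaining -= runs[i].Length;
            i++;
        }

        // The last entry may extend past the match, so split off the tail.
        if (remaining > 0 && i < runs.Count)
        {
            string text = runs[i];
            runs[i] = text.Substring(0, remaining);
            runs.Insert(i + 1, text.Substring(remaining));
            i++;
        }

        return (first, i - first);
    }

    static void Main()
    {
        // The match "<replace me>" is spread over three runs.
        var runs = new List<string> { "Hello <rep", "lace", " me> world" };
        var (first, count) = IsolateMatch(runs, 0, 6, 12);

        Console.WriteLine(string.Join("|", runs));
        // prints: Hello |<rep|lace| me>| world
        Console.WriteLine(string.Join("", runs.GetRange(first, count)));
        // prints: <replace me>
    }
}
```

In the real callback the isolated entries correspond to the runs collected in the runs list. The HTML is inserted at the last of them, and then all of them are removed.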

gareth86139 · February 14, 2017, 3:37am

Thanks, that does what I asked. It doesn’t quite maintain the formatting, but that’s something I can work on.

Gareth

tahir.manzoor · February 14, 2017, 10:06pm

Hi Gareth,

Thanks for your inquiry. You can use the DocumentBuilder.InsertHtml(String html, Boolean useBuilderFormatting) overload to insert an HTML string into the document.

When useBuilderFormatting is false, DocumentBuilder formatting is ignored and the formatting of the inserted text is based on default HTML formatting. As a result, the text looks as it would when rendered in a browser.

When useBuilderFormatting is true, formatting of inserted text is based on DocumentBuilder formatting, and the text looks as if it were inserted with Write.
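
As a rough illustration of the two modes, the sketch below inserts the same HTML fragment twice into a blank document, once with each flag value. The font settings, HTML string, class name, and output file name are made up for the example, and it needs the Aspose.Words package to run. In the ReplaceWithHtmlEvaluator callback, the equivalent change would presumably be to call builder.InsertHtml(newText, true) instead of builder.InsertHtml(newText).

```csharp
using Aspose.Words;

class InsertHtmlFormattingDemo
{
    static void Main()
    {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Formatting set on the builder (illustrative values only).
        builder.Font.Name = "Arial";
        builder.Font.Size = 14;

        const string html = "<p>Inserted <b>HTML</b> text</p>";

        // true: the inserted text starts from the builder's current formatting.
        builder.InsertHtml(html, true);

        // false: the builder's formatting is ignored and default HTML formatting is used.
        builder.InsertHtml(html, false);

        doc.Save("InsertHtml.UseBuilderFormatting.docx");
    }
}
```

Comparing the two paragraphs in the saved document should show which mode is closer to the formatting you expect around the replaced text.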

Hope this helps you. If you still face problems, please share your input, output, and expected output documents here for our reference. We will then provide more information about your query.