Free Support Forum - aspose.com

Find & Encapsulate Word Document Text Spanning across Single or Multiple Paragraphs within Content Control C# .NET

How can I find text in a document and wrap it in a content control? The text may lie within a single paragraph or span multiple paragraphs.

Please find attached the input and expected documents: Files.zip (32.9 KB)

@Kunal19,

Thanks for your inquiry. Could you please attach your input Word document and expected document here for our reference? We will investigate the structure of your expected document to see how you want the final output to be generated. You can create the expected document by using Microsoft Word. We will then provide you with code to achieve the same using Aspose.Words.

Thanks for your quick response. Please find attached input and expected document files.

@Kunal19,

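You can find document text and wrap it in a content control by using the following code:

// Required namespaces: System.Collections, Aspose.Words, Aspose.Words.Markup, Aspose.Words.Replacing.
Document doc = new Document("E:\\Temp\\Files\\InputDocument.docx");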

FindReplaceOptions options = new FindReplaceOptions();
options.ReplacingCallback = new FindAndReplace();
options.Direction = FindReplaceDirection.Backward;

doc.Range.Replace("This Agreement is by and between XYZ Inc., a corporation under the laws of the state of Washington and having its principal place of business at 142711 Suite 100 Bellevue, WA (\"XYZ Inc.\") and Troy Inc.  (“Customer”), having a mailing address at 123 ABC Blvd, Building 2500, Dallas, USA 75001", "", options);
doc.Range.Replace("07/01/2015", "", options);
doc.Range.Replace("12/31/2026", "", options);
            
doc.Save("E:\\Temp\\Files\\19.11.docx");

private class FindAndReplace : IReplacingCallback
{
    /// <summary>
    /// NOTE: This is a simplistic method. It only follows sibling Run nodes,
    /// so it works when the whole match lies within the runs of one paragraph.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and may be the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        // This list stores all runs of the match so they can be moved into the content control.
        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            (remainingLength > 0) &&
            (currentNode != null) &&
            (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }

        DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);
        builder.MoveTo((Run)runs[0]);

        StructuredDocumentTag sdt = new StructuredDocumentTag(builder.Document, SdtType.RichText, MarkupLevel.Inline);
        sdt.ChildNodes.Clear();
        builder.InsertNode(sdt);

        foreach (Run run in runs)
            sdt.AppendChild(run);

        return ReplaceAction.Skip;
    }

    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}

Hope this helps.
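
To make the run handling in the callback easier to follow, here is a minimal sketch of the same splitting logic applied to plain strings instead of Run nodes. The class `RunSplitSketch` and the method `CollectMatch` are hypothetical names used only for illustration; they are not part of Aspose.Words.

```csharp
using System;
using System.Collections.Generic;

static class RunSplitSketch
{
    // Illustration only: 'runs' holds the texts of sibling runs, starting with
    // the run in which the match begins. The list is modified in place so that
    // the match occupies whole entries, and those entries are returned.
    public static List<string> CollectMatch(List<string> runs, int matchOffset, int matchLength)
    {
        int index = 0;

        // Text before the match: split the first run and continue with its tail.
        if (matchOffset > 0)
        {
            string first = runs[0];
            runs[0] = first.Substring(0, matchOffset);
            runs.Insert(1, first.Substring(matchOffset));
            index = 1;
        }

        var matched = new List<string>();
        int remaining = matchLength;

        // Take every run that lies completely inside the match.
        while (remaining > 0 && index < runs.Count && runs[index].Length <= remaining)
        {
            matched.Add(runs[index]);
            remaining -= runs[index].Length;
            index++;
        }

        // The match ends inside the next run: split it and keep only its head.
        if (remaining > 0 && index < runs.Count)
        {
            string last = runs[index];
            runs[index] = last.Substring(0, remaining);
            runs.Insert(index + 1, last.Substring(remaining));
            matched.Add(runs[index]);
        }

        return matched;
    }
}
```

For example, with the runs "Effective 07/01" and "/2015 until", an offset of 10 and a length of 10, the helper returns the pieces "07/01" and "/2015", and the run list becomes "Effective ", "07/01", "/2015", " until". In the callback, those matched pieces correspond to the Run nodes that are moved into the content control.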

Thanks @awais.hafeez,

It works perfectly for text within a single paragraph, but it doesn’t work for text that spans multiple paragraphs.
I want to tag two or more paragraphs together in one content control.
Could you please help me with this?

Regards,
Kunal

@Kunal19,

You can build your logic on the following code, which makes multiple paragraphs part of one content control:

// Required namespaces: Aspose.Words, Aspose.Words.Markup.
Document doc = new Document("E:\\Temp\\Files\\InputDocument.docx");

Paragraph targetPara = null;
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    if (para.ToString(SaveFormat.Text).StartsWith("General Liability Insurance on an occurrence "))
    {
        targetPara = para;
        break;
    }
}

if (targetPara != null)
{
    StructuredDocumentTag sdt = new StructuredDocumentTag(doc, SdtType.RichText, MarkupLevel.Block);
    sdt.ChildNodes.Clear();
    targetPara.ParentNode.InsertBefore(sdt, targetPara);

    sdt.AppendChild(targetPara);
    sdt.AppendChild(sdt.NextSibling);
    sdt.AppendChild(sdt.NextSibling);
}

doc.Save("E:\\Temp\\Files\\19.11.docx");

Hope this helps.
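
The code above always moves exactly three paragraphs into the content control. As a rough sketch of how it might be generalized, the helper below wraps everything from a first paragraph up to and including a last paragraph. `BlockSdtSketch` and `WrapParagraphs` are hypothetical names, and the sketch assumes both paragraphs are children of the same parent node (for example, the same Body).

```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Markup;

static class BlockSdtSketch
{
    // Hypothetical helper: wraps 'first', 'last' and every sibling block node
    // between them in one block-level rich text content control.
    // Returns null and changes nothing if 'last' is not a later sibling of 'first'.
    public static StructuredDocumentTag WrapParagraphs(Paragraph first, Paragraph last)
    {
        // Collect the nodes before moving anything, because moving a node
        // changes what NextSibling returns.
        var nodes = new List<Node>();
        for (Node node = first; node != null; node = node.NextSibling)
        {
            nodes.Add(node);
            if (node == last)
                break;
        }

        // 'last' was never reached, so the two paragraphs are not siblings in this order.
        if (nodes.Count == 0 || nodes[nodes.Count - 1] != last)
            return null;

        var sdt = new StructuredDocumentTag(first.Document, SdtType.RichText, MarkupLevel.Block);
        // Remove the default placeholder content; this serves the same purpose
        // as sdt.ChildNodes.Clear() in the code above.
        sdt.RemoveAllChildren();
        first.ParentNode.InsertBefore(sdt, first);

        foreach (Node node in nodes)
            sdt.AppendChild(node);

        return sdt;
    }
}
```

Collecting the nodes first also avoids the case where `sdt.NextSibling` is null because the document ends earlier than expected.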

Thanks @awais.hafeez.
Based on your answer, I have built logic to tag multi-paragraph text.

Another problem I am now facing is that the code below replaces all occurrences of the given string (“07/01/2015”):
doc.Range.Replace("07/01/2015", "", options);

I want to replace it only if it appears in a specific paragraph or position.

Could you please help me with this problem?

Regards,
Kunal

@Kunal19,

Thanks for your inquiry. Please ZIP and upload your simplified input Word document and your expected DOCX file showing the desired output here for testing. You can create the expected document by using MS Word. We will then investigate the scenario on our end and provide you with more information.

@awais.hafeez,

Please find attached the zip file with the simplified input and expected DOCX files: Files.zip (27.5 KB)

@Kunal19,

The following code will produce an output similar to the “ExpectedDocument.docx” document you shared:

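// Required namespaces: System.Collections, Aspose.Words, Aspose.Words.Markup, Aspose.Words.Replacing.
Document doc = new Document("E:\\Temp\\Files\\InputDocument.docx");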

FindReplaceOptions options = new FindReplaceOptions();
options.ReplacingCallback = new FindAndReplace();
options.Direction = FindReplaceDirection.Forward;

doc.Range.Replace("07/01/2015", "", options);

doc.Save("E:\\Temp\\Files\\20.1.docx"); 

private class FindAndReplace : IReplacingCallback
{
    /// <summary>
    /// NOTE: This is a simplistic method. It only follows sibling Run nodes,
    /// so it works when the whole match lies within the runs of one paragraph.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and may be the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        // This list stores all runs of the match so they can be moved into the content control.
        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            (remainingLength > 0) &&
            (currentNode != null) &&
            (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }

        DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);
        builder.MoveTo((Run)runs[0]);

        StructuredDocumentTag sdt = new StructuredDocumentTag(builder.Document, SdtType.RichText, MarkupLevel.Inline);
        sdt.ChildNodes.Clear();
        builder.InsertNode(sdt);

        foreach (Run run in runs)
            sdt.AppendChild(run);

        return ReplaceAction.Stop;
    }

    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}

Hi @awais.hafeez,

The above code will tag all occurrences of the given string. I want to tag only one specific occurrence (maybe based on its context).

Please find attached the expected document.
The text “07/01/2015” appears at two places in the document, but I want to tag it at only one place. Files.zip (27.5 KB)

@Kunal19,

You will get the expected output if you change the direction to forward (options.Direction = FindReplaceDirection.Forward;) in the main code and, inside IReplacingCallback.Replacing, change the last line to return ReplaceAction.Stop;. With these two changes, the search starts at the beginning of the document and stops after the first match, so only the first occurrence is tagged. The output produced on our end is attached here for your reference.

For the complete code, please see my previous post. Hope this helps.
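
If the occurrence to tag is not the first one in the document, one possible alternative is to run the replace on the Range of a single paragraph chosen by its surrounding text, instead of on doc.Range. The following is only a sketch under that assumption; `ScopedReplaceSketch`, `ReplaceInParagraphWith` and the `context` parameter are hypothetical names, and it reuses the FindAndReplace callback from the previous post.

```csharp
using Aspose.Words;
using Aspose.Words.Replacing;

static class ScopedReplaceSketch
{
    // Hypothetical helper: runs the find/replace only inside the first paragraph
    // whose text contains 'context', so occurrences in other paragraphs are untouched.
    // Returns the value reported by Range.Replace, or 0 if no paragraph contains 'context'.
    public static int ReplaceInParagraphWith(Document doc, string context,
        string pattern, FindReplaceOptions options)
    {
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            if (para.ToString(SaveFormat.Text).Contains(context))
            {
                // Return immediately: the callback changes the node tree, so the
                // enumeration should not continue after the replace.
                return para.Range.Replace(pattern, "", options);
            }
        }

        return 0;
    }
}
```

It could be called as ScopedReplaceSketch.ReplaceInParagraphWith(doc, "Effective Date", "07/01/2015", options);, where "Effective Date" stands in for whatever text identifies the target paragraph in your document. If the string occurs more than once inside that paragraph, keeping return ReplaceAction.Stop; in the callback still limits the tagging to its first occurrence there.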