Find Text in Word DOCX Document using regex & Replace it with Hyperlink using C# .NET | Find Replace Direction Backward

amanja · May 4, 2020, 10:48pm

I am trying to find text in a DOCX file that matches a regex (any combination of letters and digits separated by "/", such as Link/Link/1/1) and replace every match with a hyperlink. I am using the following code, but the SplitRun function always throws this error:

Error: startIndex cannot be larger than length of string.
Parameter name: startIndex

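using System.Collections;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

public void addhyperlinks(string source)
{
    Document doc = new Document(source);
    FindReplaceOptions options = new FindReplaceOptions();
    // The callback does the actual work; the replacement string itself is unused.
    options.ReplacingCallback = new ReplaceWithHyperlinkEvaluator();
    doc.Range.Replace(new Regex(@"[A-Za-z]+/[/A-Za-z0-9]+"), string.Empty, options);
    doc.Save(source);
}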

internal class ReplaceWithHyperlinkEvaluator : IReplacingCallback
{
    // Base address; the matched text is appended to it to form the hyperlink target.
    string link = "www.google.com/";

    //public FindAndInsertHyperlink(string text, string link)

    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = args.MatchNode;

        // The first (and maybe the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (args.MatchOffset > 0)
        {
            currentNode = SplitRun((Run)currentNode, args.MatchOffset);
        }

        // This list stores all nodes of the match so they can be removed later.
        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = args.Match.Value.Length;
        while ((remainingLength > 0) &&
               (currentNode != null) &&
               (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }

        // Create a DocumentBuilder and move its cursor to the first run of the match.
        DocumentBuilder builder = new DocumentBuilder((Document)args.MatchNode.Document);
        builder.MoveTo((Node)runs[0]);

        // Insert the hyperlink.
        builder.Font.StyleIdentifier = StyleIdentifier.Hyperlink;
        builder.InsertHyperlink(args.Match.Value, link + args.Match.Value, false);

        // Now remove all runs in the sequence.
        foreach (Run run in runs)
            run.Remove();

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    /// <summary>
    /// Splits text of the specified run into two runs.
    /// Inserts the new run just after the specified run.
    /// </summary>
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        // Substring throws if position is greater than run.Text.Length.
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}
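For reference, the message quoted above is the one String.Substring(int) raises when its argument is greater than the string's length, so inside SplitRun it means position was larger than run.Text.Length. The sketch below reproduces that on a plain string, with no Aspose.Words types involved. The class SplitOffsetSketch and the guard TryTailAfter are hypothetical names used only for illustration, and the idea that the offset had become stale after an earlier match was processed is an assumption, not something the thread confirms.

```csharp
using System;

public static class SplitOffsetSketch
{
    // Same slicing as SplitRun in the post, applied to a plain string instead of a Run.
    public static string TailAfter(string runText, int position)
    {
        return runText.Substring(position);
    }

    // Hypothetical guard: reports failure instead of throwing when the offset is out of range.
    public static bool TryTailAfter(string runText, int position, out string tail)
    {
        if (position < 0 || position > runText.Length)
        {
            tail = string.Empty;
            return false;
        }
        tail = runText.Substring(position);
        return true;
    }

    public static void Main()
    {
        // Offset inside the text: the split works.
        Console.WriteLine(TailAfter("See Link/Link/1/1", 4));

        // Offset past the end of a shorter run: Substring throws.
        try
        {
            TailAfter("See ", 10);
        }
        catch (ArgumentOutOfRangeException ex)
        {
            Console.WriteLine("Out of range: " + ex.ParamName);
        }
    }
}
```

A guard like TryTailAfter only hides the symptom; it does not explain why the offset and the run text disagree.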

awais.hafeez · May 5, 2020, 7:06am

@amanja,

To ensure a timely and accurate response, please ZIP and attach the following resources here for testing:

Your simplified input Word document
Aspose.Words 20.5 generated output DOCX file showing the undesired behavior
Your expected DOCX file showing the desired output. You can create this document by using MS Word.

As soon as you have these resources ready, we will start investigating your scenario and provide you with more information. Thanks for your cooperation.

amanja · May 5, 2020, 7:53am

Thanks for your response.
Please find attached the input file you requested (HyperlinkError.docx) and the expected result (HyperlinkError - Result.docx).
When I try to replace all text that matches my regex in HyperlinkError.docx, I get this error:
Error: startIndex cannot be larger than length of string.
Parameter name: startIndex

Demo.zip (35.7 KB)

awais.hafeez · May 5, 2020, 12:44pm

@amanja,

Please either use the FindReplaceOptions(FindReplaceDirection.Backward) constructor overload, or set the direction explicitly as in the following C# code:

Document doc = new Document("E:\\Temp\\Demo\\HyperlinkError.docx");
FindReplaceOptions options = new FindReplaceOptions();
options.Direction = FindReplaceDirection.Backward;
options.ReplacingCallback = new ReplaceWithHyperlinkEvaluator();
doc.Range.Replace(new Regex(@"[A-Za-z]+/[/A-Za-z0-9]+"), "", options);
doc.Save("E:\\Temp\\Demo\\20.5.docx");

Hope this helps.
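As a rough illustration of why direction can matter when a callback edits the content being searched, here is a plain-string analogy. It is not how Aspose.Words works internally, and BackwardReplaceSketch and WrapMatches are hypothetical names. Match offsets are computed once on the original text; applying the edits from the last match to the first keeps the remaining offsets valid, while applying them first to last leaves the later offsets pointing at the wrong place.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class BackwardReplaceSketch
{
    // Wraps every match in <a>...</a>, using offsets taken from the ORIGINAL text.
    public static string WrapMatches(string text, Regex pattern, bool backward)
    {
        MatchCollection matches = pattern.Matches(text);
        var sb = new StringBuilder(text);
        int count = matches.Count;
        for (int i = 0; i < count; i++)
        {
            Match m = matches[backward ? count - 1 - i : i];
            // Insert the closing tag first so the opening tag's offset is unaffected.
            sb.Insert(m.Index + m.Length, "</a>");
            sb.Insert(m.Index, "<a>");
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var pattern = new Regex(@"[A-Za-z]+/[/A-Za-z0-9]+");
        string text = "See Link/Link/1/1 and Doc/A/2 here";

        // Last match first: earlier offsets are untouched by later edits.
        Console.WriteLine(WrapMatches(text, pattern, backward: true));

        // First match first: the second match's offsets are stale after the first edit.
        Console.WriteLine(WrapMatches(text, pattern, backward: false));
    }
}
```

In the backward case both matches end up wrapped correctly; in the forward case the tags for the second match land in the wrong positions because the text before it has already grown.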

amanja · May 5, 2020, 1:20pm

Great, I ran a quick test and it works. I will do more testing and come back to you if anything else comes up.

Thanks again

awais.hafeez · May 6, 2020, 4:04am

@amanja,

Thanks for your feedback. Please let us know if you have any further queries in the future.