Range.Replace throws System.IndexOutOfRangeException when LegacyMode is false using .NET Core

Hi, Team!

The Range.Replace(Regex, String, FindReplaceOptions) method throws an exception when I search with the .* regex. When I set LegacyMode to true, the exception does not occur.
Could you check what is causing this issue?

Exception message:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
at System.String.get_Chars(Int32 index)
at . (Match )
at (String , Int32 )
at . (Node )
at . ()
at Aspose.Words.Range.Replace(Regex pattern, String replacement, FindReplaceOptions options)

My other question is: what should the match be when I search with the .* regex? I expected Match.Value to be the whole text content of the document, but when I print args.Match.Value in the MyReplaceHandler callback, the match string is empty.

Which node should I get in this situation? When I check args.MatchNode.NodeType, it is a Paragraph node, but it does not have any child Run. My main goal is to collect the matched Run nodes and replace their content, but I cannot do that in this edge case because I do not know the matched text or the first matched Run node. If I use LegacyMode, I get the first Run node of the document.

I have attached the Word document and the project I used so that you can reproduce it easily.

Sample file:
sampleFile.zip (3.7 KB)

Attached .NET Core project:
AsposeWordReplaceException.zip (2.5 KB)

C# .NET Core 3.1
Aspose.Words 21.1.0

Here is the code sample:

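    // Requires: using System; using System.Text.RegularExpressions;
    //           using Aspose.Words; using Aspose.Words.Replacing;
    private class MyReplaceHandler : IReplacingCallback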
    {
        ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
        {
            Console.WriteLine($"Replacement: {args.Replacement}");
            Console.WriteLine($"GroupIndex: {args.GroupIndex}");
            Console.WriteLine($"GroupName: {args.GroupName}");
            Console.WriteLine($"MatchNode.NodeType: {args.MatchNode.NodeType}");
            if (args.MatchNode.IsComposite)
            {
                var compositeNode = (CompositeNode) args.MatchNode;
                Console.WriteLine($"Run Child count: {compositeNode.GetChildNodes(NodeType.Run, true).Count}");
            }

            Console.WriteLine($"MatchNode.GetText(): {args.MatchNode.GetText()}");
            Console.WriteLine($"MatchOffset: {args.MatchOffset}");

            Console.WriteLine($"Match.Groups: {args.Match.Groups.Count}");
            Console.WriteLine($"Match.Captures.Count: {args.Match.Captures.Count}");
            Console.WriteLine($"Match.Length: {args.Match.Length}");
            Console.WriteLine($"Match.Index: {args.Match.Index}");
            Console.WriteLine($"Match.Name: {args.Match.Name}");
            Console.WriteLine($"Match.Success: {args.Match.Success}");
            Console.WriteLine($"Match.Value: {args.Match.Value}");
            return ReplaceAction.Skip;
        }
    }

    static void Main(string[] args)
    {
        var license = new License();
        const string pathToTheLicense = "pathToFile";
        license.SetLicense(pathToTheLicense);

        Document document = new Document("sample.docx");

        FindReplaceOptions options = new FindReplaceOptions
        {
            FindWholeWordsOnly = true,
            Direction = FindReplaceDirection.Backward,
            MatchCase = false,
            LegacyMode = false
        };

        MyReplaceHandler myReplaceHandler = new MyReplaceHandler();
        options.ReplacingCallback = myReplaceHandler;

        document.Range.Replace(new Regex(@".*"), "", options);
    }

Thanks,
Gabor

It seems that the FindWholeWordsOnly option is causing the problem. If it is true and the regex is .*, the exception is thrown.

@erdeiga

We have tested the scenario using the latest version of Aspose.Words for .NET 21.1 and could not reproduce the issue you reported. Please use Aspose.Words for .NET 21.1.

Please make sure that you are using the latest version of Aspose.Words and the same Word document.

Hi @tahir.manzoor!

I accidentally uploaded the project with LegacyMode turned on. Could you please try it again with LegacyMode = false? Sorry about that. If you check the csproj file, you will see that the Aspose.Words version is the latest.

I have re-uploaded the AsposeWordReplaceException.zip file.

@erdeiga

We have tested the scenario and managed to reproduce the issue on our side. We have logged this problem in our issue tracking system as WORDSNET-21639. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

Thanks for the quick response.

I have a question about the IReplacingCallback interface. Can the ReplacingArgs.MatchNode property be a CompositeNode? If yes, can you tell me in which cases?

@erdeiga

The ReplacingArgs.MatchNode property does not return a CompositeNode. It returns a Node object. You can use the Node.ParentNode property to get the parent node, which is a Paragraph node.

Good to know, but then is this another bug?

With these FindReplaceOptions and the .* regex:

  • FindWholeWordsOnly = true
  • Direction = FindReplaceDirection.Backward
  • MatchCase = false

The args.MatchNode will be a composite node (Paragraph type) when you search in the sample.docx that I attached.

If you put this code snippet into the attached project, it throws an InvalidCastException:

        ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
        {
            Run matchNode = (Run)args.MatchNode;
            return ReplaceAction.Skip;
        }

Unhandled exception. System.InvalidCastException: Unable to cast object of type 'Aspose.Words.Paragraph' to type 'Aspose.Words.Run'.

@erdeiga

The ReplacingArgs.MatchNode property returns the node that contains the beginning of the match. This is the expected behavior of Aspose.Words. The property returns a Paragraph node here because of the regex used in the code, new Regex(@".*"), which matches any character in the document. If you search for specific text with the Range.Replace method, MatchNode.NodeType will be Run.
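
As a side note on the empty Match.Value reported earlier in this thread: in plain .NET, the .* pattern can also produce zero-length matches, for example at a line break or at the end of the input. The sketch below uses only System.Text.RegularExpressions and does not involve Aspose.Words, so it illustrates the regex behavior only, not how Aspose.Words walks the document. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of Aspose.Words.
public static class EmptyMatchDemo
{
    // Returns one "index:length:value" entry per match of .* in the input.
    public static List<string> DescribeMatches(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, ".*"))
        {
            // Zero-length matches are reported with Length == 0 and an empty Value.
            result.Add($"{m.Index}:{m.Length}:{m.Value}");
        }
        return result;
    }

    public static void Main()
    {
        // "." does not match '\n' by default, so each line is matched separately.
        foreach (string entry in DescribeMatches("abc\ndef"))
        {
            Console.WriteLine(entry);
        }
    }
}
```

If zero-length matches are not wanted, a pattern such as .+ requires at least one character per match in plain .NET. Whether that also avoids the exceptions reported in this thread has not been verified here.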

In your case, we suggest checking the node type using the MatchNode.NodeType property, as shown below.

if (args.MatchNode.NodeType == NodeType.Run)
{
    // Your code...
}
else if (args.MatchNode.NodeType == NodeType.Paragraph)
{
    // Your code...
}
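
For the goal described earlier in this thread (collecting the matched Run nodes without replacing anything), the node type check could be wrapped in a callback along these lines. This is only a sketch: RunCollector is a hypothetical name, and adding every Run of a Paragraph match is an assumption about what the caller wants, not documented Aspose.Words behavior. A Paragraph match may contain no Run at all, in which case nothing is added.

```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Replacing;

// Hypothetical helper: gathers Run nodes for each match and never replaces text.
public class RunCollector : IReplacingCallback
{
    public List<Run> Runs { get; } = new List<Run>();

    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
    {
        Node node = args.MatchNode;

        if (node.NodeType == NodeType.Run)
        {
            // The match starts inside a Run, so the cast is safe.
            Runs.Add((Run)node);
        }
        else if (node.NodeType == NodeType.Paragraph)
        {
            // Assumption: fall back to the Runs inside the Paragraph, if any.
            foreach (Run run in ((Paragraph)node).GetChildNodes(NodeType.Run, true))
            {
                Runs.Add(run);
            }
        }

        // Skip leaves the document unchanged; only the nodes are collected.
        return ReplaceAction.Skip;
    }
}
```

An instance would be assigned to FindReplaceOptions.ReplacingCallback before calling Range.Replace, in the same way as MyReplaceHandler in the first post.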

@tahir.manzoor
There is another exception thrown that is related to the .* regex search.

Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'index')
at . (Int32 )
at (Int32 )
at (String , Int32 )
at . (Node )
at . (Node )
at . ()
at Aspose.Words.Range.Replace(Regex pattern, String replacement, FindReplaceOptions options)

You can reproduce it with these FindReplaceOptions and the another_exception.docx file:

        FindReplaceOptions options = new FindReplaceOptions
        {
            FindWholeWordsOnly = false,
            Direction = FindReplaceDirection.Backward,
            MatchCase = false,
            LegacyMode = false
        };

        MyReplaceHandler myReplaceHandler = new MyReplaceHandler();
        options.ReplacingCallback = myReplaceHandler;

        document.Range.Replace(new Regex(@".*"), "", options);

another_exception.zip (31.4 KB)

@tahir.manzoor
Could you tell me what the disadvantages of using LegacyMode are?

I saw in the release notes that “old algorithm does not support advanced features such as replace with breaks, apply formatting and so on”.

We only use Document.Range.Replace to search the document with a regex and collect all of the matched nodes. We do not use the replace capabilities of this function.

Do you recommend that we use LegacyMode until these issues are fixed? Is there any performance difference, or are there any important features that affect the search mechanism?

@erdeiga

In your case, we suggest that you do not use the FindReplaceOptions.LegacyMode property.

Please check the remarks for this property below:
Use this flag if you need exactly the same behavior as before advanced find/replace feature was introduced. Note that old algorithm does not support advanced features such as replace with breaks, apply formatting and so on.

Some customers want the replace algorithm to work exactly like the old, deprecated one, so this property was added for them. You are using the latest find and replace features, so there is no need to use this property.


The issues you found earlier (filed as WORDSNET-21639) have been fixed in the Aspose.Words for .NET 21.2 update and the Aspose.Words for Java 21.2 update.