Range.Replace with RegexOptions.Multiline fails to detect data in Word files

Hi Team,

We are trying to work out how to detect certain customer numbers in Word documents.
For that, we have come up with the following regex:

@"(Customer Number|Customer No|Customer #|Customer#|CustomerID|Customerno|customernumber)\W*\d{9}\b"

which we try to use like this:

int r = wordsDocument.Range.Replace(new Regex(@"(Customer Number|Customer No|Customer #|Customer#|CustomerID|Customerno|customernumber)\W*\d{9}\b", RegexOptions.None), "REPLACED");

With RegexOptions.None, this locates the data even when it spans multiple lines or multiple cells of the same table.

However, as soon as we add RegexOptions.Multiline as follows:

int r = wordsDocument.Range.Replace(new Regex(@"(Customer Number|Customer No|Customer #|Customer#|CustomerID|Customerno|customernumber)\W*\d{9}\b", RegexOptions.Multiline), "REPLACED");

Aspose.Words no longer detects any of these instances.

It is confusing that Multiline affects this regex at all, because the option is only supposed to change the behavior of the ^ and $ anchors, and this pattern contains neither.
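To illustrate that expectation outside Aspose.Words, here is a minimal sketch that uses only System.Text.RegularExpressions. The class name, helper method, and sample text are made up for illustration, and it says nothing about how Range.Replace processes the document internally; it only shows how the plain .NET regex engine treats the option.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the attached sample application.
public static class MultilineDemo
{
    // The same pattern as in the question.
    public const string Pattern =
        @"(Customer Number|Customer No|Customer #|Customer#|CustomerID|Customerno|customernumber)\W*\d{9}\b";

    // Counts how many times the pattern matches the text under the given options.
    public static int CountMatches(string pattern, string text, RegexOptions options)
    {
        return new Regex(pattern, options).Matches(text).Count;
    }

    public static void Main()
    {
        // Made-up sample: the label and the number sit on different lines.
        string text = "Customer Number:\r\n123456789";

        // No ^ or $ in the pattern, so the option makes no difference here.
        Console.WriteLine(CountMatches(Pattern, text, RegexOptions.None));      // 1
        Console.WriteLine(CountMatches(Pattern, text, RegexOptions.Multiline)); // 1

        // An anchored pattern is where Multiline does change the result:
        // ^ and $ match at line boundaries instead of only at the string boundaries.
        string anchored = @"^\d{9}$";
        Console.WriteLine(CountMatches(anchored, text, RegexOptions.None));      // 0
        Console.WriteLine(CountMatches(anchored, text, RegexOptions.Multiline)); // 1
    }
}
```

With plain strings the customer-number pattern matches once with either option, which is why the difference we see through Range.Replace is unexpected.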

Please help us understand this. Thank you.

FYI, I've attached an example application to demonstrate the above.

CustomerNumberDetectionInWord.zip (325.9 KB)

@zpopswat,

I have managed to reproduce this behavior on my end too. My understanding is that the RegexOptions.Multiline option influences how the regex pattern is interpreted rather than the text being processed. You can learn more about Multiline mode on the following page:

However, in case a correction is needed in the Aspose.Words API, we have logged this problem in our issue tracking system. Your ticket number is WORDSNET-22417. We will look further into the details of this problem and will keep you updated here on its status. We apologize for any inconvenience.


The issue you found earlier (filed as WORDSNET-22417) has been fixed in the Aspose.Words for .NET 21.8 update and the Aspose.Words for Java 21.8 update.