Performance issues with Worksheet.Cells.Find with .NET framework (compared to .NET)

Product: Aspose.Cells
Version: 25.1.2

CPU: i5-1335U (10 Cores, 12M Cache, up to 4.60 GHz)
HDD: Samsung SSD 990 PRO 1TB
OS: Windows 11

Our customers are reporting issues with one of our features being very slow. In this feature we are using Worksheet.Cells.Find along with a complex regex to look for tags in spreadsheet.

I went ahead and created a simple application in .NET Framework 4.8 (which is the framework that we are using in our product). I measured the time it took to run Worksheet.Cells.Find with a complex regex on a large spreadsheet filled with dummy text. I was able to confirm that the performance was poor.

I then created a simple application in .NET 8. When running the exact same code I noticed that it was a huge improvement in performance (about 30x faster in my test).

It would mean a lot to us and our customers if the performance of Worksheet.Cells.Find in .NET Framework 4.8 would be as fast as in .NET 8.

Sample (solution with two different projects and a sample.xlsx):
AsposeCellFinder.7z (13.5 KB)

Also, I noticed that this was reported in 2016 ( Cells.find doesn’t worl properly - Free Support Forum - aspose.com), but it doesn’t appear to be any fix yet.

@lars.olsson
It seems the issue of Regex between different net frame.
We will check it soon and share what we found later.

@lars.olsson,

After initial testing, we reproduced the issue as you mentioned by using your sample project with template Excel file. I found performance issue with Worksheet.Cells.Find (having Regex) with .NET framework 4.x (compared to .NET (8.0) Core).

We require thorough evaluation of the issue. We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CELLSNET-57770

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

We did a test where we replaced Worksheet.Cells.Find (having Regex) in .NET Framework 4.x with a for-loop that checks each indivudal cell against the provided Regex. This provided better performance compared to Worksheet.Cells.Find. Could this be a potential workaround or may I be missing out on something? Also, it makes me curious about what’s going on inside Worksheet.Cells.Find except for matching regexes?

// Possible workaround?
foreach (Cell cell in Worksheet.Cells) {
    if (regex.IsMatch(cell.StringValue)) {
        matches.Add(cell);
    }
}

//Cell cell = null;
//var findOptions = new FindOptions { RegexKey = true };
//while(cell = Worksheet.Cells.Find(regex.ToString(), cell , findOptions)) {
//    matches.Add(cell);
//}

@lars.olsson
Thanks for further details. Let us investigate and analyze your issue in details. We will get back to you soon.

@lars.olsson

We have investigated the issue thoroughly and found the cause of it(comparing with the simple match provided in your last post). When specifing the search key, the key will be refactored according to FindOptions.LookAtType to ensure it can be matched according to the expected type. By default LookAtType’s value is “Contains”, then the search key will be refactored by adding wildcards at the beginning and end of it. Such case makes the regex more complicated and the performance decrease. Because you have specified the exact rule for the provided regex, there is no need to use the additional conditions of LookAtType, so please set it as EntireContent and then you can get better performance:

...
            worksheet.Cells.Find(
...
                findOptions: new FindOptions { RegexKey = true, LookAtType = LookAtType.EntireContent });
...