Calculate formula on large excel data of more 140K rows not working and taking too much time to calculate formulas on excel using Aspose Cells

We are using paid version of Aspose 8.1.1 in our app, but whenever we try to Calculate formula on large excel data of more 140,000 rows of data and having 80 columns, Calculate formula not working and it is extremely slow processing and taking too much time to calculate formulas on excel cells and we are using Aspose Cells for excel spreadsheet to load data in it. By the I have also tried using latest version of Aspose.Cells 20.5 version using trial version I guess, but still it is extremely slow processing while calculating formulas on this huge excel data.
Please advise how to improve performance/speed up processing Calculate formula on large excel data of more 140,000 rows of data in excel. Thanks!

@sridharravva2020,

Thanks for providing us details.

Well, if you are calling Workbook.CalculateFormula only once you may try to disable formula calculation chain (see the document for your reference) and see if it makes any difference. If you still find the formula calculations is extremely slow, kindly zip your template file (containing those huge list of formulas) to provide us and paste sample code to reproduce the issue, we will check it soon.

@Amjad_Sahi:
We are using latest version Aspose.Cells_20.5 downloaded evaluation copy through Nuget Package and code is writtent in C#.NET.

I have tried both these 2 methods, as you can see below code, and have tried several options to speed up the processing of this large excel data and both methods are taking around 3 to 4 hours to process this large excel data in a worksheet that has 140,000 rows and 88 columns in it.
Please note that out of those 88 columns, only last 7 to 8 columns has formulas in it and rest of columns are plain-text columns. I couldn’t post excel due to PHI information in it.
Please advise any help towards this performance issue.

  1. Method-1:

// Set the CreateCalcChain as true
excelWorkbook.Settings.CreateCalcChain = false;
excelWorkbook.Settings.CalcMode = CalcModeType.Manual;
// excelWorkbook.Settings.CalcMode = CalcModeType.Automatic;
// excelWorkbook.Settings.CalcStackSize = 30;

//Iterate all worksheets
for (int i = 0; i < excelWorkbook.Worksheets.Count; i++)
{
var currentWorksheet = excelWorkbook.Worksheets[i];
if (currentWorksheet.Name.Equals(“BLRawDataEncounters”, StringComparison.OrdinalIgnoreCase))
{
int formulasCount = 0;
Cells cells = currentWorksheet.Cells;
int lastRow = Math.Max(0, cells.MaxDataRow);
int lastCol = Math.Max(0, cells.MaxDataColumn);
CalcModeType calcMode = excelWorkbook.Settings.CalcMode;
for (int rowIndex = 0; rowIndex < lastRow; rowIndex++)
{
for (int colIndex = 0; colIndex < lastCol; colIndex++)
{
string formula = cells[rowIndex, colIndex].Formula;
//if (calcMode == CalcModeType.Automatic && formula != null)
if (formula != null)
{
// Prepare cell to get calculated value
try
{
Cell cell = cells[rowIndex, colIndex];
cell.Calculate(true, null);
}
catch (Exception ex)
{
throw ex;
}
}
}
}

	//currentWorksheet.CalculateFormula(new CalculationOptions { IgnoreError = false, Recursive = false }, false);

	//currentWorksheet.CalculateFormula(true, false, null);

	//Cells cells = excelWorkbook.Worksheets[i].Cells;
	//CalculationOptions copts = new CalculationOptions();
	//copts.Recursive = false;
	//copts.

	////iterate all the cells.
	//for (IEnumerator ie = cells.GetEnumerator(); ie.MoveNext();)
	//{
	//    Cell cell = (Cell)ie.Current;
	//    if (cell.IsFormula && !string.IsNullOrEmpty(cell.Formula))
	//    {
	//        cell.Calculate(copts);
	//    }
	//}
}
else if (currentWorksheet.Name.Equals("AggrPatient", StringComparison.OrdinalIgnoreCase)
		|| (currentWorksheet.Name.Equals("AggrEncounter", StringComparison.OrdinalIgnoreCase)))
{
	//currentWorksheet.CalculateFormula(new CalculationOptions { IgnoreError = false, Recursive = false }, false);
	//currentWorksheet.CalculateFormula(true, false, null);
	currentWorksheet.CalculateFormula(false, true, null);
}                    

}

excelWorkbook.Save(storageFilepath);

  1. Method-2:

// excelWorkbook.Settings.CalcMode = CalcModeType.Manual;
// excelWorkbook.CalculateFormula(true, false, null);

// Set the CreateCalcChain as false
excelWorkbook.Settings.CreateCalcChain = false;
excelWorkbook.CalculateFormula();
excelWorkbook.Save(storageFilepath);

@sridharravva2020,
Thank you for providing the sample code but we are afraid that without the template file it is not possible to reproduce the issue and provide assistance. You may please set this thread to private and share the file here. Please note that at Aspose, strict policies are implemented about the user data security and no un-authorized person can access data by any means.

Otherwise if possible you can also modify the data to some random values for data security and share the file after compressing it. If zip file size is more than the allowed size for attachment, you may upload it to some public file sharing server and share the link and password here for our testing.

Here it is sample data for you to investigate. Thanks so much for the help!
https://1drv.ms/u/s!AuV5Hxi_6HIbmUWPVMveqcVsnz1u?e=Kgb2ub

@sridharravva2020,
Thank you for providing the sample data. We will analyze it and share our feedback soon.

@sridharravva2020,
We were able to observe the issue but we need to look into it more. We have logged the issue in our database for investigation and for a fix. Once, we will have some news for you, we will update you in this topic.

This issue has been logged as

CELLSNET-47387 - Calculate formula on large excel data of more than 140K rows not working

Hi @ahsaniqbalsidiqui,
Thanks for looking into this. Any updates on this issue? Please let me know if you have any updates on this issue at the earliest. Please note that as you must have noticed that in this excel data, only 9 columns has formulas in it and rest of other 79 columns does not have formulas in it.
We have total 144,000 rows and total 88 columns in this excel data. Currently, while processing this huge excel data in rows/cells of 144,000 rows, it is taking several hours like 12 to 13 hours to process this huge data. We are trying to process this huge data in less than 10 mins or 15 mins or so. Please let me know how to achieve this. Thanks!

@sridharravva2020,
As we just logged the ticket, so could you spare us a little time (3-5 days or so) for complete evaluation and investigation for your issue. Once we have an update on it, we will let you know here.

@sridharravva2020,

We evaluated your issue further.
For every formula(cell) in your template file, the function needs to compare most of cells in one specified column, e.g D to certain value. These are quite time consuming functions, SUMIF,COUNTIF,MATCH, …etc with large data set. To improve the performance, you may change the formulas in the template file a bit by using shared formula feature and caching calculated results of some functions. For this specified template file, the change may be like following:
e.g
Sample code:

   cells["CC2"].SetSharedFormula("=IF(COUNTIF(D$1:D2,D2)=1,D2,\"\")", 50571, 1);
                    cells["CL2"].SetSharedFormula("=MATCH(CC2,$D:$D,0)", 50571, 1);
                    cells["CD2"].SetSharedFormula("=IFERROR(INDEX($H:$H,CL2),\"\")", 50571, 1);
                    cells["CE2"].SetSharedFormula("=IFERROR(INDEX($J:$J,CL2),\"\")", 50571, 1);
                    cells["CF2"].SetSharedFormula("=IFERROR(INDEX($K:$K,CL2),\"\")", 50571, 1);
                    cells["CG2"].SetSharedFormula("=IFERROR(INDEX($L:$L,CL2),\"\")", 50571, 1);
                    cells["CH2"].SetSharedFormula("=IFERROR(INDEX($M:$M,CL2),\"\")", 50571, 1);
                    cells["CI2"].SetSharedFormula("=IFERROR(INDEX($AK:$AK,CL2),\"\")", 50571, 1);
                    cells["CJ2"].SetSharedFormula("=IFERROR(INDEX($AL:$AL,CL2),\"\")", 50571, 1);
                    cells["CK2"].SetSharedFormula("=IFERROR(INDEX($AM:$AM,CL2),\"\")", 50571, 1);

In this way, the performance will be improved significantly.

Let us know if you still find any significant performance issue.

Thanks, sure, I will give it a try shortly. So basically, I will need to remove the formulas from excel template and then put those above mentioned shared formulas in C# code to calculate formulas for each of these cell/columns from CC to CK columns. Please correct me if I am mistaken. Thanks again!

@sridharravva2020,

Yes, your understanding is correct.

@Amjad_Sahi, Quick question here reg. setting shared formulas:
Why you have this rowNumber as “50571” and columnNumber as “1”, are we setting this only once all rows of excel?
Please let me know if we need to set these below shared formulas only once for first row and first column of excel sheet or something ? or do we need to these shared formulas for each row in the
for-loop? Please clarify the same at the earliest with some sample code in it. Thanks!

cells["CC2"].SetSharedFormula("=IF(COUNTIF(D$1:D2,D2)=1,D2,\"\")", 50571, 1);
                    cells["CL2"].SetSharedFormula("=MATCH(CC2,$D:$D,0)", 50571, 1);
                    cells["CD2"].SetSharedFormula("=IFERROR(INDEX($H:$H,CL2),\"\")", 50571, 1);
                    cells["CE2"].SetSharedFormula("=IFERROR(INDEX($J:$J,CL2),\"\")", 50571, 1);
                    cells["CF2"].SetSharedFormula("=IFERROR(INDEX($K:$K,CL2),\"\")", 50571, 1);
                    cells["CG2"].SetSharedFormula("=IFERROR(INDEX($L:$L,CL2),\"\")", 50571, 1);
                    cells["CH2"].SetSharedFormula("=IFERROR(INDEX($M:$M,CL2),\"\")", 50571, 1);
                    cells["CI2"].SetSharedFormula("=IFERROR(INDEX($AK:$AK,CL2),\"\")", 50571, 1);
                    cells["CJ2"].SetSharedFormula("=IFERROR(INDEX($AL:$AL,CL2),\"\")", 50571, 1);
                    cells["CK2"].SetSharedFormula("=IFERROR(INDEX($AM:$AM,CL2),\"\")", 50571, 1);

@sridharravva2020,

We recommend you to kindly see the document with example for your reference on Shared Formula feature:

(see the overloaded Cell.SetSharedFormula method/API reference)

As we suggested you got to use SharedFormula feature which will give great performance for your needs. Please notice the second and third parameter. Here “50571” denotes the total number of rows to be filled (automatically) with formula(s) accordingly. Also, “1” means total number of column(s) to be populated. Let us talk about the first line of code, i.e.,
cells["CC2"].SetSharedFormula("=IF(COUNTIF(D$1:D2,D2)=1,D2,\"\")", 50571, 1);
Since the last parader is “1”, so all the formulas are filled in the same “CC” column cells upto row # “CC50572” (since we started from 2nd row of “CC” ). In short, this line will set formulas automatically in the range CC2:CC50572 in such a way that in each formula row indices would be changed accordingly.

Yes, only one time operation, you will specify the shared formula for the first cell in the range and it will automatically fill all the corresponding cells (based on the Cell.SetSharedFormula second and third parameters). See above explanation for better understanding.

Let us know if you still have any confusion or issue.

Thanks for clarifying it, I have tried below code but it not printing any of the calculated values to those excel rows for this first tab-1, which had 144K records. Also, tab-2 in the same excel had some 400 records only with bunch of several formulas referring to columns from first tab-1, now this formulas calculation for tab-2 which had some complex formulas now taking too much time now to process like more than 2 hours to process, please help in resolving both these 2 issues at the earliest. Thanks so much!

Here it is the sample code that I have tried earlier for your reference, please let me know if I did any mistake in this code. Thanks!

				// Set the CreateCalcChain as false
                excelWorkbook.Settings.CreateCalcChain = false;
                
				//Iterate all worksheets
                for (int i = 0; i < excelWorkbook.Worksheets.Count; i++)
                {
                    var currentWorksheet = excelWorkbook.Worksheets[i];
                    if (currentWorksheet.Name.Equals("Tab-1", StringComparison.OrdinalIgnoreCase))
                    {
                        //int formulasCount = 0;
                        Cells cells = currentWorksheet.Cells;
                        //cells.MemorySetting = MemorySetting.MemoryPreference;
                        int lastRow = Math.Max(0, cells.MaxDataRow);
                        
                        cells["CC2"].SetSharedFormula("=IF(COUNTIF(D$1:D2,D2)=1,D2,\"\")", lastRow, 1);
                        cells["CD2"].SetSharedFormula("=IF($CC2=\"\", \"\", H2)", lastRow, 1);
                        cells["CE2"].SetSharedFormula("=IF($CC2=\"\", \"\", J2)", lastRow, 1);
                        cells["CF2"].SetSharedFormula("=IF($CC2=\"\", \"\", K2)", lastRow, 1);
                        cells["CG2"].SetSharedFormula("=IF($CC2=\"\", \"\", L2)", lastRow, 1);
                        cells["CH2"].SetSharedFormula("=IF($CC2=\"\", \"\", M2)", lastRow, 1);
                        cells["CI2"].SetSharedFormula("=IF($CC2=\"\", \"\", AK2)", lastRow, 1);
                        cells["CJ2"].SetSharedFormula("=IF($CC2=\"\", \"\", AL2)", lastRow, 1);
                        cells["CK2"].SetSharedFormula("=IF($CC2=\"\", \"\", AM2)", lastRow, 1);

                        //Save excel file here to perform calculation on above shared formulas
                        excelWorkbook.Save(storageFilepath);
                        
                    }                        
                    else if (currentWorksheet.Name.Equals("Tab-2", StringComparison.OrdinalIgnoreCase))
                    {                            
                        Cells cells = currentWorksheet.Cells;
                        int lastRow = Math.Max(0, cells.MaxDataRow);
                        int lastCol = Math.Max(0, cells.MaxDataColumn);
                        
                        CalculationOptions copts = new CalculationOptions();
                        copts.Recursive = true;

                        for (int rowIndex = 0; rowIndex < lastRow; rowIndex++)
                        {
                            for (int colIndex = 0; colIndex <= lastCol; colIndex++)
                            {
                                Cell cell = cells[rowIndex, colIndex];
                                //if (calcMode == CalcModeType.Automatic && formula != null)
                                if (cell.Formula != null)
                                {
                                    // Prepare cell to get calculated value
                                    //cell.Calculate(false, null);
                                    cell.Calculate(copts);
                                }
                            }
                        }

                    }
                }

                excelWorkbook.Save(storageFilepath);
            }

@sridharravva2020,

What do you mean, do you want to print the calculated values against formulas? Where you calculate formulas in code for the first worksheet? You got to call e.g Workbook.CalculateFormula to calculate all the formulas in the spreadsheet first then you may get calculated results.

This is not good way to iterate cells in for loop as it will be slowing down for sure, if you loop through cells by indices or access cells randomly, its performance may be much slower. However, if you could loop through cells by IEnumerator that is got from Cells, the performance should be better.

Also, we need to evaluate your scenario and code segment in details. We will do and get back to you with more recommendations or updates soon.

@Amjad_Sahi, Thanks for letting me know, Please keep me posted with further updates on this.
However, I have tried the suggestions that you have mentioned above, here it is the updated sample code that I wrote, but it is still taking very long time to process the entire excel around 80 to 90 mins total time for calculating several complex formulas in this excel.
Please note that this excel has 2 tabs (worksheet-1 with 144K records containing formulas in last 9 columns CC to CK and worksheet-2 with 450 records containing complex formulas referring columns from worksheet-1/tab-1).

                // Set the CreateCalcChain as true
                excelWorkbook.Settings.CreateCalcChain = false;
                
                //Iterate all worksheets
                for (int i = 0; i < excelWorkbook.Worksheets.Count; i++)
                {
                    var currentWorksheet = excelWorkbook.Worksheets[i];
                    if (currentWorksheet.Name.Equals("Worksheet-1", StringComparison.OrdinalIgnoreCase))
                    {
                        //int formulasCount = 0;
                        Cells cells = currentWorksheet.Cells;
                        //cells.MemorySetting = MemorySetting.MemoryPreference;
                        int lastRow = Math.Max(0, cells.MaxDataRow);
                        //int lastCol = Math.Max(0, cells.MaxDataColumn);                            

                        //CalculationOptions copts = new CalculationOptions();
                        //copts.Recursive = false;

                        cells["CC2"].SetSharedFormula("=IF(COUNTIF(D$1:D2,D2)=1,D2,\"\")", lastRow, 1);
                        cells["CD2"].SetSharedFormula("=IF($CC2=\"\", \"\", H2)", lastRow, 1);
                        cells["CE2"].SetSharedFormula("=IF($CC2=\"\", \"\", J2)", lastRow, 1);
                        cells["CF2"].SetSharedFormula("=IF($CC2=\"\", \"\", K2)", lastRow, 1);
                        cells["CG2"].SetSharedFormula("=IF($CC2=\"\", \"\", L2)", lastRow, 1);
                        cells["CH2"].SetSharedFormula("=IF($CC2=\"\", \"\", M2)", lastRow, 1);
                        cells["CI2"].SetSharedFormula("=IF($CC2=\"\", \"\", AK2)", lastRow, 1);
                        cells["CJ2"].SetSharedFormula("=IF($CC2=\"\", \"\", AL2)", lastRow, 1);
                        cells["CK2"].SetSharedFormula("=IF($CC2=\"\", \"\", AM2)", lastRow, 1);

                        currentWorksheet.CalculateFormula(true, false, null);                    
                    }                        
                    else if (currentWorksheet.Name.Equals("Worksheet-2", StringComparison.OrdinalIgnoreCase))
                    {
                        //currentWorksheet.CalculateFormula(true, false, null); --USED this lastly
                        
                        Cells cells = currentWorksheet.Cells;
                        int lastRow = Math.Max(0, cells.MaxDataRow);
                        int lastCol = Math.Max(0, cells.MaxDataColumn);
                        //CalcModeType calcMode = excelWorkbook.Settings.CalcMode;

                        CalculationOptions copts = new CalculationOptions();
                        copts.Recursive = false;

                        //iterate all the cells.
                        for (IEnumerator ie = cells.GetEnumerator(); ie.MoveNext();)
                        {
                            Cell cell = (Cell)ie.Current;
                            //if (cell.IsFormula && !string.IsNullOrEmpty(cell.Formula))
                            if (cell.Formula != null)
                            {
                                cell.Calculate(copts);
                            }
                        }

                    }
                }

                //Save excel file finally here after performing all excel formulas calculations on each individual worksheet.
                excelWorkbook.Save(storageFilepath);

Well, calculating formulas cell by cell will slow down the performance significantly for sure. If you do not need to calculate those formulas one by one, and the formulas do not reference to other formulas which have not been calculated, then you should use CalculationOptions.Recursive to set it to false.

Also, you said that those formulas in the second sheet references to the cells in the first sheet. By reviewing your code, the first sheet should have been calculated before the second sheet, so it should be ok to use this option. We tested your scenario/ case a bit too but could not find any significant performance issue.

I am afraid, if you still find performance issue apart from all recommendations we suggested, we need a standalone sample console application to reproduce the issue. Please provide us the sample project with template files, we will check it soon.

Reg <<you said that those formulas in the second sheet references to the cells in the first sheet. By reviewing your code, the first sheet should have been calculated before the second sheet>>,
Yes, the first worksheet is calculating first and then the second worksheet will be calculated after that.
I am working on some code changes and using sharedformula method, but it is still taking 70 to 80 mins.
Here it is the sample console project that you can use to work on it.
https://1drv.ms/u/s!AuV5Hxi_6HIbmUaAJv5MYr0HFy2G?e=yzUHVE

In the mean-time, can you please let me know any update on this logged ticket that you have it. << CELLSNET-47387 - Calculate formula on large excel data of more than 140K rows not working>>
We are looking for possible suggestions that I can significantly improve performance of this excel worksheet containing 144K records and bunch of complex formulas in it and trying to reduce the time to 20 to 30 mins. Please provide us some more ideas/suggestions that we can try it out to achieve this performance improvement.
Thanks so much!

@sridharravva2020,

Thanks for the samples.

I think using 70-80 mins time (taken) is not really bad for those bunch of formulas when following the shared formula feature (we suggested) considering the fact that previously 4 hours were being taken to complete the process. I have logged your resource files (sample project and template file) against the ticket for investigations. This will help to analyze your issue (logged earlier as “CELLSNET-47387”) thoroughly. Let us evaluate your code segment in details and we will check if your code needs some tweaks to minimize the time cost as per your requirements.

Once we have an update on it, we will let you know.