Problems with calculating large files and UDFs

Hi, we are using Aspose Cells for .net for Azure function that is reading an excel file and calculating the formulas. We are having 10-15 UDFs that are registered - mostly for retrieving data form DB and after that some normal formulas that users create base on the Data of UDFs. All works fine except when we have some really large reports with a lot of Formulas. The moment when the we call calculate all formulas with the UDF options the process takes a lot of time sometime so long so we had to implement time out exception because the function looks like stuck. Also when we updated with one of the last versions of the Aspose library 21.4… I saw that ICustomFunction interface is deprecated and is going to be removed 1 year for August 2020 so these days. Is it possible this is to be the reason… or some implementation errors - leading to slow process or is there some kind of max number of functions/data so to prevent such large reports to be send.

I`m attaching one zip file with some code sample - it is hard to create something out of the whole abstraction of the project - so if something else is needed please let me know and the excel file is also in the zip file. The problem is in the calculating part - class ReportExcelEngine line 42 - it is called form the main method. The UDF in the excel is UmTSArrayFilter in this excel it is called 3 times with different parameters and a lot of index formulas, some graphics and so on. And this on is throwing a time out exception CodeAndExcelFile_Svetozar.zip (6.5 MB)

There report itself is generated with our excel addin. If we reduce the formulas the report will pass.

Thanks,
Svetozar

@Svetozar,
I have tried to compile your code but it could not be compiled because there is no constructor available for AppCustomFunctions that takes no argument as follows:

AppCustomFunctions customFunctions = new AppCustomFunctions();

Similarly following line is also not compiled.
var reportExcelEngine = new ReportExcelEngine(customFunctions, calculationContext);

Could you please share some console application that can be compiled and executed without any missing references?

I don’t think so that it is causing this issue. If it is compiled, then it should work normally.

Until and unless we don’t get some running sample code, it is not possible to share feedback. Similarly, there is no limit on formulas however it is obvious that if the number of formulas is large, it will take more time.

HI @ahsaniqbalsidiqui,

sorry for the late reply just needed some time to create a working console app that I hope describes my problem. I`m attaching it now. There are are 3 files what will start generating and the 3th one will freeze which is what the reason for the failing reports is.

Regards,
SvetozrCodeSnippet.zip (6.5 MB)

@Svetozar,
Thank you for sharing the sample project. We have observed the issue and logged it in our database for further investigation. You will be notified here once any update is ready for sharing.

This issue is logged as:
CELLSNETCORE-135 - Application halts while calculating large files and UDFs

@Svetozar,
It is the formula calculation of the workbook which takes too long time. In your template file “Actual_Failing_Report_Fake_Data.xlsx”, there are many formulas which requires comparison on large range of cells. For example, one cell’s formula contains the function like:

INDEX($D$35:$D$73474,MATCH(1,($H36=$B$35:$B$73474)*(I$34=INT($F$35:$F$73474)),0))

Here to calculate this cell, we need to compare H36 with other 73440 cells at column B and compare I34 with other 73440 cells at column F. So, for this one cell and only for part of its formula there are about 150000 comparisons to be performed. And in the template file there are large set of such kind of formulas.

We are still working on such kind of case to try to improve the performance. By our test, the calculation of the original workbook(by Workbook.CalculateFormula()) needs about 300 seconds. According to our preliminary research, we think it is possible for us to make the time decreased to about 100 seconds. However, we need more time to finish the task (about two weeks or a bit more).

By the way, the removement/replacement of ICustomFunction won’t cause performance issue when compare with old versions. In fact the new API of AbstractCalculationEngine is more flexible and convenient to be used. You do not need to use both of them together. Instead you may just move the calculation logic in implement of ICustomFunction to AbstractCalculationEngine and then use AbstractCalculationEngine only.

HI @ahsaniqbalsidiqui,

Thanks for the fast reply. Yes I understand the need of the time for the calculation - thats obvious. Just our main problem is that without the UDFs the reports are passing (with hardcoded data) but when the data is taken from the UDFs sometimes it reaches the 15 min time out exception per one calculation. Because we are using Azure function to calculate the reports and to deliver the reports to the clients I implemented a timeout exception logic if the calculation (_workbook.CalculateFormula(_calculationOptions)). So thats what I want to ask do we implement the UDFs wrong or just because the data is dynamic it takes a really long time. For example this file No_udfs.zip (6.4 MB) can pass for like 2-3 minutes per calculation but the other one witch is the same but just returns same fake data cannot - the failing file. If its possible Ill be happy to jump in a fast call to explain more the problem we experience,

Regards,
Svetozar

@Svetozar,
We have noted your feedback and will provide our comments after detailed analysis.

@Svetozar,

For the performance difference of calculating formulas with and without custom engine, it is because when there is no custom engine all formulas that use custom function will be calculated as “#NAME?”. For most of other formulas in the template file, comparing an error value to other operands will be much faster than comparing operands with normal values, especially when comparing to a large range of data. However, without the custom engine you cannot get the expected results for calculating formulas.

For your given template file, with our recent improvement for formula calculation (such as, our latest official release 21.9), you should get better performance now. Please try our latest version. Furthermore, if you can modify some of the formulas in the template file, we believe you can get even much better performance (by our test, with below modifications, the time cost will be only 10%-20% of that of the original formulas):

In the sheets IE_Jan, IE_Feb and IE_Mar, there are many formulas that contain comparison to INT($F$35:$F$73474), such as, the cell I36, J36, I37, J37, … For such data, user may add one column data to hold the int part of F column, such as, using G column:

cells["G35"].SetSharedFormula("=INT(F35)", 73440, 1);
Then change the comparison from INT($F$35:$F$73474) to $G$35:$G$73474:

cells[35, 8].SetArrayFormula("=...$I$34=$G$35:$G$73474)...",1,1);
cells[35, 9].SetArrayFormula("=...$J$34=$G$35:$G$73474)...",1,1);
...

Now calculate the workbook and you should get the same results with much better performance now.

The issues you have found earlier (filed as CELLSNETCORE-135) have been fixed in this update. This message was posted using Bugs notification tool by simon.zhao