Performance testing( Huge Excel documents.)

Hi,


We are currently running performance testing for our app. Our app is using huge excel documents which needs to be loaded into memory and fed values and saved as output excelsheets. Excel documents has lots of data and lots of formula’s.

Our shows the test shows that majority of time is spend on opening the excel and calculating formula’s. I can attach some CPU hotspots with post. can you please suggest if there is any optimization can be done from our coding /configuration prespective.

Our App basically has 9 such excel sheet documents. Async request from the web app is fed to the spreadsheet and excel sheet does some calculation whose result is stored to the database.
Each excel-sheet ranges around 3-12 MB file.

Our JVM is configured to run 8 threads and runs with 2GB Memory setting. Is there any caching that you might suggest instead of opening the excel sheet for each request can these excelsheet Worksheet Objects be cached and clone in memory and used by each request.

1) Right now we dont cache any worksheet in memory. Will caching help.
2) Calculate formula method seems to take 37% of total processing any suggestion on improving the performance.

Code to open:
fs = new FileInputStream(file);
workbook = new Workbook(fs);
FileFormatInfo fileFormat = FileFormatUtil.detectFileFormat(file.getAbsolutePath());
passwordProtected = fileFormat.isEncrypted();
extension = FileFormatUtil.loadFormatToExtension(fileFormat.getLoadFormat());
//Set the Formula Calculation Mode to AUTOMATIC.
workbook.getSettings().setCalcMode(CalcModeType.AUTOMATIC);

Code to calculate.
workbook.calculateFormula();

Aspose version:aspose-cells-8.2.1.4. Will upgrading to latest jar help.

Hi Sundarraj,


Thank you for contacting Aspose support.

It would be of great help if you can share a few samples from your end after bench-marking the time to load the spreadsheet and to calculate the formulas in your environment. Please also provide the environment details as follow so we could try to simulate it (as much as possible) to re-run the same tests on our side while using the latest version of Aspose.Cells for Java 8.5.2.4 (attached).

  • Operating system version & architecture
  • Locale/Regional setting of the machine
  • JDK vendor & version
  • JVM arguments (if any)
  • Complete source code (preferably a sample console application)
  • Sample spreadsheets for testing

You may consider using the Memory Preferences offered by Aspose.Cells APIs. While enabling the aforesaid settings, you will be able to load the file in an instance of Workbook with less memory usage therefore the process would be a little quicker. Please note, the latest revisions of Aspose APIs are always better than their predecessors in terms of performance as well as the feature set therefore if you upgrade the API to latest revision, I think you will be able to notice the performance improvements to some extent.

Hi Raza

Following in the architecture and CPU configuration we running against. Due to company policy we cant provide you the actual sheet. but the code is pretty much what is mentioned in my post. All we are doing is opening excel files and inserting certain data to that and forcing a calculation. and saving the results. let me know if we can schedule a call to provide additional information needed.

Architecture x86_64
CPU op-mode(s) 32-bit, 64-bit
CPU(s): 6
java version 1.7.0_79(OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode))

6
GNU/Linux|

Also if you can let us know any benchmark available processing 15 MB excel files from your end.

Hi Sundarraj,


Thank you for sharing the environment details, however, we will not be able to properly evaluate the presented scenario unless you share a few samples with us. Please note that the size of the spreadsheet does not suggests that it has complex formulas in it because the size of the spreadsheet file mainly depends on the contents (textual/graphics/shapes/objects) as well as the formatting and styling. Moreover, the time required to calculate all formulas in a spreadsheet varies depending on the complexity and inter-dependencies of the formulas therefore we will not be able to recreate a spreadsheet that matches your sample without reviewing the actual sample.

That said, please review the EULA for NDA (clause 8) and pass it to your legal team as well. Please note, we are bound not to publicly expose any sensitive information shared in a thread marked as Private whereas the Public threads are exposed to search engines and all Aspose community. I have marked this thread Private so you can share the samples now.

By the way, have you give a try to the latest version as suggested in my previous response? Also please check the performance while enabling the Memory Preferences (documentation link shared earlier) and see if you can observe any improvement.

Sundarrajaspose:
Also if you can let us know any benchmark available processing 15 MB excel files from your end.

I am afraid, we currently do not have performance matrix as required. Moreover, as discussed above, the file size does not suggests that the spreadsheet has complex formulas, that is the reason we asked you to share a sample spreadsheet after bench-marking the performance on your side. We will then recreate the similar environment on our end to test the performance and tweak the process (if possible) to get optimal results and/or log the concerns for the product team's review to plan improvements with future releases of Aspose.Cells for Java API (if required).

Hi Babar,


File size is due to number of sheets and data inside the sheet. I wasn’t suggesting the size was because of the formula. Its our company policy we cant share the actual spreadsheets which are very complex to create a sample. if needed we can have a working session to explain the issues that we are trying to address.

We need a technical suggestion on the following
1) how to reduce the load time (Instead of loading the same sheet again and again can we do something in memory to clone the Worksheet Object)
2) Best practices to reduce the calculation time.

We are working on using the latest aspose jars. currently running in to issues there are some calculation difference between the existing version aspose-cells-8.2.1.4 and aspose-cells-8.5.2.4. Trying to figure out which formula is getting calculated wrongly. We had issues before with the api and got it fixed through a patch. can you please confirm if the patch is included in the latest version.

Here is link for the old ticket.

SUMPRODUCT with Indirect not working - #11 by Vinodkar87 - Free Support Forum - aspose.com

Thank you for the quick responses.

Hi Sundarraj,

Sundarrajaspose:
File size is due to number of sheets and data inside the sheet. I wasn't suggesting the size was because of the formula. Its our company policy we cant share the actual spreadsheets which are very complex to create a sample. if needed we can have a working session to explain the issues that we are trying to address.

If the sample spreadsheet comprises of merely data then it will surely take some time to load the complete structure/data model into the memory. One possible solution to reduce the loading time is the LightCells API however, you will not be able to perform operations such as formula calculations while using that approach.

Please note, Aspose support is provided only through the support forums and live chat so we cannot have live sessions as per company policy. Moreover, we require the samples to replicate the issue before logging it for the product team and further to evaluate the problem cause. It will not be possible for us to provide a fix for an issue whose cause is unknown therefore I would again suggest you to involve your legal team in this matter so that you could be allowed to share the problematic samples.

Sundarrajaspose:
We need a technical suggestion on the following
1) how to reduce the load time (Instead of loading the same sheet again and again can we do something in memory to clone the Worksheet Object)
2) Best practices to reduce the calculation time.

As discussed earlier, one solution to reduce the loading time is to use the LightCells APIs but it has its own limitations therefore not feasible for your requirements. However, while working on single spreadsheet, you can surely create & keep it's copy in the memory at all times and operate on the copy rather than the original object, similarly, while working on single worksheet, you can copy it to a new object of Workbook and use it repeatedly. In reference to the calculation time, if you wish to calculate all formulas at once then I am afraid there are no other ways, however, if formulas have to be calculated several times due to the changes in the cell values then you can use the WorkbookSettings.CreateCalcChain property to lower the calculation time to some extent.

Sundarrajaspose:
We are working on using the latest aspose jars. currently running in to issues there are some calculation difference between the existing version aspose-cells-8.2.1.4 and aspose-cells-8.5.2.4. Trying to figure out which formula is getting calculated wrongly. We had issues before with the api and got it fixed through a patch. can you please confirm if the patch is included in the latest version.

Here is link for the old ticket.

http://www.aspose.com/community/forums/577525/sumproduct-with-indirect-not-working/showthread.aspx#577525

If you find a difference in the calculation results of version 8.2.1.4 and the latest then please share the sample spreadsheet(s) for proper analysis. Regarding the tickets attached to the mentioned thread, they were all fixed with the release of Aspose.Cells for Java 8.2.2 and the fixes are also available with latest build 8.5.2.
Thanks for your suggestion, once we get a fix for the below mentioned issue , i will do a performance test again and will keep you posted.

https://forum.aspose.com/t/39434

Hi Sundarraj,


I have checked the status of the ticket CELLSJAVA-41487 attached to your other thread. Currently the ticket is in analysis phase so as soon as we complete the preliminary investigation, we will share the results with you on the relevant thread.