Performance and Memory related questions

Hi,

We are using Aspose Cells for Java. I have the following questions with respect to Performance and Memory:

a) I have n worksheets containing data. The n+1th worksheet is used to hold the aggregated sum of the data in the n worksheets. e.g. Sheet1!A1+ Sheet2!A1 ... + Sheetn!A1. This value can also be calculated by writing a formula on the n+1th worksheet which uses local sheet cells for doing the calculation. When I perform calculations at a sheet level, which one is faster? Scenario1 (Sheet1!A1.... Sheetn!An) or the Scenario 2 where I use formulas at the worksheet level.

b) We construct worksheets in the memory and then write it to the disk. Currently our business users have not asked us for any fancy borders or font styles on any of the text in the worksheet. Will it be memory efficient to use very less formatting on the sheet or is it going to make very less difference.

c) This question is more related to formula calculation. I have left first two columns vacant for readability. My tests indicated that when I delete these two columns and any unnecessary blank cells, the formula re-calculation is faster. Also the memory required is less. Can you please confirm that removing unnecessary blank cells from the Worksheet actually helps?

Can you please provide any performance tips that would help me speed up my application?

Thanks,
Shashi

Note: Currently the calculation/re-calculation is a bit slow when we have close to 120 sheets of data.

Hi Shashi,

For your questions, I try to explain some:


a). I am not very clear about your two scenarios, do you mean in scenario 1 you calculate the aggregated value manually in program and set it to cells of last sheet, and in scenario 2 you define aggregated formulas for cells of last sheet? If so, I think scenario 1 will be more efficient, but there is one notable thing for it:

if there are many vacant cells, please use Cells.checkCell(int row, int col) rather than Cells.getCell(int row, int col) to get the cell and value in your calculating loop, if the cell is vacant and has not been initialized, checkCell will return null, but the getCell will initialize a new Cell object for it, that would be an unnecessary memory burden. And for vacant cells we will discuss some more in reply for your third question.


b). Yes, more styles will cause more memory and disk requirement. But if your styles are same for many cells, I think you can define a Style object, and set it to those cells by Cell.setStyle() method. This sharing of one Style object for many cells will save memory efficiently.


c). For vacant cells, if they are not formulas, I don't think they will affect performance heavily when calculate formula. If those cells have been initialized and even been set styles, they do cause more memory requirement. If those cells are meaningless for you in program and only for GUI view purpose, I think you can avoid to call methods such as Cells.getCell(), Columns.getColumn() for those cells and columns, because these methods will initialize and save those unnecessary objects.

Hi Johnson,

For a) Thanks for the quick response. I have uploaded two Excel files with examples for Scenario1 and Scenario2. In Scenario1, please refer to the aggregate sheet where I have the orange colored row containing the formula. It is a sum of values from different sheets. In Scenario 2, I have it as the sum from the same sheet. Both the scenarios give me same result. My question is which scenario is faster, when I call the worksheet.calculateFormula(boolean, boolean) method on the “Aggregate” worksheet. Does it depend on the no of sheets being aggregated v/s no of rows being summed or any other variables?

For b) We use pre-defined Excel templates with some formatting done for improving the readability of the template. This template is loaded in memory when the user tries to view the data. Also the data is filled from backend database after applying a lot of business rules. Since this template is “cloned” by us using the Worksheet.copy/workbook.copy methods it will obviously contain the styles/formatting. This formatting is not seen by the user until he exports the excel sheet. Hence my question is, will it help to remove any formatting from the templates that are being cloned in memory?

For c) If I understand your response correctly, no matter how many cells are present in the worksheet, Aspose Cells is intelligent enough to find only those cells having formulas without looping through the blank/non-formula cells. Please confirm if this is correct.

Thanks,
Shashi

Hi Shashi,

For a). First, the performance depends on formulas need to be calculated recursively; then the count of cells being referred to by the formula affects performance a bit too. For your case, if your "recursive" flag of invoking Worksheet.calculateFormula(boolean, boolean) is false, Scenario 2 is almost same with Scenario 1; If the flag is true, Scenario2 is faster than Scenario 1. Commonly I think you would better prefer Scenario2 to Scenario1. And if the whole workbook need to be calculated, using Workbook.calculateFormula() is better than invoking Worksheet.calculateFormula() separately.


For b). Removing those formatting will help to save some memory. By the way, if all sheets need to be copied, using Workbook.copy() is better than invoking Worksheet.copy() separately.


3. In fact, Aspose Cells need to traverse all cells to check whether one cell is formula, if true, will calculate it. But checking a cell being formula is much faster than parsing and calculating a formula. If the blank/non-formula cells affect performance obviously in your case, please give us a sample template to show this issue, we will check it. For your templates Scenario1 and Scenario 2, I have tested and not found any difference when remove the blank columns and rows or not.

Hi Johnson,

Thanks for the reponse. I have sent the file to email id: nanjing at aspose dot com due to its confidential nature. Please delete it after your analysis is done.

Scenario: Try deleting the first 2 columns and the blank rows at the top in each of the sheets. I deleted it in around 15 worksheets and found an improvement of around 5 secs. in workbook.calculateformula method.

Please let me know your thoughts.

Thanks,
Shashi

Hi Johnson,

Were you able to look at the file that I sent across?

Regards,
Shashi

Hi Shashi,

I have tested with your file but can't find the issue, I used the Workbook.calculateFormula(), that took about 2.5 sec for both situation, I remove the blank columns A,B,C and rows 1-20 for 15 sheets too. Would you please give us your sample code?