Aspose Cells Performance Issues with Cell-By-Cell Read/Write

We're finding that with output to and input from spreadsheet cell by cell instead of a column at a time, performance drops precipitously (around 10 times slower), but there's no other method available for reading and outputting rich-text formatting from/to a spreadsheet. We're wondering whether there's any room for optimizing performance on the cell-by-cell / rich-text functionality? Not sure how much effort has already been put into this at Aspose, but need to ask, as it may make the tool un-viable for our needs due to its slow performance, (especially concerning for us on the output of data to Excel).

Hi,

Thanks for your inquiry.

Could give us more details how do you get/set cell by cell values? And, provide the information, from how many cells you are reading/writing rich text formatting? We appreciate if you could create a sample application to show the issue regarding performance. We will check your code more closely and it may help us how we can sort out the things for your requirement soon.

By the way, did you try our latest version ( 4.7.1: http://www.aspose.com/community/files/51/file-format-components/aspose.cells-for-.net-and-java/entry174461.aspx ) if it makes a difference?

Thank you.

On further investigation, the problem seems to be more fundamental than cell-by-cell formatting. Even when we do column by column or whole-table-at-a-time writes to Excel, even without any rich text formatting, we are experiencing prohibitively slow performance and many Out-of-Memory errors. When working a column at a time, (which is what we need to be doing for most columns), we are using the command

worksheet.Cells.ImportDataColumn(dt, true, 0, columnPosition, dt.Columns[columnInfo.ColumnNameInDataStore].Ordinal, false);

We have tried this on the latest version of your software, and receive roughly the same results.

Here are some numbers. These were recorded using Stopwatches in the code, so they reflect ONLY the formatting of a Worksheet object in memory, NOT the selection of data from our database, NOR the call to Worksheet.Save, etc.:

For downloading 40 columns by 10,000 rows to Excel (400,000 data points), simply taking a DataTable and pumping the data through Aspose to Excel (no data modification of any kind):

Imported Column # 0 in 12 ms

Imported Column # 1 in 71 ms

Imported Column # 2 in 197 ms

Imported Column # 3 in 264 ms

Imported Column # 4 in 348 ms

Imported Column # 5 in 490 ms

Imported Column # 6 in 525 ms

Imported Column # 7 in 667 ms

Imported Column # 8 in 692 ms

Imported Column # 9 in 831 ms

Imported Column # 10 in 889 ms

Imported Column # 11 in 999 ms

Imported Column # 12 in 1191 ms

Imported Column # 13 in 1204 ms

Imported Column # 14 in 1263 ms

Imported Column # 15 in 1344 ms

Imported Column # 16 in 1585 ms

Imported Column # 17 in 1567 ms

Imported Column # 18 in 1629 ms

Imported Column # 19 in 1698 ms

Imported Column # 20 in 1857 ms

Imported Column # 21 in 1898 ms

Imported Column # 22 in 2007 ms

Imported Column # 23 in 2064 ms

Imported Column # 24 in 2183 ms

Imported Column # 25 in 2226 ms

Imported Column # 26 in 2360 ms

Imported Column # 27 in 2420 ms

Imported Column # 28 in 2563 ms

Imported Column # 29 in 2636 ms

Imported Column # 30 in 2716 ms

Imported Column # 31 in 2791 ms

Imported Column # 32 in 3125 ms

Imported Column # 33 in 2965 ms

Imported Column # 34 in 3080 ms

Imported Column # 35 in 3183 ms

Imported Column # 36 in 3240 ms

Imported Column # 37 in 3333 ms

Imported Column # 38 in 3479 ms

Imported Column # 39 in 3525 ms

Total Execution Time: 100527 ms (1:41 minutes)

Number of columns: 40

Number of rows: 9999

Time assigning data via Aspose: 71139 ms (1:11 minutes)

Time to Aspose AutoFit columns: 12894 ms (13 seconds)

Time to Aspose Autofit rows: 14526 ms (15 seconds)

What can be seen in this data is:

  1. Every subsequent column that is added to the Excel spreadsheet takes longer than the previous one. The 40th column takes 293x longer to import than the 1st column

  2. 27 seconds (25% of the time) is spent fitting the columns and rows after the data is imported

  3. These times are almost purely from the Aspose code. If you add up all the timers, the remaining code of ours is only taking 2 seconds of the 1:41 minutes of CPU time.

  4. We frequently get OutOfMemory errors running this code. We believe that the errors are due to memory fragmentation and not a lack of RAM, as we have 3GB of RAM (the max usable by Windows XP, which is our operating system) on our machines.

a. Running Microsoft DebugDiag tool, we can see that after running the Excel download we have 82% free memory fragmentation

Also, FYI in case you may find it useful, here’s a blurb from the internet explaining memory fragmentation:

Since applications assume memory address is contiguous, that 8K allocation CANNOT use any of the 4K of unallocated memory. That 4K is technically “free memory”, but if the application needs >4K of memory, it is not usable by it. Now, imagine this happening all over the virtual memory address space to the point that no big chunk of CONTIGUOUS memory remains, and then you ask for a big block of memory… this is when you get Out of Memory even though enough fragmented free memory exists.

Here’s a simplistic “worst case” example: assuming you manage to allocate the first byte of every virtual memory page (let’s say the page size is 4096 bytes) and the virtual memory address space size is 4GB, you will see only 1MB of physical RAM used but all of the virtual memory used… and you get Out of Memory on the next allocation. In other words, over 99.99% memory is “free” yet you get “Out of Memory”. Crazy, right? But that’s memory fragmentation for you…

Thank you.

Hi,

Thank you for considering Aspose.

Currently, we store all cells of the worksheet into a list order by row and column. E.g., if the existing cells are A1 and A2 and you want to input value to the cell B1, we will insert a cell at the index 1. So the order of the cells will become A1, B1 , A2. We are working on changing this cell storage model but it could not be available soon.

RiskSWDeveloper:

  1. Every subsequent column that is added to the Excel spreadsheet takes longer than the previous one. The 40th column takes 293x longer to import than the 1st column

Please use Cells.ImportDataTable method to import a data table. If you import column by column, we have to insert cells to the list and moving the cells in the list many times will consume more time and memory.

RiskSWDeveloper:

  1. 27 seconds (25% of the time) is spent fitting the columns and rows after the data is imported.

Fitting the columns and rows is a time-consuming work. We have to measure the width of all cells’ value and get the correct width of columns and rows. So, this process may take some time.

RiskSWDeveloper:

  1. These times are almost purely from the Aspose code. If you add up all the timers, the remaining code of ours is only taking 2 seconds of the 1:41 minutes of CPU time.

  2. We frequently get OutOfMemory errors running this code. We believe that the errors are due to memory fragmentation and not a lack of RAM, as we have 3GB of RAM (the max usable by Windows XP, which is our operating system) on our machines.

a. Running Microsoft DebugDiag tool, we can see that after running the Excel download we have 82% free memory fragmentation

It will save time and memory by using Cells.ImportDataTable method to import a datatable.

Thank You & Best Regards,

We have already tried using the Cells.ImportDataTable method, and this takes almost as much time, and triggerst he same Out of Memory errors, as column by column.

For comparison's sake, your solution is taking almost 2 minutes for 400,000 data points. I asked GemBox yesterday how long a 400,000 data point export takes in their system, and the developer who responded said he ran this test on his machine in response to my question and it took 2.5 - 3.2 seconds, depending whether he used Excel 2003 or 2007 format. Exporting by writing out an HTML format that Excel is able to read takes our program 5 seconds for these data.

This time differential is extremely dramatic, and our customers can not tolerate the Out of Memory errors they (and we) are getting with your tool. We are not satisfied with either of the alterantives I just cited for other reasons (GemBox doesn't do rich text formatting, and HTML format relies on ADO, which sometimes truncates data due to bad assumptions about column data types). But with this kind of performance differential, and with all of the Out of Memory errors that our customers are experiencing, we have no choice but to back your tool out of its primary use in our product. It is simply not an option, in spite of the issues with our alternatives, for us to continue to use your tool to meet this need of ours if the performance and memory usage are not improved.

Is there anything you can do at this point to prevent these Out of Memory issues and to speed up this abysmal performance, in order to make your tool viable for us to use?

Thanks.

Hi,

I made a sample program to populate 400,000 cells. Following is the list of time:

1. 0.3 second to create a data table

2. ImportDataTable takes 0.7 second

3. AutoFitRows takes 2.8 seconds

4. AutoFitColumns takes 15.8 seconds

5. Saving Excel file takes 1.2 seconds

So you can see if you just want to put data into an Excel file, Aspose.Cells only takes 1.9 seconds. Please check my attached cs file.

Could you post a sample project here? Thus we can see how to optimize it.

We tried your sample program, and indeed the performance with that program was as you describe. We were sure we had tried ImportDataTable already, but we must have made a mistake in that process - many apologies. We now have tried that correctly, and the download time when using that method is much quicker (17 seconds instead of 1:40 minutes).

However, we still experience frequent Out of Memory issues, thrown shortly after the Aspose processing has completed. While this is not actually happening DURING the Aspose processing (it seems, instead, to be happening during an attempt to serialize session state), it never happened before in our application, and it does not seem to happen if we remove the Aspose tool and return to our old method of downloading, so it still seems like the Aspose code is fragmenting all the available memory on the machine. Have you looked at the memory profile of the tool and optimizing this so that it does not fragment memory?

Thanks.

Aspose.Cells is developed with pure managed code. We totally rely on .NET Framework to allocate and free the memory.

Please make sure if your program allows GC to free the memory after creating a large file. You can try to call GC.Collect method after creating large files.

Apologies again - we found an issue in our application code that was causing the remaining Out of Memory issue, in intermittent scenarios such that it looked like it was due to Aspose but ultimately turned out not to be.

The tool is working okay for us now with an export of the full datatable, and handling only individual cells that are rich-text formatted one at a time. We still would prefer to be able to export columns one at a time rather than exporting the whole table and then overwriting rich text cells, but this is an acceptable workaround for us for now.

Thank you very much for your help!

Hi,


Could you provide more details about the Out Of Memory issues? We are experiencing the same problem, and maybe your correction of the code pattern might help us too.

Many thanks,
Alex

The issues we had with Aspose cells performance were with cell by cell and column by column exports. When we export a full datatable at once, we don't have performance issues (but of course that's also not always sufficient - we can't handle rich text formatting of cells in a full-datatable export). We do also find that performance is pretty slow on AutoFitRows and AutoFitColumns commands.

As far as Out of Memory error per se though, as I said above, that turned out to be due to other code in our solution, not Aspose. There was an intermittent problem in the code I was using to parse out HTML in order to know how to rich text format cells being exported via Aspose. Because that code ran intermittently between calls to Aspose, and because the issue caused by that code was intermittent, I at first thought that the Aspose tool itself was the problem, but it turned out not to be.

For isolating performance issues, we found using Stopwatch very helpful. You could also try isolating out all logic that's not Aspose into separate functions, and then I *think* you can use .NET profiler to see where large amounts of memory are being allocated.

Best of luck!

Thanks a lot for the quick, yet very useful reply. I will come back as soon as I have any update on the issue.

Best regards,