3 Bugs

Here’s a batch for you to look at. I’ve attached the project. Just change the inputs in index.aspx.vb:

1) Test1: Infinite loop when filling a template with 5,000+ rows of data.
2) Test2: Only the first row of sub-totals is correct. All other formulas do not update properly.
3) Test3: Only the chart on the first worksheet is updated. The "info@logixml.com" appears twice.

The third input was working with Aspose.Excel 3.5.3.3.

Thanks,
Natan

Hi Natan,

1. 14 worksheets with 5000+ rows of data may be too much for a web application. It may exceed the memory limitation of IIS which causes IIS to recycle the memroy. Please try to convert your program to a console application to see if it works fine.

2. I don't find the subtotal problem. Please try this attached fix.

3. Fixed in the attached fix.

Laurence,

1) I converted my project to a console application and it completed after a minute or two. How can we resolve the memory issue? We’ve been creating large Word and PDF documents with thousands of rows with the Aspose.Words and Aspose.Pdf components for the web.

2) The subtotal problem is still an issue. Notice that all the subtotal formulas are SUM(D4:D15). The second one should be SUM(D20:D31) and so forth.

3) The fix worked. Thanks!

Natan

Hi Natan,

1) Many calls of InsertRows and CopyRow method will consume more memory and degrade the performance. I use your data1.xml to do a test. It consumes 80MB memory to create a 1MB file. While I directly use Aspose.Cells to open this output file, it only uses 10MB memory.

In your code, I found your code may do the following things:

-Insert a row in index 4, then copy row from 3 to 4

-Insert a row in index 5, then copy row from 3 to 5

Since CopyRow will also copy the data and formattings, is it possible to remove those InsertRows call to speed up your program?

2) For subtotal problem, I fixed it. Please try this attached fix.

Laurence,

1) I think it will be possible to make one call to InsertRows(), and then make several calls to CopyRows() in order to bring the styles and formatting down. If we only call InsertRows() once, will that significantly improve performance?

2) The subtotal formulas are off by 10 rows. For example, the second subtotal should be SUM(D20:D31) but it’s SUM(D10:D31). This happens to all of them except the first.

Thanks,
Natan

1) Sure calling InsertRows once will improve performance. Your console application runs 133 seconds in my machine with Data1.xml. When I remove all InsertRows calls in your program, it runs 24 seconds. And I find the output file is almost same.

But the memory utilization is not optimized.

2) I noticed this issue. But this is not a problem of Aspose.Cells. You should rearrange your code to solve it.

The first subtotal is SUM(D5:D16). When it's copied to the first time, it's SUM(D10:D21). After you insert 10 rows in row index 20, it's changed to SUM(D10:D31). That's the routine in MS Excel.

Laurence,

Since we're filling templates, we can't just write data to existing rows because it could potentially overwrite static content in the template. For every XML data row, we really need to insert a new row into the spreadsheet. I've noticed that your component utilizes 100mb of memory to produce a 1mb file when InsertRows() method is used. It will consume 115mb if I also use CopyRow() to copy styles. Is there any way to improve memory usage? If so, when could we expect such an enhancement?

Thanks,
Natan

Hi Natan,

1. Please try to call InsertRows method as less as possible.

2. I will check how to improve memory usage. It will take about 1-2 weeks to make it.

Laurence,

Does the 4.0 version include these performance updates?

Thanks,
Natan

No. I tried to find some approaches to optimize the performance but only have a very little progress.

Please call InsertRows method as few as possible. And after accessing Cell.Style property, if you don't change anything on the style settings, please set it back to null.

Laurence,

Unfortunately our entire program is based off the ability to insert rows. We need to do this because inserting rows is the only way to dynamically update Excel formulas and charts. We’d like to plan ahead here, so I see 2 courses of action:

1) We wait for your performance enhancments.
2) I redesign the application so that InsertRows() is called once.

I don’t particularly like option 2 because it will increase the complexity of my code and it may not help performance that much. All I really need is a better idea of the scope of this issue.

Do you think a significant performance improvement will be possible on your part? If improving performance is not possible, can you confirm that inserting a million rows once is noticeably better than inserting one row a million times?

Thanks,
Natan

Dear Natan,

I understand that changing your code will increase the complexity of your code. However, I think you may have to do it because I cannot find a way to improve the performance significantly.

Let me explain: when InsertRow or InsertRows method is called, Aspose.Cells will do the following work:

1. Push down existing cells.

2. Add extra blank cells to copy the formattings.

3. Check the formula and change the reference.

4. Check the charts and change the reference.

5. Check other reference and change them, it may be : merged cells, data validations, hyperlinks ...

Every time when you call InsertRow or InsertRows method, this work will be done.

I assure that inserting a million rows once is thousands time faster than inserting one row a million times.

Following is a sample case to show the difference:

1. Case 1: Inserting 10000 rows at once takes about 0.01s in my machine.

Workbook workbook = new Workbook();
workbook.Open("d:\\test\\book1.xls");

DateTime start = DateTime.Now;

workbook.Worksheets[0].Cells.InsertRows(0, 10000);

DateTime end = DateTime.Now;

Console.WriteLine(end.Subtract(start).TotalSeconds);

2. Case 1: Inserting a rows 10000 times takes about 1.4s in my machine.

Workbook workbook = new Workbook();
workbook.Open("d:\\test\\book1.xls");

DateTime start = DateTime.Now;

for(int i = 0; i < 10000; i ++)
workbook.Worksheets[0].Cells.InsertRows(0, 1);


DateTime end = DateTime.Now;

Console.WriteLine(end.Subtract(start).TotalSeconds);

Attachement is a simple template file. If your file are more complex, the performance difference may be even larger.