Data imports are SLOWER than excel

pyounger · December 8, 2005, 9:00am

using the import data table method (2000 rows x 10 columns of data) took 15-20 seconds.

using the copyfromrecordset method of excel it only took 4 seconds.

is there a fast way to import large data sets?

Laurence · December 8, 2005, 6:37pm

I think you imported data into a worksheet with a large amount of data, right? If you import 2000rows x 10 columns of data into a blank worksheet, it takes only 0.1 seconds to import data and 0.7 seconds to save the data. My laptop is PentiumM 1.4G with 512MB memory.

However, when I import data into a worksheet with a large amount of data, the performance degrade heavily. I will optimize my code in this situation. I will release a fix for this enhancement next week.

Laurence · December 8, 2005, 7:00pm

Could you please post your template file and sample code here? That will help us to optimize the component.

Laurence · December 8, 2005, 8:40pm

Please try this fix. It will be much faster when using ImportDataTable method.

mozila · December 12, 2005, 4:36pm

Can I add additional questions? Performance (Time and Memory consumption) wise what would be more efficient and why:

Import DataTable.
Looping thru DataTable and ImportArrayObject
Looping thru DataTable/DataRow and import Cell by Cell

Please be explicit in your choice and explanation, as it would have significant impact on design.

Note that my concern is to build memory efficient application as in case of 65000 rows Aspose API consuming a lot Memory.

Thanks,

Daniel

Laurence · December 12, 2005, 5:48pm

Hi Daniel,

For memory consumption, your approaches are almost same.

For speed issue, 1 is fastest and 2 is slowest.

Aspose.Excel internally arrange all cells of a worksheet in a list, ordered by row and column. For example, if you create 6 cells from A1 to B3, it internal keeps them as A1, B1, A2,B2, A3,B3. If you want to insert a row and add data in A1 and B1, it will change index and insert new Cell A1, B1 at start.

If you repeatedly insert cells at start, this action will degrade Aspose.Excel's performance heavily. I optimized ImportDataTable method to make this insertion action just once. And I will optimize the ImportObjectArray method in the next step. However, I cannot optimize the speed if you import cell by cell at start or ImportArray column by column.

To minimize memory consumption, please avoid to access cells without data and assign a single Style object to cells with same formattings.

mozila · December 13, 2005, 11:29am

Hi Laurence,

I appreciate your help. Implementation details you are providing will help us to choose the right approach.

Now I understand your internal structure and that I better of adding data (while adding cells or rows) cell by cell or row by row going from the left to right and from top to bottom.

I have some additional questions.

Can you please explain what did you mean by “ .. please avoid to access cells without data.. “?
ImportDataTable can import data starting in certain Excel row and column and import certain number of rows and columns from DataTable. Is there a way to start import from certain row /column in DataTable? I’m looking for optimum way of inserting data when DataTable has more then 65536 rows. In this case I would like to import first part (up to 65536) in first worksheet and continue insertion in the next worksheet. Note that I can not use smart markers in this case and would like to avoid option of looping thru rows and using ImportDataArray.
Import Cell by Cell approach (using Cell.PutValue(object) Method ):

Do you need to determine the object type (internally, in your code) when I’m using the Cell.PutValue(object)? If yes, would use of ImportDataTable() be more efficient in this case as you will now the object type for the whole column?
Do you make a copy of the object value or using object passed as a parameter?

Are you using any data caching for memory usage optimizations (for instance, memory caching could be the caching of repeated data in excel). If not, do you have planes to make it available in the future and in this case would use of ImportDataTable() will be more efficient (as opposite of using ImportDataArray or cell by cell) in terms of getting you more options for optimizations (data insertion, caching, type checking, etc.)?
The DLL you have provided in this thread is optimizing the ImportDataTabe() method. Can I just replace file stored by Aspose installation? Would it be available next time I download new installation?

Thanks,

Daniel

Laurence · December 13, 2005, 9:26pm

Hi Daniel,

1. To save memory, if you don't access a cell, it's not created in memory. For example, if you access A1, Cell A1 is created. So if you don't want to set data or formats to cell A1, please avoid to access it.

When you use cells["A1"] or cells[0, 0], an Cell object instance for A1 is created.

2. In your case, you can use ImportDataTable method to import first part of rows then delete those rows of data then use ImportDataTable method to import other part of rows to another worksheet.

To import certain columns, you can use ImportDataColumn method if the amount of data is not very huge. However, if the data is huge, please create a DataView based on your DataTable then use ImportDataView method to import them. I will also optimize ImportDataView method.

3. Yes. I should check the value type internally. But I think it only has small influence on performance. All data written to Excel file are value type, so I will copy the object data.

4. All strings are put into a string pool to save memory. Memory caching is not used. I think it won't help much on optimization. It's not the performance bottle neck.

5. You can replace it with the new dll. But please rebuild your program for Aspose.Excel is a strong named dll. When you rebuild your program, please check the new dll is applied in your project.

mozila · December 19, 2005, 7:11am

Hi Laurence,

Thanks for your reply.

In the above I was assuming that if I would use ImportDataTable method that will give you more options for optimizations. I see now that this is not the case. Please confirm the following:

If I want to minimize memory utilization I should use Cells.ImportFromDataReader as you will copy data directly from the Reader into Excel when in case of using ImportDataTable I will have data loaded into DataTable and then copied into Excel.
If the number 1 is correct then looping thru DataReader and import Cell by Cell still will be more efficient (memory wise) then Import DataTable. Cell by Cell import should be from left to right and top to bottom.
Just clarifying. When you are saying that you are using the String Pool I assume that you are using String.Intern(). Right?
Can you please tell me what API call Pyounger is referring to in the first post in this thread “ … using the copyfromrecordset method of excel it only took 4 seconds..” Is it Cells.ImportFromDataReader as I didn’t find copyFromRecordSet function.
When can we get new DLL with all optimizations you were talking before.

Thanks,

Daniel

Laurence · December 19, 2005, 7:26am

1. I optimized ImportDataReader method so using it to import data will be more efficient.

2. Yes.

3. No. But the mechanism is very similar to String.Intern().

4. He must talk about ImportDataTable method.

5. Please download and try v3.6.1.

mozila · December 19, 2005, 7:44am

Can you please provide a link to API Doc for ImportDataReader() method (..Added new overloaded ImportDataReader method to import data reader column header .. )

Thanks,

Daniel

Laurence · December 19, 2005, 5:37pm

Please check http://www.aspose.com/Products/Aspose.Excel/Api/Aspose.Excel.Cells.ImportFromDataReader3.html and

http://www.aspose.com/Products/Aspose.Excel/Api/Aspose.Excel.Cells.ImportFromDataReader2.html.

They are also included in the new setup msi.