LightCells missing in aspose-cells-7.0.4

Basuk · December 21, 2011, 8:27am

Hi,

We are currently evaluating the Aspose.Cells API for reading data from Excel cells and worksheets. We initially tried the ‘Workbook’ API, but it appears to have memory issues with big files. In our application we expect Excel files larger than 500 MB, up to 1 GB. While browsing the forums I found that Aspose.Cells has a StAX-style (i.e. event-based) implementation called ‘LightCells’.

I am currently working with the latest 7.0.4 API, but I could not find anything for this other than LightCellsDataProvider. I want to know how to use this API; currently there is no API documentation for LightCells.

In addition to this, I have a few questions about the LightCells API:

1. LightCells is a StAX-style implementation and hence will not load the entire worksheet, correct?

2. Does LightCells support reading ranges?

3. Does LightCells support reading all Excel formats (2003-2010)?

4. What are the performance implications when compared to similar frameworks like POI?

amjad.sahi · December 21, 2011, 11:34am

Hi,

Please find attached the zip file containing examples of how to use the LightCells APIs with the latest versions (e.g. v7.0.x). We will also soon add a relevant topic/article to the Aspose.Cells for Java documentation.

The LightCells API is useful for reading/writing huge Excel spreadsheets; you will need less memory and get better performance.

For your other queries, we will get back to you soon.

Thank you.

shakeel.faiz · December 21, 2011, 8:10pm

Hi,

Please also note:

In the 7.x versions, the LightCells API supports only saving XLS/XLSX files. Reading template files in light mode is not supported yet. In old versions before V7, we did support reading template files in light mode, but in that mode we could only provide the cell content read from the template file to the user; all other objects such as ranges, styles, drawings, etc. were not supported. Because we made great changes to our Java component in V7, the old LightCells model for reading template files was no longer suitable, and we removed it from the new version. We will try to re-implement it in later versions to give users a more flexible way to read template files in light mode, but we are afraid we cannot support it soon because there are other important tasks we need to complete first.

Basuk · December 22, 2011, 12:19am

Hi,

Just to reiterate, Excel 2007 and 2010 follow the Office Open XML standard while 2003 does not. I doubt we will be able to support reading Excel 2003 worksheets using LightCells, since there is no XML to read there.

Let us know your thoughts on whether there is still a way to implement an event-based model for reading 2003 worksheets.

shakeel.faiz · December 22, 2011, 2:00am

Hi,

Well, the LightCells APIs are mainly designed for manipulating cell data one by one without building the complete data model (the Cell collection) in memory. They work in an event-driven-like mode. When saving a workbook, the user provides cell content one by one during the saving procedure and the Cells component saves it directly into the resultant file. When reading template files, the Cells component parses every cell and provides its value to the user one by one. In both procedures one Cell object is processed and then discarded; the Workbook object does not hold a collection of them. So this mode saves a certain amount of memory when importing/exporting Excel files in which a large cell dataset accounts for most of the memory. However, what you worry about is also a part of the truth. Because of the different data model and structure of XLS and XLSX files, light cells APIs saves memory more effectively for XLSX files than XLS files.
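
The save-side flow described above (the user supplies one cell at a time and the writer sends it straight to the output) can be sketched in plain Java. This is only an illustration of the event-driven idea: `CellSource`, `StreamingSheetWriter` and `StreamingDemo` are hypothetical names invented here, the output is tab-separated text rather than a real XLS/XLSX file, and none of this is the Aspose.Cells LightCellsDataProvider API.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical interface for illustration only; it is NOT the Aspose.Cells
// LightCellsDataProvider API.
interface CellSource {
    int rowCount();
    int columnCount();
    // Produces the value for one cell on demand; nothing is cached.
    String valueAt(int row, int column);
}

final class StreamingSheetWriter {
    // Pulls one value at a time and writes it straight to the output,
    // so no collection of cells is ever held in memory.
    static long write(CellSource source, Writer out) {
        long cellsWritten = 0;
        try {
            for (int r = 0; r < source.rowCount(); r++) {
                for (int c = 0; c < source.columnCount(); c++) {
                    if (c > 0) {
                        out.write('\t');
                    }
                    out.write(source.valueAt(r, c));
                    cellsWritten++;
                }
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cellsWritten;
    }
}

final class StreamingDemo {
    // A synthetic source whose values are computed from the coordinates.
    static CellSource grid(final int rows, final int columns) {
        return new CellSource() {
            public int rowCount() { return rows; }
            public int columnCount() { return columns; }
            public String valueAt(int row, int column) {
                return "R" + row + "C" + column;
            }
        };
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        long written = StreamingSheetWriter.write(grid(2, 3), out);
        System.out.print(out);
        System.out.println(written + " cells written");
    }
}
```

In this sketch the memory held by the writer stays constant regardless of the grid size, because each value is discarded as soon as it is written. A real spreadsheet writer has extra per-file state, which is where the XLS versus XLSX difference mentioned above comes in.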

Basuk · December 22, 2011, 4:35am

Hi,

I am a bit confused by the latest reply: ‘However, what you worry about is also a part of the truth. Because of the different data model and structure of XLS and XLSX files, light cells APIs saves memory more effectively for XLSX files than XLS files.’

Below is my understanding based on it:

1. LightCells will still load all the cells in memory for XLS files? This basically means that it will have pretty much the same behaviour as that of the ‘Workbook’ api right?

2. And if we are loading all rows in memory LightCells will work pretty much similar to jexcel api when dealing with 2003 worksheets, right?

shakeel.faiz · December 22, 2011, 7:56am

Hi,

Thanks for your questions.

LightCells actually frees the memory once it has finished processing a cell; it does not hold on to the cells.

Anyway, we have forwarded your questions to the development team, who will answer you as soon as possible.

shakeel.faiz · December 22, 2011, 8:40pm

Hi,

LightCells does not load all cells in memory for XLS files. It processes one cell, discards it, and then moves on to the next cell, the same way as for XLSX files. However, for an XLSX file we can save data directly to the final stream/file.

For XLS, because of its different file structure, all substreams of the final file need to be created in memory first. So saving an XLS file with the LightCells API requires more memory than saving an XLSX file. Commonly, using the LightCells API to save an XLSX file may save 50% or more memory compared with the normal way, while saving XLS may save about 20-40%. As for reading template files with LightCells, as we have said, we will support it in later versions but cannot finish it soon, so we cannot give performance statistics for it currently.

Basuk · December 23, 2011, 1:55am

Hi,

Thanks for answering all our queries. We would be interested in the API, especially for reading worksheets/ranges spanning Excel 2003-2010.

But since it's under development, please notify us once these features are available. We can then evaluate its usage appropriately.

Once again I would like to thank you on behalf of our company for providing such instant support.

toohotice · December 23, 2011, 3:39am

I need to generate an XLS-2003 file which may contain more than 65K rows and 400 columns using a simple XLS template file.

Data will be picked up from a standard tab-separated data file.

I tried LightCells API. But it is throwing :- java.lang.OutOfMemoryError: Java heap space

Is this not yet supported by Aspose ?

amjad.sahi · December 23, 2011, 3:58am

Hi Shrikant,

For your information, the XLS (Excel 2003) file format can only have 65536 rows and 256 columns in a single worksheet. This limitation is imposed by the XLS format of MS Excel, not by Aspose.Cells for Java. So you should use the XLSX file format. Also, kindly use the latest version, v7.0.4.

Thank you.
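
The format limits mentioned above can be checked before any file is generated. The following is a minimal sketch in plain Java; the class and method names (`SheetLimits`, `fitsXls`, `chooseFormat`) are hypothetical and not part of Aspose.Cells. The XLS limits are the ones quoted in this reply; the XLSX limits (1,048,576 rows by 16,384 columns) are not stated in this thread and are added here as the standard Excel 2007+ worksheet limits.

```java
// Hypothetical helper for illustration; not part of Aspose.Cells.
final class SheetLimits {
    static final int XLS_MAX_ROWS = 65_536;       // Excel 97-2003 (.xls)
    static final int XLS_MAX_COLUMNS = 256;
    static final int XLSX_MAX_ROWS = 1_048_576;   // Excel 2007+ (.xlsx)
    static final int XLSX_MAX_COLUMNS = 16_384;

    static boolean fitsXls(long rows, long columns) {
        return rows >= 1 && columns >= 1
                && rows <= XLS_MAX_ROWS && columns <= XLS_MAX_COLUMNS;
    }

    static boolean fitsXlsx(long rows, long columns) {
        return rows >= 1 && columns >= 1
                && rows <= XLSX_MAX_ROWS && columns <= XLSX_MAX_COLUMNS;
    }

    // Returns the older format when the data fits in it, otherwise XLSX.
    static String chooseFormat(long rows, long columns) {
        if (fitsXls(rows, columns)) {
            return "XLS";
        }
        if (fitsXlsx(rows, columns)) {
            return "XLSX";
        }
        throw new IllegalArgumentException(
                rows + " x " + columns + " does not fit in a single worksheet");
    }

    public static void main(String[] args) {
        System.out.println("65000 x 400 -> " + chooseFormat(65_000, 400));
        System.out.println("65000 x 240 -> " + chooseFormat(65_000, 240));
    }
}
```

With these limits, the 400-column sheet from the question cannot be stored in a single XLS worksheet regardless of available memory, while a 240-column sheet can.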

toohotice · December 23, 2011, 4:11am

Amjad,

I tried to create a simple XLS file which has 65K rows * 240 columns.

Still it is throwing :- java.lang.OutOfMemoryError: Java heap space

Thanks
Shrikant

amjad.sahi · December 23, 2011, 5:25am

Hi,

Since you are creating a big file (a worksheet with 65K * 240 cells filled with values), this is a heavy process that will still need a fair amount of memory assigned to the JVM. I have attached a sample demo project that creates such a huge Excel XLS file; it works fine here. Note: I had to assign some memory to the JVM while running the demo. Here is the command line that I used:

java -Xms550m -Xmx550m DemoTest

I am able to generate this huge file, which also takes some time to open in MS Excel. I am using v7.0.4 of Aspose.Cells for Java.

Thank you.
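
To relate the heap setting above to the size of the sheet, a small plain-Java check can print how much heap the running JVM was actually given and how many cells the job involves. This is a rough sketch with hypothetical names (`HeapCheck`, `heapBytesPerCell`); the per-cell figure is just the heap size divided by the cell count, an average budget, and not a measurement of what Aspose.Cells consumes.

```java
// Hypothetical helper for illustration; not part of Aspose.Cells.
final class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    // Widen to long before multiplying so large sheets do not overflow int.
    static long cellCount(int rows, int columns) {
        return (long) rows * columns;
    }

    // Maximum heap the running JVM may use (governed by -Xmx), in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Average heap budget per cell if the entire heap were spent on cells.
    static long heapBytesPerCell(long heapBytes, long cells) {
        return heapBytes / cells;
    }

    public static void main(String[] args) {
        long cells = cellCount(65_000, 240);
        System.out.println("Cells to write: " + cells);
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Budget per cell at -Xmx550m (bytes): "
                + heapBytesPerCell(550 * MIB, cells));
    }
}
```

Running this with the same `-Xms`/`-Xmx` flags as the real job confirms that the settings reached the JVM. If the reported maximum is much lower than expected, the flags were probably not passed to the process that does the export.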

toohotice · December 23, 2011, 6:29am

I have already worked with the sample provided and ran it with the JVM settings below:

-Xms1024m -Xmx2048m
