Does Aspose.Cells support SAX-like read/write?

ZachBlocker · August 9, 2013, 5:05pm

For reading and writing very large spreadsheets, does your solution support SAX-like read and write capability, so that only a small fragment of the spreadsheet exists in memory at any given time?

amjad.sahi · August 12, 2013, 4:19am

Hi,

We have the LightCells API, which you may try. Currently it supports only writing large Excel files; it does not support reading them. The LightCells API is a SAX-like feature. Please see the following topic for a complete reference:

http://www.aspose.com/docs/display/cellsnet/Using+LightCells+API

Thank you.

ZachBlocker · August 12, 2013, 12:32pm

I’m confused about some language in the introductory section of the linked documentation, particularly where it reads:

“Commonly,using LightCells API to save XLSX file may save 50% or more memory than using the common way”

It seems to me that with a pure sequential write approach, memory usage would be flat, and would not be expressed as a percentage savings over the DOM-based method. Please look at the quick drawing I made for a comparison:

Memory usage graph for spreadsheet libraries - Google Drawings

Based on this graph, which of the dashed lines most resembles the memory usage model of the LightCells API: the dotted red line or the dotted green line at the bottom (flat usage)?

One other issue: Is there any more documentation for LightCells API, or is it just this one page?

ZachBlocker · August 12, 2013, 12:33pm

Previous drawing as a file attachment.

shakeel.faiz · August 12, 2013, 11:53pm

Hi,

Thanks for your post and for considering Aspose.Cells.

In fact, saving an XLSX file is not as simple as writing XML data in a purely sequential mode. Firstly, the XML file is not the final XLSX file. Even if the XML file can be created by writing data sequentially, it has to be processed with further logic to build the final XLSX file.

Secondly, to create an XLSX file, many things need to be gathered and indexed first instead of being written directly to the final XML file. This gathered data has to be held in memory until all the data (cells) that use it have been generated.

So the memory cost depends on many things, such as the cell data set, the styles and shared strings used, and so on, and performance may vary significantly when generating XLSX files of the same size with different data and settings.
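
To illustrate the point about gathered data, here is a minimal sketch in Python. It is not Aspose.Cells code and makes no claim about how LightCells is implemented; `StreamingSheetWriter`, `write_row`, `finish`, and `distinct_strings_after` are hypothetical names, and the XML is heavily simplified. It only shows why a writer that streams rows out one at a time may still need memory that grows with the number of distinct strings: each string cell stores an index into a shared-string table, and that table must stay in memory until the last cell has been written.

```python
import io
from xml.sax.saxutils import escape


class StreamingSheetWriter:
    """Toy model (hypothetical): rows are streamed out immediately,
    but the shared-string index has to stay in memory until the end."""

    def __init__(self, out):
        self.out = out
        # Maps each distinct string to its index. This is the "gathered"
        # data: it grows with the number of distinct strings, not rows.
        self.shared = {}

    def write_row(self, values):
        cells = []
        for v in values:
            if isinstance(v, str):
                # Reuse the index if the string was seen before,
                # otherwise assign the next free index.
                idx = self.shared.setdefault(v, len(self.shared))
                cells.append(f'<c t="s"><v>{idx}</v></c>')
            else:
                cells.append(f"<c><v>{v}</v></c>")
        # The row itself is written out and then forgotten.
        self.out.write(f'<row>{"".join(cells)}</row>\n')

    def finish(self):
        # Only now can the shared-string part be emitted. Dicts keep
        # insertion order, so iteration matches the assigned indices.
        return "".join(f"<si><t>{escape(s)}</t></si>" for s in self.shared)


def distinct_strings_after(n_rows, n_distinct):
    """Write n_rows rows and return how many strings are still held in memory."""
    writer = StreamingSheetWriter(io.StringIO())
    for i in range(n_rows):
        writer.write_row([i, f"label-{i % n_distinct}"])
    writer.finish()
    return len(writer.shared)
```

In this toy model, a sheet whose text cells repeat a handful of labels keeps only those few strings in memory, while a sheet of the same size in which every text cell is unique keeps one entry per row. That is one way the same file size can lead to quite different memory costs.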