Read/Write huge excel files(1M rows)

hi,

Could you please tell if aspose.cells provide less memory cost write feature for huge excel files.
I have found light cell api but it works only for read using event approach.

Hi,


Thanks for your query.

Well, you may use LightCells APIs for both read and write operations, see the document for your complete reference here:

Also, you may try MemoryPreference option available in Aspose.Cells APIs if you do not want to use LightCells mode, see the document for your reference:

We also recommend you to kindly try our latest version of the product Aspose.Cells v8.6.1 if you are not already using it.

Hope, this helps a bit.

Thank you.



thanx,

I will explore both these options
Also could u tell if lightcells can be used for insert and append rows in an existing excel file??

and in memory preference option of aspose.cells do we need to provide more memory for large files??
Hi,

Well, you can build Workbook from an existing template file and then save the workbook using LightCells APIs. In this way, existing data in the workbook will be saved too, you require to implement the LightCellsDataProvider interface to accommodate your scenario.

Also, using MemoryPreference option would consume less memory comparing loading/saving files in normal mode.

Thank you.

thank you

aspose is looking very promising for my usecase
Could you provide java sample code for both these approaches.

Hi,


Sure, see the documents for your reference:

Hope, this helps a bit.

Thank you.

Hi Amjad,

Thanks for the prompt response and details. Just wanted to be absolutely sure that our business requirement is catered by Aspose.Cells:

We have following requirement:

  1. Insert rows in existing excel file. for. e.g. - we have an excel file which already has one million records and using java Aspose.Cells we would like to insert two new rows at some starting position, let’s say B1 of sheet 1.

  2. Append rows is existing excel file (i.e. at the end of sheet)

  3. Overwrite rows in existing excel file

  4. Excel file has approx. one million records & multiple sheets. We are looking for an approach which is memory optimized and does not load full workbook object in memory.

Can above requirements be catered by Aspose light cells. If yes, can you provide some details around what is the right way to implement LightCellsDataProvider interface (any sample code) .

PS: Earlier we were using apache POI user model based approach, which was causing a lot of memory issues, as this approach loads full workbook in the memory.

Hi,


Thanks for providing further details.

All your mentioned requirements can be accomplished by Aspose.Cells APIs. I think you should first go for MemoryPreference option. If you find any issue, then you may try using LightCells APIs, for reference, you should check the example in the topic here (you may add/update your code accordingly):
http://www.aspose.com/docs/display/cellsjava/Using+LightCells+API

By the way, Aspose.Cells also supports to load your desired worksheets (only) in the workbook, this may enhance performance and consumes less RAM, see the document for your reference:
http://www.aspose.com/docs/display/cellsjava/Load+Specific+Worksheets+in+a+Workbook

In any case, if you find any issue (regarding memory or other aspects), kindly do create a sample JAVA program (runnable) and provide us with all the details and files (e.g input + output etc.), we will check it soon.

Hope, this helps a bit.

Thank you.

hi,

thanks for the prompt reply
I explored the sample code for aspose.cells memory preference, there it was written that "This setting cannot take effect for the existing worksheets that are created before using the below line of code"

Hence it seems that we cant use this approach for existing spreadsheets but in our usecase we have existing spreadsheets which are read and write.

Please suggest if there is any work around to it.


Hi,


Well, you did not check the Note: (comments) below this in the code segment pasted in the document. That means by default, memory preference option won’t not set for existing sheet even you use the line of code:
i.e.,
workbook.getSettings().setMemorySetting(MemorySetting.MEMORY_PREFERENCE);

So, you would need to specify the MemoryPreference option for the invidual cells of the sheet(s), see the note (I copy/paste comments and some lines of code from the original example code which sets MemoryPreference option fine for the first worksheet cells):
//To change the memory setting of existing sheets, please change memory setting for them manually:
Cells cells = wb.getWorksheets().get(0).getCells();
cells.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);

Let us know if you still have any issue or confusion.

Thank you.

Hi,


Adding more to my previous reply. Please note the following for your requirements:


1) Because the memory issue may be caused when loading the template file into memory completely, so user might have no chance to use Cells.setMemorySettings() for the large workbook. Instead, kindly use LoadOptions.setMemorySettings() to load/build the workbook in MEMORY_PREFERENCE mode. When the workbook was loaded in MEMORY_PREFERENCE mode, then all its worksheets will use this mode as the default to build cells model. You should try this way firstly to test whether it can solve your memory issue.


2) If MEMORY_PREFERENCE mode cannot solve the issue, you may consider the usage of LightCells. However, I think it is very complicated, and even impossible for users to implement all the logic by themselves if there are other things which need to be changed when inserting rows, such as formulas, named ranges, …etc. To achieve the goal of reducing memory cost for large workbook in template file, you need to combine the usage of LightCellsDataHandler and LightCellsDataProvider. That is, you should load the template file in light cells mode too. your implementation of LightCellsDataProvider will use cells data provided by LightCellsDataHandler to fill the workbook accordingly. It needs two threads to work together, like pipe stream, the loading process provides data and the saving process consumes data. And because you need to insert rows, then he should handle the row index modification too in LightCellsDataProvider, such as, if you insert one row before the third row (A3, B3, the row index is 2), for all row/cells got from LightCellsDataHandler, when their row index is greater than 2, you should fill them into the next row in LightCellsDataProvider(row index 2 becomes 3, 3 becomes 4, …).


Hope, this helps as well.


Thank you.

hi,

Thanx for your prompt replies.
We are now implementing aspose at our end and will be purchasing the library soon.
There is just one issue

While loading workbook object we need to give absolute path of file in string. Is there a way to provide a stream of file path while loading workbook object?

Hi,


Good to know that it’s working on your end now.

Regarding your new issue, well, Aspose.Cells also supports to load/read Excel file from streams, you may pick relevant overload for your requirements, see the document for your reference:
http://www.aspose.com/docs/display/cellsnet/Opening+Files

Thank you.

hi,

Thank you for reply. Streaming is working fine now.
There is just one error :
When I load a workbook object an error appears on our server by aspose api.
This doesnot affect the functioning of the API but still it needs to be removed to make it bug free.

I am mentioning the error below:

ERROR [Aj] com/aspose/cells/a/f/zb
INFO | jvm 1 | 2015/12/11 12:17:04 | java.lang.StringIndexOutOfBoundsException: String index out of range: -1
INFO | jvm 1 | 2015/12/11 12:17:04 | at java.lang.String.substring(String.java:1955) ~[na:1.8.0_45]
INFO | jvm 1 | 2015/12/11 12:17:04 | at org.aspectj.weaver.bcel.BcelObjectType.getOuterClass(BcelObjectType.java:904) ~[aspectjweaver-1.8.6.jar:1.8.6]

Hi,


Good to know that it is working better now.
Regarding your recent error you mentioned, could you try our latest version/fix: Aspose.Cells for Java v8.6.2.4 if it makes any difference.

If you still find the issue, kindly do provide sample JAVA console program (runnable) to reproduce the issue on our end, we will check it soon.

Thank you.

hi,

I am not able to deploy your latest jar. Also its only 6 mb as compared to 20 mb for 8.6.0 version.

Also I am using java 1.8 version 45. So can the error be related to that?

I have now checked for the latest jar you mentioned but I am still getting an error.

can this be a problem of java version??

Hi,


Well, this does not seem to be a problem with JDK version (e.g v1.8), it should work fine. As requested earlier, kindly do provide a sample JAVA console program (runnable) to reproduce the issue on our end, we will check it soon.

Thank you.

hi,

Thank you for the reply
We are trying to reproduce it in a separate sample project but meanwhile could tell is a way to suppress the logs of aspose jar??
Do you provide any properties file for the same?

Hi,


I am afraid, as we have no clue about your issue so far, so we cannot provide any means to suppress the logs you are facing. In some cases, we do provide debug versions for the user to trace the issue but your scenario seems to me a different one. As requested earlier, kindly do provide sample JAVA console program (runnable) to reproduce the issue on our end, we will check it soon.

Thank you.