Parsing Large Xlsx Files


#1

Hi,
I am trying to parse 500 MB XLSX files using the LightCellsDataHandler API. I am not creating any other objects, but my heap size still grows to 3 GB.

Note: in the processCell method I am just returning false.


#2

@Sundarcj,

Thanks for your query.

If you are loading such a huge file, a certain amount of memory will be consumed for sure. Generally, memory usage is 8-10 times the file size or more (in normal mode), so this looks OK to me. By the way, when you open such a big file in MS Excel manually, it also consumes a lot of memory and takes a long time to load.
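
As a rough illustration of that rule of thumb, here is a minimal sketch that multiplies the file size by the 8-10 factor and prints the JVM's current maximum heap for comparison. The class and method names are made up for this example, and the multiplier is only the normal-mode estimate quoted above, not a measured figure.

```java
public class HeapEstimate {
    // Rough heap estimate in MB: file size multiplied by a rule-of-thumb factor.
    static long estimateHeapMb(long fileSizeMb, int multiplier) {
        return fileSizeMb * multiplier;
    }

    public static void main(String[] args) {
        long fileSizeMb = 500; // file size mentioned in the question
        long low = estimateHeapMb(fileSizeMb, 8);
        long high = estimateHeapMb(fileSizeMb, 10);

        // Maximum heap the current JVM may use (controlled by -Xmx).
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);

        System.out.println("Estimated need (normal mode): " + low + "-" + high + " MB");
        System.out.println("Current JVM max heap: " + maxHeapMb + " MB");
    }
}
```

If the estimate is well above the reported maximum heap, raising the limit with the standard `-Xmx` JVM option is the usual first step.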


#3

@Sundarcj,

We evaluated your issue further. When processing a typical template file with LightCells, the memory cost should not be as large as you reported. Please send us your runnable code and template file so that we can investigate further and figure out the issue.


#4

Can you share your email ID so that I can share my test files to your drive?


#5

What about metadata detection? For example, I want to detect each cell's data type, which I can get from your API. But I need to know: do you flush out the metadata objects the way LightCells does?


#6

@Sundarcj,

As requested earlier, please send us the runnable code and template file. Zip the project and template file, upload the archive to a file sharing service (e.g. Dropbox, Google Drive), and share the download link here. We will check it soon.

PS. We cannot receive such a huge file and project via email.


#7

https://drive.google.com/open?id=1hREqynsvJk4VyugpXOxyuKykGBu_6A1P


#8

@Sundarcj,
We are checking the data and will share our feedback with you here soon.


#9

Okay, thank you. One more query: I have a workbook with multiple sheets, total size around 600 MB, and Aspose is unable to parse this file.


#10

@Sundarcj,

Thanks for the template file and sample code.

After an initial test, I was able to reproduce the performance issue using your sample code with your template file (454 MB). When parsing the large XLSX file in lightweight mode, memory usage goes high and the process takes a long time to complete; I even got “java.lang.OutOfMemoryError: Java heap space”. I have logged an investigation ticket with the ID “CELLSJAVA-42935” for your issue. Since the file is very large, processing it will naturally take more time and consume more resources. In any case, we will look into your issue soon.

Once we have an update on it, we will let you know.


#11

This is a similar case. In any case, you may share the file and we will evaluate it as well.


#12

@Amjad_Sahi Thank you so much. How much time would it take?


#13

@Sundarcj,

Since we have just logged the issue, please give us a little time (3-5 days or so) to evaluate it thoroughly before we can commit to an ETA or provide an update.

Once we have any new information, we will share it with you.


#14

@Sundarcj,
There is a large number of global cached string values in the template file. In an XLSX file, such string values must be loaded entirely before the cell data of any sheet is processed. In our test with the given template file, loading those string values into the cells model requires at least 2 GB of memory. To make the program run successfully with this file, we think the JVM needs at least 2.5 to 3 GB of memory.
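
To get a feel for how many of these cached strings a file carries, the following sketch may help. It uses only the JDK (no Aspose classes) and streams through the shared string table without building it in memory. It assumes the table sits at the conventional part name `xl/sharedStrings.xml` inside the XLSX zip package; the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SharedStringsProbe {
    // Counts <si> (string item) elements in xl/sharedStrings.xml using a
    // streaming parser. Returns 0 if the package has no such part.
    static long countSharedStrings(String xlsxPath) throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            ZipEntry entry = zip.getEntry("xl/sharedStrings.xml"); // assumed conventional location
            if (entry == null) {
                return 0;
            }
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs expected in XLSX parts
            try (InputStream in = zip.getInputStream(entry)) {
                XMLStreamReader reader = factory.createXMLStreamReader(in);
                long count = 0;
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "si".equals(reader.getLocalName())) {
                        count++;
                    }
                }
                reader.close();
                return count;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Shared strings: " + countSharedStrings(args[0]));
    }
}
```

A very high count here suggests that most of the memory goes into the string table itself, which would be consistent with the figures above.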


#15

Okay, thank you. Can I pass an HttpInputStream as the input parameter to the Aspose parser?


#16

@Sundarcj,
An overload of the Workbook constructor accepts an InputStream, so if you can cast your HttpInputStream to it, you can pass it as the input parameter.
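
For reference, with the standard `java.net` classes the stream returned by `URLConnection.getInputStream()` is already a `java.io.InputStream`, so no cast should be needed. The sketch below only opens and reads the stream; the Workbook call is left as a comment so that the example runs without the library. The class name, method name, and timeout values are assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

public class RemoteWorkbookStream {
    // Opens the URL and returns its content as a buffered InputStream.
    static InputStream open(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(10_000); // 10 s, illustrative value
        conn.setReadTimeout(60_000);    // 60 s, illustrative value
        return new BufferedInputStream(conn.getInputStream());
    }

    public static void main(String[] args) throws Exception {
        URL url = URI.create(args[0]).toURL();
        try (InputStream in = open(url)) {
            // With Aspose.Cells on the classpath, the stream could be handed
            // to the constructor overload mentioned above:
            // Workbook workbook = new Workbook(in);

            // Here we only count the bytes to show that the stream is readable.
            long total = 0;
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            System.out.println("Read " + total + " bytes");
        }
    }
}
```

Note that reading from HTTP changes only where the bytes come from; the memory needed for the shared string values stays the same.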