Aspose Cells LightCells fails to read huge file


#1

Hi,
We’ve been trying out LightCells ability to read huge XLSX files (we have one that is 488MB … one million rows, 21 columns of various datatypes) and it instantly fails:

Java HotSpot™ 64-Bit Server VM warning: INFO: os::commit_memory(0x000000008d700000, 488636416, 0) failed; error=‘Not enough space’ (errno=12)

There is insufficient memory for the Java Runtime Environment to continue.

Native memory allocation (mmap) failed to map 488636416 bytes for committing reserved memory.

Writing in Kotlin, the code I’m using is simply:

            val loadOptions=LoadOptions()
            loadOptions.lightCellsDataHandler=MyLightCellsDataHandler()
            Workbook("hugeFile.xlsx",loadOptions)

The same file loads OK with Apache POI … which is annoying, because we’ve found Aspose to be generally much faster at processing smaller files.

Is there any way around this?


#2

@peeveen,

Thanks for your query.

Please share your sample file and console application(runnable) with us for our testing. We will reproduce the problem and provide our feedback after analysis.


#3

Hi,

The XLSX file is too large to attach here, so here’s a link to the file, shared in my Google Drive:
https://drive.google.com/file/d/1Nx04rnfCXQY8EGj9lkhGCMxyYlziP-Ip/view?usp=sharing

Here’s the test program I’m using. It’s written in Kotlin (1.3). I’m running it inside IntelliJ, in case that matters.

import com.aspose.cells.*

class LCDH: LightCellsDataHandler {
    override fun processCell(cell: Cell?): Boolean {
        return false
    }

    override fun processRow(row: Row?): Boolean {
        return true
    }

    override fun startSheet(worksheet: Worksheet?): Boolean {
        return worksheet?.index==0
    }

    override fun startCell(column: Int): Boolean {
        return true
    }

    override fun startRow(rowNumber: Int):Boolean {
        if(rowNumber%10000==0)
            println("Row $rowNumber reached.")
        return true
    }
}

fun main()
{
    val loadOptions=LoadOptions()
    loadOptions.lightCellsDataHandler=LCDH()
    Workbook("aspose_meellionRows.xlsx",loadOptions)
}

The process very quickly grabs almost 2 gigabytes of memory, then increases slowly to 2.1GB.

It works for a while, but then gets slower and slower, until …

...
Row 810000 reached.
Row 820000 reached.
Row 830000 reached.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.base/java.util.HashMap.resize(HashMap.java:704)
	at java.base/java.util.HashMap.putVal(HashMap.java:663)
	at java.base/java.util.HashMap.put(HashMap.java:612)
	at com.aspose.cells.zbuy.a(Unknown Source)
	at com.aspose.cells.zbuy.a(Unknown Source)
	at com.aspose.cells.zano.a(Unknown Source)
	at com.aspose.cells.zano.B(Unknown Source)
	at com.aspose.cells.zano.A(Unknown Source)
	at com.aspose.cells.zano.z(Unknown Source)
	at com.aspose.cells.zano.a(Unknown Source)
	at com.aspose.cells.zclz.a(Unknown Source)
	at com.aspose.cells.zd.d(Unknown Source)
	at com.aspose.cells.zd.p(Unknown Source)
	at com.aspose.cells.zd.a(Unknown Source)
	at com.aspose.cells.zanm.a(Unknown Source)
	at com.aspose.cells.Workbook.a(Unknown Source)
	at com.aspose.cells.Workbook.a(Unknown Source)
	at com.aspose.cells.Workbook.<init>(Unknown Source)
	at AsposeTestKt.main(AsposeTest.kt:31)
	at AsposeTestKt.main(AsposeTest.kt)

Process finished with exit code 1

If I replace the “return false” in processCell() with “return true”, the process dies around the 400,000 row mark.

I’ve run a similar test using the Apache POI libraries (XSSFSheetXMLHandler). The java.exe process never exceeds 200MB, although it is a bit slower than Aspose.


#4

@peeveen,

Thank you for sharing sample file and code. We are downloading this file and will provide our feedback after testing the issue using this sample file.


#5

@peeveen,

We were able to observe the issue but we need to look into it more. We have logged the issue in our database for investigation and for a fix. Once, we will have some news for you, we will update you in this topic.

This issue has been logged as

CELLSJAVA-42748 - LightCells fails to load huge file

#6

@peeveen,

We evaluated your issue further. The issue is due to the fact that the file data itself is too large. Currently we need to keep the original data of the file in memory while processing the workbook, so when processing larger file, then more memory will be required. We will optimize the process to improve memory performance. However, it is a complicated task and we need some time to finish it.

Once we have any new information, we will share it.


#7

@peeveen,

This is to inform you that we have fixed your issue (logged earlier as “CELLSJAVA-42748”) now. We will soon provide you the fixed version after performing QA and incorporating other enhancements and fixes.

We have improved the memory usage for loading large template files. And it is strongly recommended to use file to instantiate Workbook instead of InputStream because using file can give better memory performance after the improvement made in new versions.


#8

@peeveen,

Please try our latest version/fix: Aspose.Cells for Java v18.11.7:

aspose-cells-18.11.7.jar.zip (6.4 MB)

Your issue should be fixed in it.

Let us know your feedback.