Aspose.Cells for Python via JAVA throws "java.lang.OutOfMemoryError: Java heap space " when using threading

ashish.srivastava · September 5, 2024, 5:16am

Hi Team,

We are currently testing the Aspose.Cells for Python via Java trial version API to unhide the content of an XLSX file. However, we are encountering a “java.lang.OutOfMemoryError: Java heap space” error in the following scenario.

FileSize = 42 MB
Total file of same size = 2
ThreadPool = 2
FileType = XLSX
MaxHeapSize = 1024MB (This is the maximum I am permitted to use.)
Aspose Cells Version = 24.4.0

Even when I attempt to process a single file with one thread, it still consumes an excessive amount of memory. Below are the JProfiler memory statistics. Could you please help me understand why it is using so much memory?

I have attached 40MB test file and the memory snapshot for your reference.
image.png (25.1 KB)
40MB-TestFile.zip (8.3 MB)

Below are my sample code:

from multiprocessing.dummy import Pool as ThreadPool
import multiprocessing
import time
import jpype

if jpype.isJVMStarted() == False:
classpath = “C:/Program Files/Amazon Corretto/”
jvmArgs = [“-Djava.class.path=” + classpath, “-Xmx1024M”]
jpype.startJVM(u"C:/Program Files/Amazon Corretto/jdk21.0.2_13/bin/server/jvm.dll", *jvmArgs)

overAllTime = time.time()
max_num_processes = multiprocessing.cpu_count()
print(“\nMAX number of processes=” + str(max_num_processes))

def flatten(x):

from asposecells.api import Workbook
source_directory = "Examples/SampleFiles/SourceDirectory/"

start = time.time()
fileName = "40MB-TestFile.xlsx"

workbook = Workbook(source_directory + fileName)
for ws in workbook.getWorksheets():
	ws.getCells().unhideColumns(0, ws.getCells().getMaxDataColumn() + 1, -1)
	ws.getCells().unhideRows(0, ws.getCells().getMaxRow() + 1, -1)
	if ws.isVisible() == False:
		ws.setVisible(True);

# Save the excel file.
newFileName = str(x)+"__" + fileName
workbook.save(source_directory + "flatten/" + newFileName)
flattenTime = (time.time() - start) * 1000
print('\nFlatten Time taken =================================**************************{:.2f} ms'.format(
	flattenTime))

return x

a = [1,2]

pool = ThreadPool(2)
pool.map(flatten, a)
pool.close()
pool.join()

jpype.shutdownJVM()
duration = (time.time() - overAllTime) * 1000
print(‘\nTime taken {:.2f} ms’.format(duration))

Thanks,
Ashish

John.He · September 5, 2024, 6:35am

@ashish.srivastava
Due to the fact that the file contains over 20 worksheets, each containing more than 60000 rows of data, the number of cells will be a significant number.

You can use memory preference mode to load sample file. Please refer to the following example code.

import psutil
import os
import jpype
import asposecells
jpype.startJVM()
from asposecells.api import Workbook, SaveFormat, MemorySetting, LoadOptions

fileName = "40MB-TestFile.xlsx"
loadOptions = LoadOptions();
loadOptions.setMemorySetting(MemorySetting.MEMORY_PREFERENCE)

workbook = Workbook(fileName, loadOptions)

for ws in workbook.getWorksheets():
	ws.getCells().unhideColumns(0, ws.getCells().getMaxDataColumn() + 1, -1)
	ws.getCells().unhideRows(0, ws.getCells().getMaxRow() + 1, -1)
	if ws.isVisible() == False:
		ws.setVisible(True);

# Save the excel file.
newFileName = "40MB-TestFile_out.xlsx"
workbook.save(newFileName, SaveFormat.XLSX)

pid = os.getpid()
process = psutil.Process(pid)
mem_info = process.memory_info()

print("Memory consumption：{} bytes".format(mem_info.rss))

jpype.shutdownJVM()

The output result:

Memory consumption：560611328 bytes

Hope helps a bit.

ashish.srivastava · September 6, 2024, 9:24am

Thank you, John.
I’ll implement the changes based on your suggestions and proceed with testing again.

Thanks,
Ashish

John.He · September 6, 2024, 9:38am

@ashish.srivastava
Thank you for your feedback. Please take your time to try the suggested solutions. Hopefully, your issue will be sorted out. Please let us know your feedback.