High CPU load and memory consumption during report generation

Hi,
our developers noticed that during report generation with Aspose.Cells for Java (reading an XLSX template, replacing values, writing XLSX or PDF output) the API methods make a huge number of getResourceAsStream calls (more than 1.6 million), resulting in high CPU load (about half of the CPU load during report generation) and huge temporary memory allocations. As we cannot fully disclose our own application code, we have put together a small sample in which there are still more than 100 thousand getResourceAsStream calls. Please see the attached zip file, which includes the sample code as well as a profiler analysis. We tested the code with Aspose.Cells release 22.1.
workbook-demo.zip (331.7 KB)

Is there any way to limit these resource calls and thus lower CPU load and memory consumption?

Best regards,
Benjamin

@global-format,

Thanks for the sample files and details.

We do not think these obfuscated calls could cause significant CPU and memory consumption. Anyway, I have logged a ticket with the ID “CELLSJAVA-44415” for your issue. We will investigate and look into it in detail.

Once we have an update on it, we will let you know.

Well, at least we are not talking about a “vague guess” but about the outcome of a memory and CPU profiling. See attached screenshot.

aspose-high-memory-consumption.png (296.8 KB)

@jlessner,

Thanks for the screenshot.

Please give us a little time to evaluate your issue in detail. We will get back to you with further updates soon.

@jlessner,

With the files and code you provided, we cannot find more than 100 thousand getResourceAsStream calls. For your two files there are only 5-6 calls of getResourceAsStream() in our test. Are you opening those files many times over?

Oh, please excuse me - I used totally wrong terms because I had been working on CPU profiling before. What the profiling screenshots actually show is not the number of calls of getResourceAsStream but the number of objects and the amount of memory consumed by reading the resources. So the minimal code example in fact causes only a handful of calls of getResourceAsStream, but reading the resources causes the allocation of more than 100 thousand objects - mainly Strings. In our production scenario there are more calls of getResourceAsStream, and these calls cause an allocation of 1.6 million objects.

The memory can be fully garbage collected after report generation. But the question is whether we can avoid the repeated allocations with every single report generation. The reporting is performed from within a web server, and if multiple users create reports simultaneously, the server requires a huge amount of temporary memory although all these reporting tasks read exactly the same resources again and again. This causes not only high memory usage but also significant CPU load.

Is there a way to cache the resources somehow? From the obfuscated class I can see that there are only a few files of interest, like Aspose.Cells.typefaces.zip, Aspose.Cells.Theme2007.dat, and Aspose.Cells.wa.bin. So we could probably save memory and CPU load if we read them only once per server cluster node and kept them globally cached for reuse in every subsequent reporting task.
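To illustrate what I mean by caching, here is a minimal sketch in plain Java. It only shows the pattern: the class name ResourceCache and its methods are hypothetical, and since Aspose.Cells loads its resources internally, application code cannot simply plug such a cache into the library. The idea is to read each resource from the classpath once and hand out cheap in-memory streams afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper: reads each classpath resource once and serves later requests from memory. */
final class ResourceCache {
    // One byte[] per resource, shared by all threads of this JVM (i.e. per cluster node).
    private static final ConcurrentMap<String, byte[]> CACHE = new ConcurrentHashMap<>();

    private ResourceCache() {}

    /** Returns a fresh stream over the cached bytes; the classpath is only hit on the first request. */
    static InputStream open(Class<?> owner, String name) {
        byte[] bytes = CACHE.computeIfAbsent(owner.getName() + '|' + name, key -> load(owner, name));
        return new ByteArrayInputStream(bytes);
    }

    /** Number of distinct resources currently held in memory. */
    static int size() {
        return CACHE.size();
    }

    private static byte[] load(Class<?> owner, String name) {
        try (InputStream in = owner.getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + name);
            }
            return in.readAllBytes(); // requires Java 9 or later
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each caller still gets its own stream, so concurrent reporting tasks would not interfere with each other, but the underlying bytes are read only once.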

@jlessner,

Thanks for sharing your concerns.

We have logged it against the ticket into our database. We will evaluate and get back to you soon.

@jlessner,

We could not reproduce the issue of more and more objects being allocated when getResourceAsStream() is called repeatedly. In our tests, all extra objects are allocated only on the first call of getResourceAsStream(). After investigating and testing many situations, we found that when the JVM loads a signed jar, especially one with a large number of entries/classes, a large number of objects are allocated when getResourceAsStream() is used. We think this is caused by Java’s mechanism for verifying signatures, and we can do nothing to improve it.
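For anyone who wants to compare the first call with later calls on their own setup, a rough probe along the following lines may help. It is only a sketch, not the test we ran: the class name AllocationProbe is hypothetical, and it relies on the HotSpot-specific com.sun.management.ThreadMXBean, which may be unavailable or disabled on other JVMs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

/** Hypothetical probe: measures bytes allocated by the current thread while reading one resource. */
final class AllocationProbe {
    private AllocationProbe() {}

    static long allocatedBytes(Class<?> owner, String name) {
        // HotSpot-specific extension of the standard ThreadMXBean (an assumption about the JVM in use).
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        try (InputStream in = owner.getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + name);
            }
            in.readAllBytes(); // read the resource fully, as a real consumer would
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return threads.getThreadAllocatedBytes(id) - before;
    }
}
```

Calling allocatedBytes twice for the same resource of a signed jar and comparing the two results should show whether the bulk of the allocation happens only on the first access.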

Even though in our tests getting one resource by getResourceAsStream() repeatedly does not increase the allocation of extra objects, we have improved our code to reduce the number of getResourceAsStream() invocations, according to your suggestion. The change will be included in the upcoming official release 22.4. You may try it when it is published (hopefully before the end of this week or so).

The issues you have found earlier (filed as CELLSJAVA-44415) have been fixed in this update. This message was posted using Bugs notification tool by Peyton.Xu