Huge short-term memory use - OutOfMemoryError

Hi there

I’m currently hardening our infrastructure. One part of it is a server that does mail merging. The datasets and templates are uploaded by the users, so our options for controlling them are limited.
While everything works fine and fast, we experienced OutOfMemoryErrors when the documents contain (large) images. Interestingly enough, this does not happen during the mail merge itself but when calling Document.save(OutputStream). Stack trace:

java.lang.OutOfMemoryError: Java heap space
at asposewobfuscated.pu.ju(MemoryStream.java:180)
at asposewobfuscated.pu.jt(MemoryStream.java:144)
at asposewobfuscated.pu.write(MemoryStream.java:349)
at asposewobfuscated.am.a(MiscUtil.java:425)
at asposewobfuscated.cb.a(FileSystem.java:236)
at asposewobfuscated.cb.a(FileSystem.java:257)
at asposewobfuscated.cb.a(FileSystem.java:204)
at asposewobfuscated.cb.f(FileSystem.java:136)
at com.aspose.words.dp.a(DocWriter.java:28)
at com.aspose.words.Document.a(Document.java:1472)
at com.aspose.words.Document.save(Document.java:942)

I used a memory profiler and noticed that heap memory usage peaks (for a short time) during this method call. It seems that during save() additional data is held in memory, which would explain this behaviour (the resulting document would be about 20MB in size; the heap size in this test configuration is 64MB). The sad part is that the output stream already writes the data to a file, so I had hoped that if the DOM fits in memory (which it does, as the actual merge worked), a call to save() would be possible.
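For reference, the failing path is really just the usual merge-then-save pattern; here is a minimal sketch (file names are placeholders, and I’m assuming the save(OutputStream, SaveFormat.DOC) overload of the current API, the constant name may differ in our 3.x release):

import com.aspose.words.Document;
import com.aspose.words.SaveFormat;

import java.io.FileOutputStream;
import java.io.OutputStream;

public class MergeAndSave {
    public static void main(String[] args) throws Exception {
        // Template uploaded by a user.
        Document doc = new Document("template.doc");

        // ... mail merge runs here and completes without any memory trouble ...

        // The OutOfMemoryError is thrown inside this call, even though the
        // stream already writes straight to a file on disk.
        OutputStream out = new FileOutputStream("result.doc");
        try {
            doc.save(out, SaveFormat.DOC);
        } finally {
            out.close();
        }
    }
}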

It is perfectly OK to reject a request that is too large, but it is a little unacceptable that the operation dies with an OutOfMemoryError and renders the application unusable.

So is there some heuristic we could use to guess whether the save operation can be performed? Is it possible to walk the DOM, apply some magic numbers, and get a rough estimate of the space that is needed during save()?

That aside, I also had a case where the document hardly grew in size when the number of records increased; this seems to be caused by the image being anchored to the page rather than to the paragraph, or something along those lines.

We’re using the latest 3.x version of Aspose.Words.

Thanks
P.S. In the production environment the max heap size will of course be more than 64MB; I’m currently trying to get some numbers and want to know how much heap a 20MB document will need, and so on.

Hi

Thanks for your request. Could you please attach a sample document here for testing and provide simple code that will allow me to reproduce the problem on my side? I will check the issue and provide you with more information.
Unfortunately, there is no way to determine in advance whether there is enough memory to perform the save operation. Memory usage depends on the document’s size and content.
Best regards,

Hi there

I don’t think it’s necessary to attach a sample file. The problem arises with either large templates or not-so-large templates combined with too many records.
I found a way to estimate the byte size of the resulting document with an acceptable error margin. The interesting question now is: do you have a heuristic to estimate how much heap memory has to be available during the save when the resulting file (.doc) will be about X bytes?
Then the program could combine the size estimation with this special factor or formula and guess whether it would run out of memory during the save operation. It is perfectly clear that this is like looking into a crystal ball, but as long as we stay on the safe side (better safe than sorry, i.e. it’s better to reject a job based on this prediction that could actually have been completed than to cause major havoc in the infrastructure because of an OutOfMem-thingy), it’d be OK, I think.
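To make it concrete, the check I have in mind looks roughly like this (just a sketch; estimatedResultBytes would come from our size estimation and saveHeapFactor is exactly the per-format number I’m asking about):

// Pre-flight check: guess whether the save is likely to fit into the heap.
public static boolean canProbablySave(long estimatedResultBytes, double saveHeapFactor) {
    Runtime rt = Runtime.getRuntime();
    // Heap that could still be claimed: the configured maximum minus what is in use.
    long usedHeap = rt.totalMemory() - rt.freeMemory();
    long availableHeap = rt.maxMemory() - usedHeap;

    long predictedPeak = (long) (estimatedResultBytes * saveHeapFactor);

    // Better safe than sorry: reject the job if the predicted peak would not fit.
    return predictedPeak < availableHeap;
}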

Thanks

Hi

Thank you for the additional information. Memory usage also depends on the destination file format, so it is difficult to say how much memory is necessary to save a particular document. May I know how you determine the output document’s size?
Best regards.

Sure you can… it’s actually pretty straightforward-but-ugly trial and error, based on the observation that the file size grows more or less linearly with the number of records.
So the server does two test merges (say 10 and 20 rows), saves the output, and calculates the offset and gradient of the resulting linear function. The actual number of rows is then used to determine the resulting document size.
This works as long as the growth is linear. For .doc files this is the case; I don’t know yet about DOCX or PDF, but I assume it is usually pretty much the same, since the information is more or less text, which tends to grow linearly when it is duplicated for a mail merge.
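In code the estimation boils down to something like this (only the extrapolation is shown; producing and measuring the two test merges is left out):

// Linear size estimation from two test merges, e.g. 10 and 20 rows.
public final class SizeEstimator {
    private final double gradient;
    private final double offset;

    public SizeEstimator(int rowsA, long sizeA, int rowsB, long sizeB) {
        // Two points are enough to fix the linear function size(n) = offset + gradient * n.
        this.gradient = (double) (sizeB - sizeA) / (rowsB - rowsA);
        this.offset = sizeA - gradient * rowsA;
    }

    // Estimated output size in bytes for the actual number of records.
    public long estimateBytes(int rows) {
        return Math.round(offset + gradient * rows);
    }
}

With the two test merges from above this is simply new SizeEstimator(10, sizeAt10, 20, sizeAt20).estimateBytes(actualRows).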

I haven’t had enough time yet to run a larger test series, but in today’s first tests the estimate based on the above scheme was only about 1KB off (final document size around 5.4MB).

Regards

Hi

Thank you for the additional information. Maybe you can try using the same logic to estimate the amount of memory needed to save the document.
Also, please note that when saving to PDF an intermediate model has to be built, so this process will consume more memory than saving the document in MS Word formats like DOC, RTF, or DOCX.
Best regards.

Hi

Well, I’d like to use the same approach to directly compute the heap memory requirements, but it’s not possible to actually measure heap use during a save(). At least I don’t know of any way.
Basically a format-dependent factor Y would suffice, i.e. a final size of X bytes => roughly X*Y bytes of heap needed when saving in the corresponding format.
Can you provide any such (rough) numbers? Currently, for .doc files, I’m at a factor of about 9, but if you have other or more precise numbers, I’d be glad to hear them.

Thanks

Hi

Thanks for your request. Unfortunately, I cannot provide you with such numbers because it is practically impossible. As I mentioned earlier, memory usage also depends on document complexity, so two documents of the same size can have quite different memory usage.
Best regards.