JVM heap space fills up when creating a PST with many mails that have large attachments

I’m trying to use the email library for Java to create a PST file, but no matter what I do, the heap space saturates when the PST file reaches around 2 GB (with a maximum of around 8 GB allocated). The thread then blocks for a long time at setCapacity and takes forever to add new messages. I’m trying to add around 100 mails of 40 MB each. I even tried disposing the MapiMessage as soon as I add it to the PST. I also followed the chunking approach mentioned here, using a MemoryStream to write to a FileStream and flushing it, but only the mails from the first chunk show up inside the PST. Could you kindly provide a working code snippet to manage this efficiently?

@gauthamr,

Could you please share your sample message files with us as the issue is not reproducible at our end with the latest version of Aspose.Email for Java 18.7? These will help us investigate the issue and assist you further.

Hi Kashif, thanks for getting back. The issue does not seem to be related to a specific EML file. I’m just trying to create a PST file by creating a folder and adding EML files (sizes randomly distributed between 10 KB and 40 MB). I allotted 8 GB of memory to the process. Initially it starts out fast, but as the PST file’s size approaches 2 GB the process begins to crawl. The memory consumed keeps climbing, and the thread seems to be stuck at:
java.lang.Thread.State: RUNNABLE
at com.aspose.email.system.io.MemoryStream.setCapacity(Unknown Source)
at com.aspose.email.system.io.MemoryStream.b(Unknown Source)
at com.aspose.email.system.io.MemoryStream.write(Unknown Source)

This is the sample code snippet:
FileOutputStream os = new FileOutputStream("/User/Test/sample.pst");
PersonalStorage pst = PersonalStorage.create(os, FileFormatVersion.Unicode);
FolderInfo finfo = pst.getRootFolder().addSubFolder("SampleFolder");
for (int i = 0; i < 500; i++) {
    // Load the same large EML each time and convert it to a MAPI message
    MailMessage msg = MailMessage.load("/User/Test/Emls/bigeml.eml");
    MapiMessage mapimsg = MapiMessage.fromMailMessage(msg);
    finfo.addMessage(mapimsg);
    // Release the message as soon as it has been added
    mapimsg.dispose();
}

I have also tried the suggested method of creating a MemoryStream and then periodically writing it to the FileStream and flushing it, but I am unable to get it working properly. Only the first batch of messages flushed to disk shows up in Outlook. Can you provide some advice on handling memory properly? I’m trying to create PST files of up to 5 GB. Are there any code snippets or JVM parameters for handling these kinds of cases?

@gauthamr,

Could you please share how to observe this in Eclipse? We were, however, able to observe the program stalling after adding 31 files of 40 MB each to the PST with the heap size set to 5 GB. This has been logged as EMAILJAVA-34409 for further investigation at our end, and we’ll update you here as soon as some information is available in this regard.

@gauthamr,

Could you please try creating the PST on the file system rather than through an OutputStream, as follows, and share your feedback?

PersonalStorage pst = PersonalStorage.create("sample1.pst", FileFormatVersion.Unicode);

With this change, PST file creation proceeds without the application stalling at some point.

You are right. It does create the file if I provide the path instead of an OutputStream. The same issue exists when trying to read a large PST from a custom InputStream. If I provide the path directly there, it works.
Anyway, please let me know once the issue has been identified, because I will be using custom input/output streams.
I got the trace above by running the process in Eclipse and using the jstack command from the JDK (jstack [processid] > out.txt). This is one of my sample scripts, in case it is of help.

@gauthamr,

To use custom streams, you can use the attached CustomStream class. Sample usage is as follows:

Stream stream = new CustomStream(Paths.get("c:\\test.pst"));
PersonalStorage.create(stream, FileFormatVersion.Unicode);

CustomStream.java.zip (1.5 KB)

Hi Kashif.
Thanks. The above code works for creating a file in the local file system. I was looking to create a PST directly on a remote file system (HDFS etc.). Will that be possible? Is there any update on the issue in the PersonalStorage.create overload that takes an OutputStream?

@gauthamr,

We have logged your concerns against the logged ticket and will update you here once further information is available in this regard.

@gauthamr

HDFS was designed for immutable files and may not be suitable for the PersonalStorage implementation.

Java's OutputStream does not support seeking or overwriting, so we can’t use it in the PersonalStorage implementation without caching.
In the case of

PersonalStorage.create(OutputStream stream..

changes are cached in RAM and saved to the OutputStream only after the PersonalStorage is closed.

In the case of

PersonalStorage.create(Stream stream..

changes are not cached and are saved to the Stream immediately.

The CustomStream implementation is only a sample; it can be used for memory management on a non-standard file system without overhead. In the case of HDFS, however, you would need to implement your own cache.

public class CustomStream extends Stream {...

// Abstract methods must be implemented:

// Gets the length of the stream in bytes
getLength()

// Gets the current position within the stream.
getPosition()

// Reads a block of bytes from the current stream and writes the data to a buffer
read(byte[] buffer, int offset, int count)

// Sets the position within the current stream to the specified value.
//   loc - seek reference point
//     Begin = 0
//     Current = 1
//     End = 2
seek(long offset, int loc)

// Writes a block of bytes to the current stream using data read from a buffer
write(byte[] buffer, int offset, int count)

PersonalStorage pstFile = PersonalStorage.create(new CustomStream(), FileFormatVersion.Unicode);
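To illustrate the five methods listed above, here is a minimal sketch that implements them on top of java.nio.channels.FileChannel. FileChannelStream is a hypothetical stand-alone class, not the attached CustomStream: it does not extend the Aspose Stream type so that it compiles without the library, and the return types and the "return 0 at end of stream" convention are assumptions that should be checked against the attachment before adapting it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only. A real implementation would extend the
// Aspose Stream class, as the attached CustomStream does.
class FileChannelStream implements AutoCloseable {
    private final FileChannel channel;

    FileChannelStream(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    // Length of the stream in bytes
    long getLength() throws IOException { return channel.size(); }

    // Current position within the stream
    long getPosition() throws IOException { return channel.position(); }

    // Reads up to count bytes into buffer; returns the number of bytes read,
    // or 0 at end of stream (assumed convention)
    int read(byte[] buffer, int offset, int count) throws IOException {
        int n = channel.read(ByteBuffer.wrap(buffer, offset, count));
        return Math.max(n, 0);
    }

    // loc: 0 = Begin, 1 = Current, 2 = End; returns the new position
    long seek(long offset, int loc) throws IOException {
        long base;
        switch (loc) {
            case 0: base = 0; break;
            case 1: base = channel.position(); break;
            case 2: base = channel.size(); break;
            default: throw new IllegalArgumentException("loc must be 0, 1 or 2");
        }
        channel.position(base + offset);
        return channel.position();
    }

    // Writes count bytes at the current position, overwriting existing data
    void write(byte[] buffer, int offset, int count) throws IOException {
        ByteBuffer src = ByteBuffer.wrap(buffer, offset, count);
        while (src.hasRemaining()) {
            channel.write(src);
        }
    }

    @Override
    public void close() throws IOException { channel.close(); }
}
```

Because every write goes straight to the channel, nothing accumulates on the heap beyond the caller's own buffers.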

Java's RandomAccessFile is used by the PersonalStorage implementation in the case of

PersonalStorage.create("tmpFileName", version);

RandomAccessFile supports seeking and overwriting, but it can’t be used with immutable HDFS files.
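The difference between an append-only OutputStream and a seekable file can be sketched with a toy format (a 4-byte length header followed by records; this is not the PST layout, and SeekVersusAppend is a hypothetical class, not part of Aspose.Email). With an append-only target the header can only be corrected in memory, so all data is held on the heap until the end; with java.io.RandomAccessFile the header is overwritten in place.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.file.Path;

// Hypothetical demo class; the "format" is a toy, not the PST layout.
class SeekVersusAppend {

    // Append-only target: the header cannot be revisited once written, so the
    // whole payload is cached on the heap and patched there before flushing.
    static byte[] writeViaOutputStream(byte[][] records) throws IOException {
        ByteArrayOutputStream cache = new ByteArrayOutputStream();
        cache.write(new byte[4]);                // placeholder header
        int total = 0;
        for (byte[] r : records) {
            cache.write(r);
            total += r.length;
        }
        byte[] all = cache.toByteArray();        // full copy held in RAM
        ByteBuffer.wrap(all).putInt(0, total);   // patch the header in memory
        return all;                              // only now ready to be flushed
    }

    // Seekable target: records go to disk immediately, then the header is
    // overwritten in place, so no payload is retained in memory.
    static void writeViaRandomAccessFile(Path file, byte[][] records) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(0);
            raf.writeInt(0);                     // placeholder header
            int total = 0;
            for (byte[] r : records) {
                raf.write(r);
                total += r.length;
            }
            raf.seek(0);                         // jump back to the header
            raf.writeInt(total);                 // overwrite it in place
        }
    }
}
```

Both methods produce the same bytes; only the first one needs memory proportional to the total size of the data.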

@mudassir.fayyaz Thanks for your detailed explanation. Yes, we had assumed that random seeks are needed when working with PST files, hence the cache you mentioned. However, we have our own off-heap cache implementation of CustomStream to avoid garbage-collection overhead when working with large PST files that are processed over a long time.
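For readers wondering what an off-heap cache of this kind might look like, here is a minimal sketch built on fixed-size direct ByteBuffer pages. OffHeapPagedBuffer is a hypothetical name and this is not the poster's implementation; the page size and the positional read/write signatures are assumptions chosen for illustration. Growing the buffer allocates one more page rather than copying existing data into a larger contiguous array.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a seekable off-heap store made of direct-buffer pages.
class OffHeapPagedBuffer {
    private final int pageSize;
    private final List<ByteBuffer> pages = new ArrayList<>();
    private long length;

    OffHeapPagedBuffer(int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.pageSize = pageSize;
    }

    long getLength() { return length; }

    // Copies count bytes from src to absolute position pos, allocating new
    // direct (off-heap) pages as needed; existing pages are never copied.
    void write(long pos, byte[] src, int offset, int count) {
        if (count == 0) return;
        long end = pos + count;
        while ((long) pages.size() * pageSize < end) {
            pages.add(ByteBuffer.allocateDirect(pageSize));
        }
        while (count > 0) {
            ByteBuffer page = pages.get((int) (pos / pageSize)).duplicate();
            int inPage = (int) (pos % pageSize);
            int n = Math.min(count, pageSize - inPage);
            page.position(inPage);
            page.put(src, offset, n);
            pos += n;
            offset += n;
            count -= n;
        }
        length = Math.max(length, end);
    }

    // Copies up to count bytes starting at pos into dst; returns the number
    // of bytes copied, or 0 when pos is at or past the end.
    int read(long pos, byte[] dst, int offset, int count) {
        if (pos >= length) return 0;
        int total = (int) Math.min(count, length - pos);
        int remaining = total;
        while (remaining > 0) {
            ByteBuffer page = pages.get((int) (pos / pageSize)).duplicate();
            int inPage = (int) (pos % pageSize);
            int n = Math.min(remaining, pageSize - inPage);
            page.position(inPage);
            page.get(dst, offset, n);
            pos += n;
            offset += n;
            remaining -= n;
        }
        return total;
    }
}
```

A stream wrapper would keep its own position and delegate to these positional methods; direct buffers live outside the Java heap, so their contents are not scanned or moved by the garbage collector.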

@gauthamr

Based on the detailed explanation from our side and your feedback, can we close the issue now?