PST Performance vs Open Source Alternative

We’re extracting messages from large PST archives and the performance is very slow compared to open source alternatives (such as lib-pst). I’m wondering if we’re using the API in the most efficient manner. There appears to be quite a lot of read I/O happening when moving between folders and messages.


I’m attaching a maven project with the test cases. You can obtain the sample PST here:

https://drive.google.com/open?id=0BwNdPjNyU4H9dUVIQkNfSExpN28

Test File: 500 MB PST
Aspose Time: 90-100s
Lib-PST Time: 7-10s

Aspose Test Code:

protected static final String FILE_NAME = "/Users/mcantrell/Documents/p9/Enron/andy_zipper_000_1_1.pst";
protected static final String LICENSE_FILE = "/Users/mcantrell/dev/projects/platinum/storage/storage-expander/src/main/resources/Aspose.Total.Java.lic";
// The limit is in seconds: it is compared against Stopwatch.elapsed(TimeUnit.SECONDS) below.
protected static final int MAX_TEST_EXECUTION_SECONDS = 30;

@BeforeClass
public static void enableLicense() throws Exception {
    License license = new License();
    // Close the license stream once it has been read.
    try (FileInputStream in = new FileInputStream(LICENSE_FILE)) {
        license.setLicense(in);
    }
}

@Test
public void parsePst() throws Exception {
    MessageDigest digest = MessageDigest.getInstance("MD5");
    PersonalStorage pst = PersonalStorage.fromFile(FILE_NAME);
    Stopwatch stopwatch = Stopwatch.createStarted();
    explode(pst, pst.getRootFolder(), digest);
    long elapsed = stopwatch.stop().elapsed(TimeUnit.SECONDS);
    assertEquals("011585696df1c30d7bb03ac5e655550a", encodeHexString(digest.digest()));
    assertTrue("Test execution time exceeded: " + elapsed + " seconds", elapsed < MAX_TEST_EXECUTION_SECONDS);
}

// Walks the folder tree depth-first, hashing each message's subject and body.
@SuppressWarnings("UnusedDeclaration")
protected void explode(PersonalStorage pst, FolderInfo folder, MessageDigest digest) {
    for (MessageInfo info : folder.getContents()) {
        // Builds a full MapiMessage for every item in the folder.
        MapiMessage msg = pst.extractMessage(info);
        String subject = msg.getSubject();
        String body = msg.getBody();
        if (subject != null) digest.update(subject.trim().getBytes());
        if (body != null) digest.update(body.trim().getBytes());
    }
    for (FolderInfo subFolder : folder.getSubFolders()) {
        explode(pst, subFolder, digest);
    }
}

Lib-PST Test Code:

protected static final String FILE_NAME = "/Users/mcantrell/Documents/p9/Enron/andy_zipper_000_1_1.pst";
// The limit is in seconds: it is compared against Stopwatch.elapsed(TimeUnit.SECONDS) below.
protected static final int MAX_TEST_EXECUTION_SECONDS = 30;

@Test
public void parsePst() throws Exception {
    MessageDigest digest = MessageDigest.getInstance("MD5");
    PSTFile pstFile = new PSTFile(FILE_NAME);
    Stopwatch stopwatch = Stopwatch.createStarted();
    explode(pstFile.getRootFolder(), digest);
    long elapsed = stopwatch.stop().elapsed(TimeUnit.SECONDS);
    assertEquals("011585696df1c30d7bb03ac5e655550a", encodeHexString(digest.digest()));
    assertTrue("Test execution time exceeded: " + elapsed + " seconds", elapsed < MAX_TEST_EXECUTION_SECONDS);
}

// Walks the folder tree depth-first, hashing each message's subject and body.
protected void explode(PSTFolder folder, MessageDigest digest) throws Exception {
    if (folder.getContentCount() > 0) {
        // getNextChild() returns null once the folder's items are exhausted.
        PSTMessage email = (PSTMessage) folder.getNextChild();
        while (email != null) {
            String subject = email.getSubject();
            String body = email.getBody();
            if (subject != null) digest.update(subject.trim().getBytes());
            if (body != null) digest.update(body.trim().getBytes());
            email = (PSTMessage) folder.getNextChild();
        }
    }
    for (PSTFolder subFolder : folder.getSubFolders()) {
        explode(subFolder, digest);
    }
}
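
Both tests share the same hashing and timing steps, so those can be isolated from either PST library. The following is only a sketch using the JDK alone: the class name DigestHarness and its helper methods are hypothetical and not part of the attached project. It also pins the charset to UTF-8, whereas the tests above call getBytes() with the platform default, so the expected digest could differ on a machine whose default charset is not UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class; none of these names come from the original tests.
public class DigestHarness {

    /** Feeds trimmed text into the digest using a fixed charset; null is skipped. */
    public static void updateTrimmed(MessageDigest digest, String text) {
        if (text != null) {
            digest.update(text.trim().getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Lower-case hex encoding of a byte array. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Runs the task and returns the elapsed wall-clock time in whole seconds. */
    public static long timeSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        // In a real test the task would be the explode(...) traversal.
        long elapsed = timeSeconds(() -> {
            updateTrimmed(digest, "  Subject line ");
            updateTrimmed(digest, null);
            updateTrimmed(digest, "Body text\n");
        });
        System.out.println(toHex(digest.digest()) + " in " + elapsed + "s");
    }
}
```

Keeping the measured unit (seconds) in the method name makes it harder to compare the result against a limit expressed in a different unit.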

Hi Mike,


Thank you for posting your inquiry.

We had trouble downloading the sample PST file, so in the meantime we tested with some sample PST files of our own. We noticed that the LibPst API takes less time than the Aspose API. With our sample files, however, the LibPst API threw exceptions and extracted fewer messages in total than the Aspose.Email API. We are currently downloading your sample PST file and will check it once it's downloaded. We will share our findings with you here soon.

If you have an FTP server or somewhere else I can upload the PST, I’d be happy to. Just let me know. The forum software returns an error when I try to attach it to this thread.

Hi Mike,


I have downloaded the PST file and tested it with Aspose.Email for Java 5.5.0 and java-libpst 0.7. It took about 107 seconds with Aspose.Email and 19 seconds with libpst. However, I got a lot of exceptions while testing with libpst. Could you please confirm whether you get any exceptions while using libpst? If not, please provide us with your libpst.jar file for our testing here. It will help us reproduce the issue and assist you accordingly.

The project is using java-libpst version 0.8.1. If you’re using maven, the dependency should resolve from the pom. If not, you can download it from the central maven repo:


http://search.maven.org/remotecontent?filepath=com/pff/java-libpst/0.8.1/java-libpst-0.8.1.jar

I have not encountered any exceptions in my lib-pst tests.

Hello Mike,

MapiMessage is quite a heavy class and is designed for working with the MSG format. If you only need to read the subject and a few other properties, you can use the properties of MessageInfo together with the PersonalStorage.extractProperty method instead of extracting the whole message.

Code example for your case:

@SuppressWarnings("UnusedDeclaration")
protected void explode(PersonalStorage pst, FolderInfo folder, MessageDigest digest) {
    for (MessageInfo info : folder.enumerateMessages()) {
        String subject = info.getSubject();
        String body = pst.extractProperty(info.getEntryId(), MapiPropertyTag.PR_BODY_W).getString();
        if (subject != null) digest.update(subject.trim().getBytes());
        if (body != null) digest.update(body.trim().getBytes());
    }
    for (FolderInfo subFolder : folder.enumerateFolders()) {
        explode(pst, subFolder, digest);
    }
}

Thanks.
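
To illustrate why reading a single property can be cheaper than building a whole message object, here is a toy model in plain Java. It is not Aspose.Email's implementation and says nothing about its internals: ToyStore, StoredMessage, loadFull and loadProperty are all hypothetical names, and the "cost" is simply a counter of property values touched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only; every type and method here is hypothetical.
public class LazyPropertyDemo {

    /** Stand-in for a stored message: a bag of named properties. */
    public static class StoredMessage {
        final Map<String, String> properties = new LinkedHashMap<>();
    }

    /** Store that counts how many property values it has to read. */
    public static class ToyStore {
        private final Map<Integer, StoredMessage> messages = new LinkedHashMap<>();
        public int propertyReads = 0;

        void put(int id, StoredMessage message) {
            messages.put(id, message);
        }

        /** Eager path: materialise every property of the message. */
        public Map<String, String> loadFull(int id) {
            Map<String, String> copy = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : messages.get(id).properties.entrySet()) {
                propertyReads++;
                copy.put(e.getKey(), e.getValue());
            }
            return copy;
        }

        /** Targeted path: read only the one property that was asked for. */
        public String loadProperty(int id, String name) {
            propertyReads++;
            return messages.get(id).properties.get(name);
        }
    }

    /** Builds a store with three messages of five properties each. */
    public static ToyStore sampleStore() {
        ToyStore store = new ToyStore();
        for (int id = 1; id <= 3; id++) {
            StoredMessage m = new StoredMessage();
            m.properties.put("subject", "subject " + id);
            m.properties.put("body", "body " + id);
            m.properties.put("headers", "headers " + id);
            m.properties.put("recipients", "recipients " + id);
            m.properties.put("attachments", "attachments " + id);
            store.put(id, m);
        }
        return store;
    }

    public static void main(String[] args) {
        ToyStore eager = sampleStore();
        ToyStore targeted = sampleStore();
        for (int id = 1; id <= 3; id++) {
            eager.loadFull(id).get("body");   // touches all five properties
            targeted.loadProperty(id, "body"); // touches one property
        }
        System.out.println("eager reads: " + eager.propertyReads);       // prints "eager reads: 15"
        System.out.println("targeted reads: " + targeted.propertyReads); // prints "targeted reads: 3"
    }
}
```

In this model the gap grows with the number of properties per message, which is the general reason a targeted read tends to win when only the subject and body are needed.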

Thanks for the advice Dmitry. Without the extra overhead, the tests complete in comparable time.

Hi Mike,

It's good to know that the code Dmitry shared has resolved your issue. Please feel free to write to us if you have any further queries in this regard.