Parsing the details via DotNet in an efficient manner

Hello Team, we are parsing and fetching data from PST and OLM files and want to do it in the least time-consuming, most efficient manner. We have 2 goals, listed below. Both parsing routines currently take too much time and we want to optimise them. Can you help with this?

  1. Mail List Fetching from one folder.
  2. Mail Detail Fetching for one mail.

We need the optimal code for it in Aspose.Email for DotNet.

  1. Mail List Fetching Using Folder Id
public static string GetMailListAsJson(string emailFilePath, string folderEntryId)
    {
        var pst = PersonalStorage.FromFile(emailFilePath);
        var targetFolder = pst.GetFolderById(folderEntryId);

        var mailDetailsList = new List<MailListInfo>();

        foreach (var messageInfo in targetFolder.EnumerateMessages())
        {
            try
            {
                MapiMessage msg = pst.ExtractMessage(messageInfo.EntryId);
                var mailDetail = new MailListInfo
                {
                    MessageId = messageInfo.EntryIdString,
                    Subject = msg.Subject ?? "No Subject",
                    Sender = msg.SenderEmailAddress ?? msg.SenderName ?? "Unknown Sender",
                    Content = msg.Body == null ? "No Content" : (msg.Body.Length > 100 ? msg.Body.Substring(0, 100) + "..." : msg.Body),
                    HasAttachments = msg.Attachments?.Count > 0
                };

                mailDetailsList.Add(mailDetail);
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error processing message: {ex.Message}");
            }
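        }

        // Serialize the collected list (Newtonsoft.Json assumed).
        return JsonConvert.SerializeObject(mailDetailsList);
    }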
  2. Mail Detail Fetching for a single mail using message id
private static MailDetailInfo ProcessFolderForMessage(PersonalStorage pst, FolderInfo folder, string messageId)
    {
        foreach (var messageInfo in folder.EnumerateMessages())
        {
            try
            {
                MapiMessage msg = pst.ExtractMessage(messageId);

                return new MailDetailInfo
                {
                    Subject = msg.Subject ?? "No Subject",
                    Sender = msg.SenderEmailAddress ?? "Unknown Sender",
                    ContentHtml = msg.BodyHtml ?? msg.Body ?? "No Content",
                    Recipients = GetRecipients(msg),
                    Attachments = GetAttachments(msg)
                };
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error processing message: {ex.Message}");
            }
        }

        return null;
    }

    private static List<string> GetRecipients(MapiMessage msg)
    {
        var recipientsList = new List<string>();

        foreach (var recipient in msg.Recipients)
        {
            recipientsList.Add(recipient.EmailAddress);
        }

        return recipientsList;
    }

    private static List<string> GetAttachments(MapiMessage msg)
    {
        var attachmentsList = new List<string>();

        foreach (var attachment in msg.Attachments)
        {
            attachmentsList.Add(attachment.LongFileName ?? attachment.DisplayName);
        }

        return attachmentsList;
    }
}

@MaazHussain

To optimize the parsing and fetching of data from PST and OLM files using Aspose.Email for .NET, you can follow the provided code examples with some enhancements for efficiency. Below are the optimized methods for fetching a list of emails from a specific folder and retrieving details for a single email.

1. Fetching Mail List Using Folder ID

You can improve the performance of fetching mail details by minimizing the number of times you access the MapiMessage object and by using a more efficient way to handle exceptions. Here’s an optimized version of your method:

public static string GetMailListAsJson(string emailFilePath, string folderEntryId)
{
    var pst = PersonalStorage.FromFile(emailFilePath);
    var targetFolder = pst.GetFolderById(folderEntryId);
    var mailDetailsList = new List<MailListInfo>();

    // Use a single try-catch block for the entire enumeration
    try
    {
        foreach (var messageInfo in targetFolder.EnumerateMessages())
        {
            MapiMessage msg = pst.ExtractMessage(messageInfo.EntryId);
            var mailDetail = new MailListInfo
            {
                MessageId = messageInfo.EntryIdString,
                Subject = msg.Subject ?? "No Subject",
                Sender = msg.SenderEmailAddress ?? msg.SenderName ?? "Unknown Sender",
                Content = (msg.Body?.Length > 100 ? msg.Body.Substring(0, 100) + "..." : msg.Body) ?? "No Content",
                HasAttachments = msg.Attachments?.Count > 0
            };

            mailDetailsList.Add(mailDetail);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing messages: {ex.Message}");
    }

    // Convert mailDetailsList to JSON (you may need to use a library like Newtonsoft.Json)
    return JsonConvert.SerializeObject(mailDetailsList);
}

2. Fetching Mail Details for a Single Mail

For fetching details of a single email, you can directly extract the message using the message ID without iterating through the folder. This will significantly reduce the time taken for this operation:

private static MailDetailInfo GetMailDetail(PersonalStorage pst, string messageId)
{
    try
    {
        MapiMessage msg = pst.ExtractMessage(messageId);
        return new MailDetailInfo
        {
            Subject = msg.Subject ?? "No Subject",
            Sender = msg.SenderEmailAddress ?? "Unknown Sender",
            ContentHtml = msg.BodyHtml ?? msg.Body ?? "No Content",
            Recipients = GetRecipients(msg),
            Attachments = GetAttachments(msg)
        };
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing message: {ex.Message}");
        return null;
    }
}

private static List<string> GetRecipients(MapiMessage msg)
{
    return msg.Recipients.Select(recipient => recipient.EmailAddress).ToList();
}

private static List<string> GetAttachments(MapiMessage msg)
{
    return msg.Attachments.Select(attachment => attachment.LongFileName ?? attachment.DisplayName).ToList();
}

Summary of Optimizations

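  • Single Exception Handling: The try-catch is moved outside the loop. Note the trade-off: one failing message now aborts the rest of the enumeration, and in .NET a try block costs very little unless an exception is actually thrown, so the speed gain from this change is likely to be small.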
  • Direct Access for Single Mail: For fetching details of a single mail, directly extract the message using its ID instead of iterating through the folder.
  • LINQ for Collections: Use LINQ to simplify the collection of recipients and attachments.

These optimizations should help in reducing the time taken for parsing and fetching data from PST and OLM files.

If you need further assistance or have specific performance metrics, feel free to ask!

The change suggested above still takes nearly 2 seconds to parse 50 mails. Can you suggest any other alternative?

We did try fetching with MapiPropertyCollection instead of EnumerateMessages, and that reduced the time to 200 milliseconds, but it does not contain all the property codes we need (PR_SENDER_EMAIL_ADDRESS, PR_HASATTACH, etc.).

Hello @MaazHussain,

To optimize your file parsing, we recommend using the EnumerateMessagesEntryId() method. This approach retrieves only the message entry IDs first, reducing unnecessary processing, and then extracts messages as needed:

public static string GetMailListAsJson(string emailFilePath, string folderEntryId)
{
    var pst = PersonalStorage.FromFile(emailFilePath);
    var targetFolder = pst.GetFolderById(folderEntryId);

    var mailDetailsList = new List<MailListInfo>();

    foreach (var entryId in targetFolder.EnumerateMessagesEntryId())
    {
        try
        {
            var msg = pst.ExtractMessage(entryId);
            var mailDetail = new MailListInfo
            {
                MessageId = entryId.ToString(),
                Subject = msg.Subject ?? "No Subject",
                Sender = msg.SenderEmailAddress ?? msg.SenderName ?? "Unknown Sender",
                Content = (msg.Body?.Length > 100 ? msg.Body.Substring(0, 100) + "..." : msg.Body) ?? "No Content",
                HasAttachments = msg.Attachments?.Count > 0
            };

            mailDetailsList.Add(mailDetail);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error processing message: {ex.Message}");
        }
    }

    return JsonConvert.SerializeObject(mailDetailsList);
}

private static MailDetailInfo ProcessFolderForMessage(PersonalStorage pst, FolderInfo folder, string messageId)
{
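    foreach (var entryId in folder.EnumerateMessagesEntryId())
    {
        // Skip entries that are not the requested message; otherwise the first
        // message in the folder would be returned regardless of messageId.
        if (!string.Equals(entryId, messageId, StringComparison.Ordinal))
            continue;
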
        try
        {
            var msg = pst.ExtractMessage(entryId);

            return new MailDetailInfo
            {
                Subject = msg.Subject ?? "No Subject",
                Sender = msg.SenderEmailAddress ?? "Unknown Sender",
                ContentHtml = msg.BodyHtml ?? msg.Body ?? "No Content",
                Recipients = GetRecipients(msg),
                Attachments = GetAttachments(msg)
            };
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error processing message: {ex.Message}");
        }
    }

    return null;
}

Alternatively, you can use EnumerateMapiMessages(), which eliminates the need for separate message extraction:

public static string GetMailListAsJson(string emailFilePath, string folderEntryId)
{
    var pst = PersonalStorage.FromFile(emailFilePath);
    var targetFolder = pst.GetFolderById(folderEntryId);

    var mailDetailsList = new List<MailListInfo>();

    foreach (var msg in targetFolder.EnumerateMapiMessages())
    {
        try
        {
            var mailDetail = new MailListInfo
            {
                MessageId = msg.EntryIdString,
                Subject = msg.Subject ?? "No Subject",
                Sender = msg.SenderEmailAddress ?? msg.SenderName ?? "Unknown Sender",
                Content = (msg.Body?.Length > 100 ? msg.Body.Substring(0, 100) + "..." : msg.Body) ?? "No Content",
                HasAttachments = msg.Attachments?.Count > 0
            };

            mailDetailsList.Add(mailDetail);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error processing message: {ex.Message}");
        }
    }

    return JsonConvert.SerializeObject(mailDetailsList);
}

private static MailDetailInfo ProcessFolderForMessage(PersonalStorage pst, FolderInfo folder, string messageId)
{
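    foreach (var msg in folder.EnumerateMapiMessages())
    {
        // Note: messageId is not compared here, so as written this returns
        // the first message in the folder that can be read.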
        try
        {
            return new MailDetailInfo
            {
                Subject = msg.Subject ?? "No Subject",
                Sender = msg.SenderEmailAddress ?? "Unknown Sender",
                ContentHtml = msg.BodyHtml ?? msg.Body ?? "No Content",
                Recipients = GetRecipients(msg),
                Attachments = GetAttachments(msg)
            };
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error processing message: {ex.Message}");
        }
    }

    return null;
}

Additionally, please review our article on working with large PST files.

It’s important to note that performance largely depends on the structure of your messages, such as the size of the HTML body, the presence and size of attachments, and other factors. These optimizations help minimize unnecessary processing and improve efficiency.
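
If the list view only needs a subject and a sender display name, one further option is to skip ExtractMessage entirely and read the fields that MessageInfo already carries. The following is a minimal sketch, assuming the MessageInfo properties Subject, SenderRepresentativeName and EntryIdString are enough for the list; MailListRow and FastMailList.ListFolder are hypothetical names used only for illustration. SenderRepresentativeName is a display name rather than an email address, and the body preview and attachment flag are not covered here, so those would still need a per-message extraction.

```csharp
using System.Collections.Generic;
using Aspose.Email.Storage.Pst;

// Hypothetical row type for illustration; not part of Aspose.Email.
public sealed class MailListRow
{
    public string MessageId { get; set; }
    public string Subject { get; set; }
    public string Sender { get; set; }
}

public static class FastMailList
{
    // Builds list rows from the folder's MessageInfo entries only,
    // without calling ExtractMessage for each message.
    public static List<MailListRow> ListFolder(string pstPath, string folderEntryId)
    {
        var rows = new List<MailListRow>();

        // PersonalStorage is disposable, so release the file when done.
        using (PersonalStorage pst = PersonalStorage.FromFile(pstPath))
        {
            FolderInfo folder = pst.GetFolderById(folderEntryId);

            foreach (MessageInfo info in folder.EnumerateMessages())
            {
                rows.Add(new MailListRow
                {
                    MessageId = info.EntryIdString,
                    Subject = info.Subject ?? "No Subject",
                    // Display name only (assumption: good enough for a list view).
                    Sender = info.SenderRepresentativeName ?? "Unknown Sender"
                });
            }
        }

        return rows;
    }
}
```

The full message can then be extracted lazily, by entry ID, only when the user opens a particular mail.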

Hi @margarita.samodurova, I checked the logs and found that the following line is the slow part:

Content = (msg.Body?.Length > 100 ? msg.Body.Substring(0, 100) + "..." : msg.Body) ?? "No Content",

Extracting the message body is the only part that takes significant time for me. Can you help me with any alternative way to fetch the following?

  • Message Body
  • Message Body HTML

@MaazHussain,

Unfortunately, there are no alternative methods for faster extraction of Body / BodyHtml.
You could try [..] range slicing for the truncation:

string content = msg.BodyType == BodyContentType.PlainText ? msg.Body : msg.BodyHtml;
content = content is { Length: > 100 } ? $"{content[..100]}..." : content ?? "No Content";

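This improves readability and reads the body property only once per message. On a string, the range operator compiles to a Substring call, so the truncation itself is unlikely to be noticeably faster.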

Also, are you using a trial or a licensed version?
The trial version may work slightly slower because the message body is modified to insert a watermark text.
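
For projects on an older C# version without range or pattern syntax, the same truncation can be written as a small helper. This is a sketch only; BodyPreview.Make is a hypothetical name, and the 100-character limit and "No Content" placeholder simply mirror the snippets in this thread.

```csharp
public static class BodyPreview
{
    // Returns at most maxLength characters of text, followed by "..." when
    // the text was truncated. A null input yields the placeholder "No Content".
    public static string Make(string text, int maxLength = 100)
    {
        if (text == null)
        {
            return "No Content";
        }

        return text.Length > maxLength
            ? text.Substring(0, maxLength) + "..."
            : text;
    }
}
```

Reading the body into a local first (for example `string body = msg.Body;` followed by `BodyPreview.Make(body)`) keeps the property access to a single call per message, which matters if the getter is the expensive part.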

Hi @margarita.samodurova I have a few queries

  1. Does Aspose.Email support searching mails across folders?
  2. We noticed that the first time this block of code runs it takes some time, but on later calls it returns the data almost instantly, irrespective of the file path. Is this expected?
    For example, the first call
    PersonalStorage.FromFile(emailFilePath);
    takes 100 milliseconds, while a later call to
    PersonalStorage.FromFile(emailFilePath); (or) PersonalStorage.FromFile(emailFilePath2);
    takes 0 milliseconds.

Can you explain technically why this occurs?

Hello @MaazHussain,

  1. Yes, Aspose.Email allows searching emails across folders in a PST file.
    You can iterate through folders using MapiQueryBuilder to filter messages based on criteria.

  2. The behavior you observed is not related to Aspose.Email but is likely due to OS or development environment mechanisms.
    It could be some form of caching, such as the operating system’s file system cache, which speeds up subsequent file accesses.
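
One way to narrow this down is to time each call separately and compare a first call against later ones, including calls on a different file. The helper below is a minimal sketch using only System.Diagnostics.Stopwatch; CallTimer.MeasureMs is a hypothetical name. A common cause of a slow first call in .NET is one-time work such as assembly loading and JIT compilation, but that is an assumption here and has not been confirmed for this case.

```csharp
using System;
using System.Diagnostics;

public static class CallTimer
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double MeasureMs(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}

// Example usage (requires Aspose.Email; paths are placeholders):
//   double first  = CallTimer.MeasureMs(() => PersonalStorage.FromFile(path1).Dispose());
//   double second = CallTimer.MeasureMs(() => PersonalStorage.FromFile(path2).Dispose());
//   Console.WriteLine($"first: {first} ms, second: {second} ms");
```

If the second call is fast even for a file that has never been opened before, file system caching of that file cannot be the whole explanation.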

Hi @margarita.samodurova, you mentioned that we can search across folders. Does this mean I can pass a MailQuery to RootFolder.EnumerateMessages(query) and get consolidated search results from all the subfolders, or do we need to manually iterate through each subfolder to build the consolidated results?

Hello @Devishree,

You need to iterate through each subfolder to get consolidated search results. RootFolder.EnumerateMessages(query) will only return messages from the specified folder. If you want to search across all subfolders, you need to recursively iterate through each subfolder.
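
The recursive iteration described above could look roughly like this. It is a sketch, not an official sample: it assumes FolderInfo.GetContents(MailQuery), HasSubFolders and GetSubFolders() behave as in current Aspose.Email versions, and PstSearch.SearchRecursive is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using Aspose.Email.Storage.Pst;
using Aspose.Email.Tools.Search;

public static class PstSearch
{
    // Collects the messages matching the query from the given folder
    // and from all of its subfolders, depth first.
    public static List<MessageInfo> SearchRecursive(FolderInfo folder, MailQuery query)
    {
        var results = new List<MessageInfo>();

        // Matches in the current folder only.
        results.AddRange(folder.GetContents(query));

        if (folder.HasSubFolders)
        {
            foreach (FolderInfo subFolder in folder.GetSubFolders())
            {
                results.AddRange(SearchRecursive(subFolder, query));
            }
        }

        return results;
    }
}

// Example usage (the search term is a placeholder):
//   var builder = new PersonalStorageQueryBuilder();
//   builder.Subject.Contains("invoice");
//   List<MessageInfo> hits = PstSearch.SearchRecursive(pst.RootFolder, builder.GetQuery());
```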

Thank you.

Hi @margarita.samodurova, is parsing of password-protected PST files restricted?

Hello @Devishree,

Password protection in PST files is essentially an Outlook-specific feature, and the data itself is not encrypted. This allows emails to be extracted without requiring the password.
You can find more details about working with password-protected PST files on our documentation page.
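
If the application should still respect the password, it can check for it explicitly before reading. The sketch below assumes the Store.IsPasswordProtected property and Store.IsPasswordValid method of recent Aspose.Email versions; please verify them against the version you use. PstPasswordCheck.Report is a hypothetical helper name.

```csharp
using System;
using Aspose.Email.Storage.Pst;

public static class PstPasswordCheck
{
    // Prints whether the PST carries a password and whether the
    // candidate password matches it.
    public static void Report(string pstPath, string candidatePassword)
    {
        using (PersonalStorage pst = PersonalStorage.FromFile(pstPath))
        {
            Console.WriteLine("Password protected: " + pst.Store.IsPasswordProtected);
            Console.WriteLine("Password valid: " + pst.Store.IsPasswordValid(candidatePassword));
        }
    }
}
```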

Hi @margarita.samodurova, thanks for the clarification. Is there any method to get a list of MessageInfo objects for a given list of entry IDs? Also, can I get the list of messages in the current folder based on a conversation ID?

Hello @Devishree,

Aspose.Email does not provide a direct method to retrieve a list of MessageInfo objects based on a given list of entry IDs. However, you can achieve this by iterating over the folder’s message collection and filtering the messages by their IDs:

FolderInfo folderInfo = pst.RootFolder.GetSubFolder("Inbox");
MessageInfoCollection messages = folderInfo.GetContents();

List<string> entityIds = new List<string> { "id1", "id2" }; // Replace with actual IDs
List<MessageInfo> filteredMessages = messages.Where(m => entityIds.Contains(m.EntryIdString)).ToList();

For retrieving messages in the current folder based on a conversation ID, you can refer to our blog article Group Messages from PST by Conversation Threads using C# .NET. This article explains how to use MAPI properties such as PidTagConversationIndex to identify and group messages into conversations.
Additionally, you can check out the ConversationThread sample app in our GitHub repository. This project provides a practical implementation of grouping messages by conversation.
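
When the ID list is long, the ID filter shown earlier in this reply can use a HashSet so that each lookup is constant time instead of a scan of the list. The helper below is a generic sketch with hypothetical names (IdFilter.ByIds); it uses only LINQ and standard collections, so it works with any item type that exposes a string ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdFilter
{
    // Returns the items whose ID (as produced by idOf) is in wantedIds.
    // The wanted IDs are copied into a HashSet for fast lookups.
    public static List<T> ByIds<T>(
        IEnumerable<T> items,
        Func<T, string> idOf,
        IEnumerable<string> wantedIds)
    {
        var wanted = new HashSet<string>(wantedIds, StringComparer.Ordinal);
        return items.Where(item => wanted.Contains(idOf(item))).ToList();
    }
}

// Example usage with Aspose.Email (IDs are placeholders):
//   List<MessageInfo> filtered =
//       IdFilter.ByIds(folderInfo.EnumerateMessages(), m => m.EntryIdString, new[] { "id1", "id2" });
```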

Hi @margarita.samodurova, for folder parsing I need to extract only the mail-related folders. Can I rely on the ContainerClass property of the PersonalStorage object for this? If not, how can I achieve it?

Hello @Devishree,

Yes, you can rely on the ContainerClass property of the FolderInfo object. Mail-related folders typically have the ContainerClass set to "IPF.Note". You can use this value to filter out mail folders when parsing the PST.

But I can see empty values for certain folders. Is this expected?

Yes, some folders can have an empty ContainerClass value.

Usually, an empty ContainerClass implies “IPF.Note”, meaning the folder is likely a mail folder. However, for greater reliability, you can extract one item from the folder and check its message class (MapiMessage.MessageClass). If it is “IPM.Note”, then the folder contains mail items.
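
The ContainerClass rule above can be captured in a small predicate. This is a sketch with a hypothetical name (MailFolderFilter.IsMailContainerClass); treating any value that starts with "IPF.Note" as a mail folder is an assumption made here so that subclasses of the container class are included.

```csharp
using System;

public static class MailFolderFilter
{
    // Treats an empty or missing container class as a mail folder, and
    // otherwise accepts "IPF.Note" and values that begin with it.
    public static bool IsMailContainerClass(string containerClass)
    {
        if (string.IsNullOrEmpty(containerClass))
        {
            return true;
        }

        return containerClass.StartsWith("IPF.Note", StringComparison.OrdinalIgnoreCase);
    }
}

// Example usage with Aspose.Email:
//   foreach (FolderInfo folder in pst.RootFolder.GetSubFolders())
//   {
//       if (MailFolderFilter.IsMailContainerClass(folder.ContainerClass)) { /* parse mails */ }
//   }
```

For folders with an empty ContainerClass, the MessageClass check on a sample item remains the more reliable confirmation.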

Hi @margarita.samodurova, how do I differentiate between inline images and regular attachments in MapiAttachment? I noticed that for a specific file the IsInlineImage property always returns false, even for an inline image.

Hello @Devishree,

Could you please provide a sample MSG file where an inline image exists, but the IsInline property returns false? This would help us analyze the issue.