Read message bodies of embedded messages without any changes

Suppose there is a message in a PST and that message has an embedded message as an attachment.

I want to read the message bodies of the embedded message exactly as they are in the PST for archiving purposes. I do not want to have any message bodies synthesized for it.

I am able to do this on stand-alone messages (i.e. not embedded messages) by using PersonalStorage.ExtractProperty to read message body properties, rather than loading the messages as MapiMessage objects.

However, I can’t find a way to read the bodies of an embedded message in a way that avoids message bodies being synthesized.

Is there a way to know whether Aspose has synthesized a message body, or a way to avoid having message bodies synthesized for embedded messages?

I realize that message body synthesis is intentional behavior but I’m looking for a way to work around it so that I can archive the message bodies exactly as they are.

Below is an xUnit test demonstrating that PersonalStorage.ExtractProperty allows message bodies to be read while avoiding synthesized bodies on regular messages. I’m looking for a way to do something similar with embedded messages.

[Fact]
public void Aspose_synthesizes_bodies_when_only_pr_html_is_set()
{
    string html = "My PR_HTML message body!";
    // create a message and only set the body with PR_HTML
    MapiMessage before = new(OutlookMessageFormat.Unicode);
    before.SetProperty(KnownPropertyList.TagHtml, Encoding.UTF8.GetBytes(html));

    // save it to a PST
    MemoryStream ms = new();
    using (PersonalStorage pst = PersonalStorage.Create(ms, FileFormatVersion.Unicode, true))
    {
        pst.RootFolder.AddMessage(before);
    }

    ms.Position = 0;

    // re-open the pst
    MapiMessage after;
    using (PersonalStorage pst = PersonalStorage.FromStream(ms, new PersonalStorageLoadOptions { LeaveStreamOpen = true }))
    {
        MessageInfo message = pst.RootFolder.EnumerateMessages().Single();

        // use PersonalStorage.ExtractProperty to read the bodies. this seems to avoid message body synthesis.

        // PR_HTML is unchanged
        var tagHtml = pst.ExtractProperty(message.EntryId, KnownPropertyList.TagHtml.Tag);
        Assert.NotNull(tagHtml);
        Assert.Equal(html, Encoding.UTF8.GetString(tagHtml.Data));

        // **** NO SYNTHESIZED BODIES *****
        Assert.Null(pst.ExtractProperty(message.EntryId, KnownPropertyList.BodyHtml.Tag));
        Assert.Null(pst.ExtractProperty(message.EntryId, KnownPropertyList.Body.Tag));
        Assert.Null(pst.ExtractProperty(message.EntryId, KnownPropertyList.RtfCompressed.Tag));

        // extract the message as a MapiMessage instance
        after = pst.ExtractMessage(message);
    }

    // check that PR_HTML is still set
    var tagHtmlAfter = after.Properties[KnownPropertyList.TagHtml];
    Assert.NotNull(tagHtmlAfter);

    // PR_HTML is unchanged
    Assert.Equal(html, Encoding.UTF8.GetString(tagHtmlAfter.Data));
    Assert.Equal(BodyContentType.Html, after.BodyType);

    // PR_BODY_HTML is still null
    Assert.Null(after.Properties[KnownPropertyList.BodyHtml]);

    // ***** MapiMessage has synthesized bodies *****
    Assert.NotNull(after.Properties[KnownPropertyList.RtfCompressed]);
    Assert.NotNull(after.Properties[KnownPropertyList.Body]);
}

@weswes

To read the message bodies of embedded messages in a PST file without any synthesis, you can utilize the PersonalStorage.ExtractProperty method, similar to how you handle standalone messages. However, for embedded messages, you will need to first extract the embedded message as a MapiMessage object and then access its properties directly.

Cause

The need arises when you want to access the original content of embedded messages within a PST file without any alterations or synthesis.

Solution

Here’s a step-by-step approach to achieve this:

  1. Extract the Embedded Message: Use the ExtractMessage method to get the embedded message as a MapiMessage object.
  2. Access the Properties: Instead of using synthesized properties like Body or BodyHtml, directly access the PR_HTML property or any other relevant property that contains the original message body.

Code Snippet

Here’s a code snippet to illustrate this:

using System;       // Console
using System.Text;  // Encoding
using Aspose.Email;
using Aspose.Email.Mapi;
using Aspose.Email.Storage.Pst;

// Load the PST file
using (PersonalStorage pst = PersonalStorage.FromFile("path_to_your_pst_file.pst"))
{
    // Access the folder containing the messages
    FolderInfo inbox = pst.RootFolder.GetSubFolder("Inbox");

    // Iterate through messages in the folder
    foreach (MessageInfo messageInfo in inbox.EnumerateMessages())
    {
        // Extract the message
        MapiMessage message = pst.ExtractMessage(messageInfo.EntryId);

        // Access the PR_HTML property directly
        var htmlProperty = message.Properties[KnownPropertyList.TagHtml];
        if (htmlProperty != null)
        {
            string originalHtmlBody = Encoding.UTF8.GetString(htmlProperty.Data);
            Console.WriteLine("Original HTML Body: " + originalHtmlBody);
        }
    }
}

Next Steps

If you need to handle multiple embedded messages, you can apply the same logic recursively. Ensure that you have the necessary error handling in place to manage any exceptions that may arise during the extraction process.

Closing Line

This approach guarantees that you access the original message body without any synthesis. If you have any further questions or need additional assistance, feel free to ask!

Accessing the properties directly does not avoid synthesized properties. If I create a message and give it only a PR_HTML body, then when a MapiMessage is instantiated, PR_BODY and PR_RTF_COMPRESSED are automatically added to the Properties collection. I understand that this is intentional, but it would be great if there were a way to instantiate a MapiMessage in some sort of “forensic” mode that avoided synthesis of properties.

I can achieve this for top level messages by using PersonalStorage.ExtractProperty, but if I want to see only the original body properties for an embedded message, I am unable to use PersonalStorage.ExtractProperty.

@weswes,

I’ll forward your request to the developers so they can investigate the embedded-message case and consider whether a “forensic” mode, or another way to extract message bodies exactly as they are stored in the PST, can be added.

Hello @weswes,

We have created an internal ticket to track this request: the ability to extract raw properties from embedded messages.

Issue ID(s): EMAILNET-41635

The developers will review possible solutions, such as extending ExtractProperty for embedded messages.

Thank you.
