The sum of the calculated message sizes is larger than the size of the PST file that is created

We are using the Aspose.Email library to create a PST file from Exchange messages that are stored in our backup system. While creating the PST, we calculate the size of each message (code shown below) and add up the sizes so that we can show our users the total amount of data that will be exported. The problem is that the total shown to the user is about 30% larger than the resulting PST file, so users think that data has been lost.

Questions:

  1. Is the PST compressed in any fashion?
  2. Is the following method of computing the message size correct?

int extractMapiMessageSize(String entryId, PersonalStorage ps) {
    int size = 0;
    MapiAttachmentCollection coll = ps.extractAttachments(entryId);

    for (MapiAttachment att : coll) {
        Long attachmentSize = att.getPropertyLong(MapiPropertyTag.PR_ATTACH_SIZE);
        if (attachmentSize != null) {
            size += attachmentSize;
        }
    }

    MapiMessage msg = ps.extractMessage(entryId);
    Long msgSize = msg.getPropertyLong(MapiPropertyTag.PR_MESSAGE_SIZE);
    if (msgSize != null) {
        size += msgSize;
    }

    return size;
}

@pguerzenichsmall,
Thank you for the issue description. I have logged it in our tracking system with ID EMAILJAVA-34858. Our development team will investigate the issue. I will inform you about any progress.

One more thing to add:
We are setting UseBodyCompression to true before we call MapiMessage.fromMailMessage.

@pguerzenichsmall,
Could you share an updated comprehensive code example with the MapiMessage.fromMailMessage call, please?

Here is the method that is called. The InputStream is simply a message in .eml format:

MapiMessage toMapiMessage(InputStream inputStream) {
    MapiConversionOptions options = new MapiConversionOptions();
    options.setUseBodyCompression(true);

    MailMessage mailMessage = MailMessage.load(inputStream);
    MapiMessage mapiMessage = MapiMessage.fromMailMessage(mailMessage, options);

    // When converting from MailMessage to MapiMessage, the Aspose library makes all attachments inline.
    // To ensure that the message attachments are unchanged, remove the property tag that indicates it is inline
    // if the original attachment was not inline.
    for (int i = 0; i < mailMessage.getAttachments().size(); i++) {
        if (mailMessage.getAttachments().get_Item(i).isEmbeddedMessage()) {
            MapiAttachment attachment = mapiMessage.getAttachments().get(i);
            attachment.removeProperty(MapiPropertyTag.PR_ATTACH_CONTENT_ID);
        }
    }

    return mapiMessage;
}

@pguerzenichsmall,
Unfortunately, I cannot reproduce the issue. Please share a complete code example generating a PST file and reproducing the issue. Also, please specify the version of Aspose.Email you are using.

Can you simply answer the following questions? They don’t require any source code or a PST. If you cannot, can you contact someone who has this knowledge?

1. When we turn on UseBodyCompression, how much compression will occur?

2. Is the following the correct way to calculate the size of a MapiMessage object?

   int extractMapiMessageSize(String entryId, PersonalStorage ps) {
       int size = 0;
       MapiAttachmentCollection coll = ps.extractAttachments(entryId);

       for (MapiAttachment att : coll) {
           Long attachmentSize = att.getPropertyLong(MapiPropertyTag.PR_ATTACH_SIZE);
           if (attachmentSize != null) {
               size += attachmentSize;
           }
       }

       MapiMessage msg = ps.extractMessage(entryId);
       Long msgSize = msg.getPropertyLong(MapiPropertyTag.PR_MESSAGE_SIZE);
       if (msgSize != null) {
           size += msgSize;
       }

       return size;
   }

@pguerzenichsmall,
Our development team will investigate your questions further. Thank you for your patience.

@pguerzenichsmall,
Our development team investigated your questions.

The UseBodyCompression option is comparable in compression level to ZIP.
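
How much a body shrinks depends on its content, so there is no single figure. As a rough illustration only, the sketch below compresses a sample text with java.util.zip.Deflater (the DEFLATE algorithm used by ZIP) and prints the sizes before and after. This is not the Aspose.Email implementation; the class name BodyCompressionSketch and the sample body are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: a stand-in for "ZIP-like" compression, not Aspose.Email code.
final class BodyCompressionSketch {

    /** Compresses the input with DEFLATE, the algorithm used by ZIP. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Restores the original bytes so the round trip can be checked. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // input ended early; stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Input is not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Hypothetical message body: repetitive text, which compresses well. */
    static byte[] sampleBody() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("Please find the quarterly report attached.\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] body = sampleBody();
        byte[] packed = deflate(body);
        System.out.println("original bytes:   " + body.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip equal: " + Arrays.equals(body, inflate(packed)));
    }
}
```

Repetitive text like this sample shrinks a lot, while content that is already compressed (images, archives) barely shrinks at all, so the overall saving for a mailbox depends on the mix of messages.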

  • PST and MapiMessage are different data storage formats.
  • The PST format stores data in a more compact form compared to MapiMessage.
  • PR_MESSAGE_SIZE is an approximate size. From "PidTagMessageSize Canonical Property" on Microsoft Learn: "The message size indicates the approximate number of bytes that are transferred when the message is moved from one message store to another. Being the sum of the sizes of all properties on the message object, it is usually considerably greater than the message text alone."
  • PR_MESSAGE_SIZE already includes the attachment sizes (PR_ATTACH_SIZE), so they do not need to be added to it.
  • PR_MESSAGE_SIZE contains the size of the already compressed body.
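
To make the double counting concrete, here is a minimal sketch of the arithmetic, assuming the property values have already been read from the message. It does not call Aspose.Email: StoredSizes is a hypothetical holder for the PR_MESSAGE_SIZE and PR_ATTACH_SIZE values, and the numbers in main are invented for illustration.

```java
// Illustration only: models the size arithmetic, not the Aspose.Email API.
final class MessageSizeSketch {

    /** Hypothetical holder for the property values read for one message. */
    static final class StoredSizes {
        final long messageSize;       // value of PR_MESSAGE_SIZE
        final long[] attachmentSizes; // values of PR_ATTACH_SIZE, one per attachment

        StoredSizes(long messageSize, long... attachmentSizes) {
            this.messageSize = messageSize;
            this.attachmentSizes = attachmentSizes;
        }
    }

    /** The approach from the question: attachments are added on top of the message size. */
    static long doubleCountedSize(StoredSizes s) {
        long size = s.messageSize;
        for (long attachmentSize : s.attachmentSizes) {
            size += attachmentSize;
        }
        return size;
    }

    /** The approach suggested by the points above: PR_MESSAGE_SIZE alone. */
    static long reportedSize(StoredSizes s) {
        return s.messageSize;
    }

    public static void main(String[] args) {
        // Invented numbers: a message whose reported size already covers its attachment.
        StoredSizes sample = new StoredSizes(130_000L, 30_000L);
        System.out.println("double counted: " + doubleCountedSize(sample));
        System.out.println("reported:       " + reportedSize(sample));
    }
}
```

The totals use long rather than int, since a sum over a large mailbox can exceed the int range. In the original method this would amount to dropping the loop over the attachments and returning only the PR_MESSAGE_SIZE value, which is still an approximation and will not match the PST file size exactly.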

If you need to make sure that there is no data loss, you can save all messages from the PST to EML format and compare with the original ones.

Thank you, Andrey. This is a great help.