Handling of duplicate Headers in EML files

The EDRM DupeID project requires that if an email has multiple “Message-ID” headers, only the first one should be used.

Some experiments indicate that Aspose.Email (.NET) makes only the last value available, and there doesn’t seem to be a way to know whether an email had multiple Message-ID headers or what the other values were. Is this correct?

Thanks.

@phaselden

You’re seeing only one Message‑ID value and want to know whether Aspose.Email can expose all duplicates.

To investigate this, could you provide:

  • The exact Aspose.Email for .NET version (NuGet package version or build number).
  • The code you use to load the EML file (e.g., MailMessage.Load, LoadFromFile, a stream, etc.) and the way you read the Message‑ID header.
  • A minimal snippet that reproduces the behavior where only the last Message‑ID is returned.
  • Whether you have inspected the full header collection (MailMessage.Headers) to see if the duplicate entries are present there.

With these details I can reproduce the scenario and determine if duplicate Message‑ID headers are accessible.

The .eml file in question has the following two headers:

Message-ID: CY1SCH030030241B5DE2FB98B19B4ABBD448A090@phx.gbl
Message-ID: ZTGQCAO3UWT4.0W5YSS2O0KVU1@CY1SCH030030241

All of the following approaches only return the 2nd value:

using var msg = MailMessage.Load("My Gmail 2021-07-03-22116.eml");

var val1 = msg.Headers["Message-ID"];
var val2 = msg.Headers.GetValues("Message-ID");
var val3 = msg.Headers.GetDecodedValue("Message-ID");

Yes, I have inspected the Headers collection and I cannot see the 2 values.

@phaselden

You’re seeing only the second Message-ID because the MailMessage.Headers collection returns a single value for each header name; when a header appears more than once, the later occurrence overwrites the earlier one. Consequently, msg.Headers["Message-ID"], msg.Headers.GetValues("Message-ID") and msg.Headers.GetDecodedValue("Message-ID") all give the same (last) value, and the duplicate entry is not exposed through the collection.

What this means

  • The current Aspose.Email API does not provide a way to retrieve all occurrences of a repeated header such as multiple Message‑ID lines.
  • All three accessors you tried therefore behave identically.

Possible approach

If you need to see every header line, read the raw .eml file yourself (e.g., with File.ReadAllLines or a StreamReader) and parse the header section before loading it with MailMessage.Load. This lets you capture all duplicate headers, after which you can continue to use Aspose.Email for the rest of the processing.
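
As a rough illustration of that workaround, the sketch below collects every value of one header from the raw header section. It uses only standard .NET types, not Aspose.Email; the class and method names (RawHeaderReader, GetAllHeaderValues) are made up for this example, and the unfolding of continuation lines is a simplification that joins them with a single space.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of Aspose.Email.
public static class RawHeaderReader
{
    // Returns every value of the named header, in the order it appears in the file.
    public static List<string> GetAllHeaderValues(TextReader reader, string headerName)
    {
        var values = new List<string>();
        string prefix = headerName + ":";
        bool inTarget = false; // true while the current header is the one we want

        while (reader.ReadLine() is string line)
        {
            // The first empty line ends the header section; the body is never scanned.
            if (line.Length == 0)
                break;

            if (line[0] == ' ' || line[0] == '\t')
            {
                // Folded continuation line: append it to the value being collected.
                // Simplification: joined with one space rather than exact unfolding.
                if (inTarget)
                    values[values.Count - 1] += " " + line.Trim();
                continue;
            }

            // A new header starts here; header names are compared case-insensitively.
            inTarget = line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);
            if (inTarget)
                values.Add(line.Substring(prefix.Length).Trim());
        }

        return values;
    }
}
```

Passing a StreamReader opened on the .eml file together with "Message-ID" should return both values for the sample headers quoted earlier, in file order, so element 0 is the first occurrence. Because reading stops at the first blank line, only the header section is read before the file is handed to MailMessage.Load.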

If you can share the exact Aspose.Email for .NET version you are using, we can verify whether this behavior is consistent across versions.

This is with Aspose.Email 26.1.0.

I am pretty sure the behaviour changed at some point. I believe that a few years ago the following code used to return multiple values as a single comma-separated string, e.g. "value1,value2":

var msgid = msg.MessageId;

That at least gave us the option to detect a scenario where there were multiple values without resorting to external/additional processing.

I realise that EML files that contain duplicate headers are technically malformed, but it is good to be able to read them and not just get an error when trying to load them, so thanks for that.

99.999% (estimated) of emails don’t have this issue, which is another reason I would prefer not to do additional file reads and processing just to cater for this rare scenario.

Ideally, I would like to see a change in Aspose.Email to be able to deal with this scenario. There are a number of ways this could be approached, for example:

  • Have a setting in LoadOptions called DuplicateHeaderStrategy, with possible options something like UseLast (default) | UseFirst | CombineValues | ReturnError.
  • Have a new property on the MailMessage named IsMalformed, which we can inspect to see whether we need to manually parse the file ourselves.
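
To make the first option concrete, here is roughly how the four proposed strategies could resolve a list of duplicate values. This is only an illustration of the intended semantics: the enum reuses the names suggested above, the DuplicateHeaderResolver helper is invented for this sketch, and none of it exists in Aspose.Email.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: mirrors the option names proposed above; not an Aspose.Email type.
public enum DuplicateHeaderStrategy { UseLast, UseFirst, CombineValues, ReturnError }

// Hypothetical helper showing what each strategy would return.
public static class DuplicateHeaderResolver
{
    public static string Resolve(IReadOnlyList<string> values, DuplicateHeaderStrategy strategy)
    {
        if (values.Count == 0)
            throw new ArgumentException("At least one header value is required.", nameof(values));

        switch (strategy)
        {
            case DuplicateHeaderStrategy.UseFirst:
                return values[0];
            case DuplicateHeaderStrategy.UseLast:
                return values[values.Count - 1];
            case DuplicateHeaderStrategy.CombineValues:
                // Comma-separated, like the older MessageId behaviour described above.
                return string.Join(",", values);
            case DuplicateHeaderStrategy.ReturnError:
                // Fail only when the header really is duplicated.
                if (values.Count > 1)
                    throw new InvalidOperationException("Duplicate header values found.");
                return values[0];
            default:
                throw new ArgumentOutOfRangeException(nameof(strategy));
        }
    }
}
```

With the two Message-ID values from the sample file, UseFirst would give the phx.gbl value that DupeID expects, while UseLast would match what the library returns today.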

Hello @phaselden,

Thank you for reporting this. We’ll review the current behavior and evaluate the suggested approaches as part of our investigation.

Hi @phaselden

We have fixed this behavior. Starting with the next version of Aspose.Email, if an email contains multiple “Message-ID” headers, only the first one will be used.

Excellent. Thanks very much.

@phaselden,

You are welcome.