Duplicate disclaimer text in PR_HTML when reading MSG file

Hello Aspose Support Team,

I am using Aspose.Email to read .msg files. I noticed that in some emails, the disclaimer text (inserted by our mail server) appears twice in the HtmlBody.

When I checked the raw MAPI property PR_HTML (via MFCMAPI), I found that the duplicate text is actually present inside MSO conditional comments (<!--[if mso]> ... <![endif]-->) as well as in the normal HTML body.

For example:

<!--[if mso]>
<p style="color:red;">Security Disclaimer</p>
<![endif]-->

<p style="color:red;">Security Disclaimer</p>

So, when I use Aspose to read the email, I also see the duplicated disclaimer.

Here is the code I used:

string body = mailMessage
    .GetHtmlBodyText(false)
    .Replace("<", "&lt;")
    .Replace(">", "&gt;")
    .Replace("\n", "<br />")
    .Replace("\r", "");

My questions are:

  1. Does Aspose.Email provide any built-in functionality to normalize or clean up the HtmlBody, for example by removing MSO conditional comments?
  2. If not, what is the recommended approach to handle this scenario? Should I post-process the HTML myself (e.g., with regex or an HTML parser), or does Aspose have a utility for this?
  3. Is this considered expected behavior (Aspose returning raw PR_HTML), or should I report it as a bug/feature request?

Thank you for your guidance.

Best regards,
Nghia

@nguyen.xuan.nghia

Hello Nghia,

  1. Aspose.Email does not provide built-in functionality specifically for normalizing or cleaning up the HTML body, such as removing MSO conditional comments.

  2. The recommended approach to handle this scenario is to post-process the HTML yourself. You can use a regular expression or an HTML parser to remove the MSO conditional comments. Here’s a simple example using regex in C#:

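    // Requires: using System.Text.RegularExpressions;
    string cleanedBody = Regex.Replace(body, @"<!--\[if mso\].*?<!\[endif\]-->", string.Empty, RegexOptions.Singleline);

    This regex removes blocks that open with exactly <!--[if mso]>; variants such as <!--[if gte mso 9]> are not matched. Here, body must hold the raw HTML (before < and > are escaped), otherwise the pattern finds nothing.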

  3. This behavior is expected since Aspose.Email returns the raw PR_HTML as it is. If you believe this could be improved, consider submitting a feature request to enhance the handling of such cases.

Best regards!

Hi Support Team,

Thank you very much for your feedback.
I understand your suggested solution, but I still have a problem:

I am using the following code to get the body:

body = mailMessage.GetHtmlBodyText(false)
    .Replace("<", "&lt;")
    .Replace(">", "&gt;")
    .Replace("\n", "<br />")
    .Replace("\r", "");

When I call mailMessage.GetHtmlBodyText(false), the returned body already contains the disclaimer text duplicated.

My question is:
Is there a way to remove the <!--[if mso]> ... <![endif]--> parts before mailMessage.GetHtmlBodyText(false) parses the content?

The reason I am asking is that I really want to continue using mailMessage.GetHtmlBodyText(false) because it already fits our requirements. If I change the logic to extract the body differently, there is a risk that the output will not match the current behavior of GetHtmlBodyText(false).

Do you have any solution or recommended approach for this?

I hope this makes sense, and I look forward to your guidance.

Best regards,
Nghia

Hello @nguyen.xuan.nghia,

Thank you for clarifying your scenario.

Currently, Aspose.Email doesn’t provide a built-in way to remove <!--[if mso]> ... <![endif]--> conditional comments before mailMessage.GetHtmlBodyText(false) processes the content. This method works directly on the raw PR_HTML property, so MSO conditional comments are included in the result.

The recommended approach is to clean the HTML body before calling GetHtmlBodyText(false). For example:

// Requires: using System.Text.RegularExpressions;
// Remove MSO conditional comments from the HTML body
mailMessage.HtmlBody = Regex.Replace(
    mailMessage.HtmlBody,
    @"<!--\[if mso\].*?<!\[endif\]-->",
    string.Empty,
    RegexOptions.Singleline);

// Now call GetHtmlBodyText(false) as usual
string body = mailMessage.GetHtmlBodyText(false)
    .Replace("<", "&lt;")
    .Replace(">", "&gt;")
    .Replace("\n", "<br />")
    .Replace("\r", "");

This way, you preserve the behavior of GetHtmlBodyText(false) while removing duplicate disclaimers caused by MSO conditional blocks.

Please note that this behavior is expected. If you consider this a common scenario, you may also submit a feature request so our team can evaluate adding an option to filter out MSO conditional comments directly within the library.
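
For readers whose messages contain other Outlook-targeted openers, the pattern above can be widened. The following is only a sketch under stated assumptions, not an Aspose.Email feature: the class name MsoCommentCleaner and the method StripMsoComments are hypothetical, and the pattern assumes the conditions look like [if mso], [if gte mso 9], or [if (mso)|(IE)]. Blocks whose condition starts with "!" (such as [if !mso]) are deliberately left alone, because their content is meant for non-Outlook clients.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of Aspose.Email.
public static class MsoCommentCleaner
{
    // Matches downlevel-hidden conditional comments whose condition mentions "mso",
    // e.g. <!--[if mso]>, <!--[if gte mso 9]>, <!--[if (mso)|(IE)]>.
    // Conditions that start with "!" (e.g. [if !mso]) are skipped on purpose.
    private static readonly Regex MsoBlock = new Regex(
        @"<!--\[if\s+(?!!)[^\]]*\bmso\b[^\]]*\]>.*?<!\[endif\]-->",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the HTML with matching blocks removed; null or empty input is returned unchanged.
    public static string StripMsoComments(string html)
    {
        return string.IsNullOrEmpty(html) ? html : MsoBlock.Replace(html, string.Empty);
    }
}
```

In the flow above, this would stand in for the inline Regex.Replace call, for example mailMessage.HtmlBody = MsoCommentCleaner.StripMsoComments(mailMessage.HtmlBody); followed by GetHtmlBodyText(false) as before. It is worth checking the result against a few real messages, since conditions written in other forms may not be covered.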

Hi Support Team,

Thank you very much for your response. It really helped me solve my problem.

Best regards,
Nghia

You are welcome!