French character é replaced with '?' after converting MSG to MHT/HTML

Hi team,
We use versions 19.11, 20.1 and 20.2 of AsposeEmail.dll.
Steps to reproduce:
Try to receive a message (e.g. Subj é.zip (12.9 KB)) with the French character é in the message body…

We use this code to embed resources in the HTML:

    using (MemoryStream ms = new MemoryStream())
    {
        HtmlSaveOptions htmlSaveOption = new HtmlSaveOptions();

        htmlSaveOption.HtmlFormatOptions |= HtmlFormatOptions.WriteHeader |
            HtmlFormatOptions.WriteCompleteEmailAddress |
            HtmlFormatOptions.WriteCompleteCcEmailAddress;

        htmlSaveOption.CheckBodyContentEncoding = true;
        htmlSaveOption.EmbedResources = true;
        msg.Save(ms, htmlSaveOption);

        string html = Encoding.ASCII.GetString(ms.ToArray());
    }

Msg.BodyHtml of the email has a body like the one shown below:
image.png (5.2 KB)

The html string after the encoding operation has a body like this:
image.png (5.2 KB)

Could you clarify what the problem is?
Thank you

Regards,
Dmitry

@cap.aspose,

I have worked with the MSG file you shared, using Aspose.Email for .NET 20.2. I used the following code and it produced the correct output.

    public static void TestMHTMLExport()
    {

        MsgLoadOptions emlLoadOptions = new MsgLoadOptions();
        emlLoadOptions.PrefferedTextEncoding = Encoding.GetEncoding("iso-8859-1");

        String path = @"C:\Aspose Data\Subj\";
        MailMessage eml = MailMessage.Load(path + "subj.msg", emlLoadOptions);


        using (MemoryStream ms = new MemoryStream())
        {
            HtmlSaveOptions htmlSaveOption = new HtmlSaveOptions();

            htmlSaveOption.HtmlFormatOptions |= HtmlFormatOptions.WriteHeader |
            HtmlFormatOptions.WriteCompleteEmailAddress |
            HtmlFormatOptions.WriteCompleteCcEmailAddress;

            htmlSaveOption.CheckBodyContentEncoding = true;
            htmlSaveOption.EmbedResources = true;
            
            eml.Save(path + "Subj2.MHTML", EmlSaveOptions.DefaultMhtml);
            eml.Save(ms, EmlSaveOptions.DefaultMhtml);
            string html = Encoding.ASCII.GetString(ms.ToArray());
        }

    }

Subj.zip (4.9 KB)

Hi @mudassir.fayyaz,
I asked you to try receiving this message, not loading it from a file.

You set the text encoding manually before loading the email from a file.

Try forwarding and receiving an email with a different character set, and the behavior will be different.
So how can I get the right character set from the message?
As I understand it, I can use message.BodyEncoding.WindowsCodePage to get the right result. Is that correct?

message.HtmlBody = Encoding.GetEncoding(message.BodyEncoding.WindowsCodePage).GetString(ms.ToArray());

Second, could you clarify whether there is any other way to embed resources in the HTML?

Thanks

@cap.aspose,

I have reviewed your comments and would like to share that even if I comment out the following line in my sample code, the rendering is correct. There is no issue with the French character.

Try loading this email, Sushma_test.zip (2.8 KB), from a file. When you use the MapiMessage class, you get a wrong HTML presentation like this:
image.png (40.5 KB)
When you use the MailMessage class and load the same email, you also get a wrong HTML presentation if the encoding line is commented out:
image.png (40.3 KB)

but with the hard-coded text encoding load option, the rendering looks right:
image.png (40.0 KB)

So if you remove these lines and receive the email from the network, you get the wrong view:

Could you answer my previous questions:

  1. Does message.BodyEncoding.WindowsCodePage contain the message character set?
  2. Could you clarify whether there is any other way to embed resources in the HTML?

and

  1. Why do MapiMessage and MailMessage behave differently when loading an email to a stream?
  2. Where and how can I get the right character set to use as the content encoding?

@

Yes, as long as you do not set an explicit encoding with emlLoadOptions.PrefferedTextEncoding = Encoding.GetEncoding("iso-8859-1");
If you do set an explicit encoding, this property contains the value you set.

There are two ways to handle embedded resources:

  1. htmlSaveOption.EmbedResources = true; - attachments and resources are embedded in the HTML as base64 strings
  2. htmlSaveOption.EmbedResources = false; - you can use SaveResourceHandler to save resources to a separate folder, as described here:
    https://github.com/aspose-email/Aspose.Email-for-.NET/blob/master/Examples/CSharp/Email/SaveMessageAsHTML.cs
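To make "embedded as a base64 string" concrete, here is a minimal sketch of the general technique using only the .NET base library. The class and method names are hypothetical, and whether Aspose.Email writes resources in exactly this data-URI form is an assumption, not something confirmed in this thread.

```csharp
using System;

// Hypothetical helper, not part of Aspose.Email.
public static class DataUriDemo
{
    // Builds a data URI that can be used as the src of an <img> tag,
    // so the resource travels inside the HTML instead of a separate file.
    public static string ToDataUri(byte[] content, string mediaType)
    {
        return "data:" + mediaType + ";base64," + Convert.ToBase64String(content);
    }
}
```

With this approach the HTML is self-contained but larger, since base64 adds roughly a third to the size of each resource.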

We have not understood this. Can you please clarify it, perhaps in the form of an example?

This is not needed: the encoding is set automatically by the Aspose.Email API. There is no need to use an explicit encoding setting:

emlLoadOptions.PrefferedTextEncoding = Encoding.GetEncoding("iso-8859-1");

Instead you can correctly convert from bytes to string:
Encoding enc = message.BodyEncoding ?? message.PreferredTextEncoding;
string html = enc.GetString(ms.ToArray());
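The reason the decoder matters can be shown without Aspose.Email at all. The sketch below uses only System.Text; the class and method names are hypothetical. It encodes text with one encoding and decodes the bytes with another, which is what happens when a stream saved in one encoding is read with Encoding.ASCII.

```csharp
using System.Text;

// Hypothetical helper, not part of Aspose.Email.
public static class DecodeDemo
{
    // Encodes text with 'writer' and decodes the resulting bytes with 'reader'.
    public static string RoundTrip(string text, Encoding writer, Encoding reader)
    {
        byte[] bytes = writer.GetBytes(text);
        return reader.GetString(bytes);
    }
}
```

For example, RoundTrip("\u00E9", Encoding.UTF8, Encoding.ASCII) yields "??" because é occupies two bytes in UTF-8 and ASCII cannot represent either of them, while decoding with the same encoding that wrote the bytes returns é unchanged.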

That means embedded resources unfortunately cannot be included without saving the message to an HTML file or stream…

Please see my previous comment. I uploaded one email and described three cases using MapiMessage and MailMessage (see the attached screenshots).

@cap.aspose,

We have investigated the requirements further on our end.

The Aspose.Email API allows you to embed resources when saving messages to an HTML file or stream. If you do not want to save the message to a file or stream, then where do you want to embed these resources? Can you please elaborate?

In the comments of the following code we have shared explanations of the different behavior of MailMessage and MapiMessage, and of how to correctly get a string from bytes.

    HtmlSaveOptions htmlSaveOption = new HtmlSaveOptions();
    htmlSaveOption.HtmlFormatOptions |= HtmlFormatOptions.WriteHeader |
    HtmlFormatOptions.WriteCompleteEmailAddress |
    HtmlFormatOptions.WriteCompleteCcEmailAddress;
    htmlSaveOption.CheckBodyContentEncoding = true;
    htmlSaveOption.EmbedResources = true;

    MsgLoadOptions emlLoadOptions = new MsgLoadOptions();

    MailMessage eml = MailMessage.Load(fileName, emlLoadOptions);
    MemoryStream msEml = new MemoryStream();

    eml.Save(msEml, htmlSaveOption);
    // incorrect: saved as UTF-8, decoded as ASCII
    string html1 = Encoding.ASCII.GetString(msEml.ToArray());
    // incorrect: saved as UTF-8, decoded as UTF-16
    string html2 = Encoding.GetEncoding(eml.BodyEncoding.WindowsCodePage).GetString(msEml.ToArray());
    // correct because the encoding is chosen correctly
    string html3 = eml.BodyEncoding.GetString(msEml.ToArray());

    emlLoadOptions.PrefferedTextEncoding = Encoding.GetEncoding("iso-8859-1");

    MailMessage eml2 = MailMessage.Load(fileName, emlLoadOptions);
    MemoryStream msEml2 = new MemoryStream();

    eml2.Save(msEml2, htmlSaveOption);
    // all strings are correct because the encoding is set explicitly and suits the character
    string html11 = Encoding.ASCII.GetString(msEml2.ToArray());
    string html22 = Encoding.GetEncoding(eml2.BodyEncoding.WindowsCodePage).GetString(msEml2.ToArray());
    string html33 = eml2.BodyEncoding.GetString(msEml2.ToArray());

    MapiMessage msg = MapiMessage.FromFile(fileName);
    MemoryStream msMsg = new MemoryStream();
    // encoding "iso-8859-1" is not applied because MapiMessage.FromFile does not take emlLoadOptions as a parameter
    msg.Save(msMsg, htmlSaveOption);
    // incorrect: same behaviour as eml without an explicit encoding setting
    string html111 = Encoding.ASCII.GetString(msMsg.ToArray());
    // note: this line decodes the earlier MailMessage stream (msEml) with eml's encoding, not msMsg
    string html222 = Encoding.GetEncoding(eml.BodyEncoding.WindowsCodePage).GetString(msEml.ToArray());
    // correct because the encoding is chosen correctly
    string html333 = Encoding.GetEncoding(msg.CodePage).GetString(msMsg.ToArray());
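The "saved as UTF-8, decoded as UTF-16" comment above appears consistent with a detail of the .NET Encoding class: for UTF-8, WindowsCodePage reports 1200 (UTF-16), while CodePage reports 65001. The sketch below illustrates this with System.Text only; the class and method names are hypothetical, and it assumes the body encoding of the message is UTF-8.

```csharp
using System.Text;

// Hypothetical helper, not part of Aspose.Email.
public static class CodePageDemo
{
    // Decodes via the WindowsCodePage property, as in the html2 line above.
    public static string DecodeByWindowsCodePage(byte[] bytes, Encoding enc)
    {
        return Encoding.GetEncoding(enc.WindowsCodePage).GetString(bytes);
    }

    // Decodes via the CodePage property, which identifies the encoding itself.
    public static string DecodeByCodePage(byte[] bytes, Encoding enc)
    {
        return Encoding.GetEncoding(enc.CodePage).GetString(bytes);
    }
}
```

Given the UTF-8 bytes of é, DecodeByCodePage returns é, whereas DecodeByWindowsCodePage reinterprets the two bytes as a single UTF-16 code unit and returns a different character. Calling GetString on the Encoding object directly, as in the html3 line, avoids the lookup altogether.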