I think the bottom line is still this:
If we can access MapiPropertyTag.PR_HTML, I believe there is no problem at all, because the characters in that property are already correct: they are stored as escaped entities (&rsquo;, &mdash;, and all the others listed here: http://tntluoma.com/sidebars/codes/). I don’t know why we can’t access that property directly, the way we can access most of the other properties.
Please look carefully at the characters in MapiMessage.BodyHtml. If you copy and paste the value out to a text file, you will find this line inside: “Team – This one has a minor change, it wasn’t updated in the future”. The dash and the apostrophe there are raw, unescaped characters. That is bad, because if you write that out with standard encoding, those two characters come out as bad characters when you view the file. Now…
Please look carefully at the PR_HTML property in OutlookSpy. If you copy and paste the value out, you will find this line inside: “Team &mdash; This one has a minor change, it wasn&rsquo;t updated in the future”. This is good, because you can write it out in standard encoding and it will look correct.
So, the characters in PR_HTML are properly escaped, and the characters in MapiMessage.BodyHtml are not. The data was changed when it was copied from PR_HTML to BodyHtml, and it lost the escaping along the way.
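To make the difference concrete, here is a small sketch of what the two forms look like as bytes on disk. DashBytes and Hex are hypothetical names used only for illustration. It assumes the dash in BodyHtml is U+2013, and it uses the numeric reference &#8211; as a stand-in for whichever entity PR_HTML actually holds.

```csharp
using System;
using System.Text;

static class DashBytes
{
    // Hypothetical helper: returns the bytes of `text` in the given encoding
    // as a hex string such as "E2-80-93".
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}

// Raw dash, UTF-8:      DashBytes.Hex("\u2013", Encoding.UTF8)    gives "E2-80-93"
// Raw dash, UTF-16LE:   DashBytes.Hex("\u2013", Encoding.Unicode) gives "13-20"
// Escaped dash, UTF-8:  DashBytes.Hex("&#8211;", Encoding.UTF8)   gives "26-23-38-32-31-31-3B"
```

The escaped dash is plain ASCII, so any ASCII-compatible encoding writes it the same way. The raw dash becomes three bytes in UTF-8, which is what StreamWriter uses when no encoding is passed, and a viewer that guesses a single-byte code page would likely show those three bytes as junk characters.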
Here is some simple code to reproduce the problem:
// Requires: using System.IO; using System.Text; plus the Aspose namespace that provides MapiMessage.
// Load the message that I provided.
MapiMessage msg = MapiMessage.FromFile(filepath);

// Write msg.BodyHtml out to a text file, and then view that .html file. You will see bad characters.
using (StreamWriter writer = new StreamWriter("test.html", false))
{
    writer.Write(msg.BodyHtml);
}

// Write msg.BodyHtml out to a file with Unicode encoding, and you will see it looks OK,
// because the Unicode encoding can handle the special characters when they are not escaped.
using (FileStream writer = new FileStream("test_unicode.html", FileMode.Create, FileAccess.Write))
{
    byte[] bytes = Encoding.Unicode.GetBytes(msg.BodyHtml);
    writer.Write(bytes, 0, bytes.Length);
}
Attached you will find the two files that this code generates. We shouldn’t have to write out the entire BodyHtml in Unicode format (which takes more disk space). We should be able to write out the PR_HTML data in standard encoding with the escaped characters…
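As a possible stopgap until PR_HTML can be read directly, the non-ASCII characters in BodyHtml could be escaped again on our side before writing the file. This is only a minimal sketch: HtmlEscaper and EscapeNonAscii are hypothetical names, not part of the Aspose API, and the output uses numeric character references (&#8211;) rather than the named entities stored in PR_HTML, so it would not be byte-for-byte identical to that property.

```csharp
using System;
using System.Globalization;
using System.Text;

static class HtmlEscaper
{
    // Hypothetical helper: replaces every non-ASCII character with a decimal
    // numeric character reference, leaving ASCII (including markup) untouched.
    public static string EscapeNonAscii(string html)
    {
        var sb = new StringBuilder(html.Length);
        for (int i = 0; i < html.Length; i++)
        {
            char c = html[i];
            int codePoint;
            if (c < 128)
            {
                sb.Append(c);
                continue;
            }
            if (char.IsHighSurrogate(c) && i + 1 < html.Length && char.IsLowSurrogate(html[i + 1]))
            {
                // Combine a surrogate pair into one code point and skip the low half.
                codePoint = char.ConvertToUtf32(c, html[i + 1]);
                i++;
            }
            else
            {
                // Single UTF-16 unit (an unpaired surrogate is emitted as-is by value).
                codePoint = c;
            }
            sb.Append("&#")
              .Append(codePoint.ToString(CultureInfo.InvariantCulture))
              .Append(';');
        }
        return sb.ToString();
    }
}
```

With something like this, writer.Write(HtmlEscaper.EscapeNonAscii(msg.BodyHtml)) would produce an ASCII-only file, so the choice of output encoding should stop mattering for these characters and the larger Unicode file would not be needed.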