Free Support Forum

Extracting Text From MSG and EML


I need to extract plain text (not just the body, but all available text data) from MSG and EML files as efficiently as possible. The extracted text will be used for indexing, so no formatting is required, just plain text.

Could you please advise on the best way (in terms of reliability and performance) to do this?


OK, I have visited all the attributes and merged their text. That does the job for me, but some emails have the body encoding field set to something like UTF-8, and if I read the body text directly, some characters are lost or incorrect. I know the correct encoding from the body encoding field, but how do I use it? How can I get the body text with the correct encoding so that the text is entirely correct?

Please advise.


Hi Alp,

Sorry for the delayed response.

In general, you can decode an array of bytes with any encoding as follows:

using System.Text;

// Decode the raw bytes with the known encoding (UTF-8 in this case)
Encoding utf8 = Encoding.UTF8;
string strText = utf8.GetString(BytesData);

where BytesData is the byte array that holds the encoded text.
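If the encoding is only known by name (for example, the value of the body encoding field), a minimal sketch is to look the name up with Encoding.GetEncoding and decode with the result. The helper name Decode and the fallback to UTF-8 for unknown names are illustrative assumptions, not part of any email library's API. The example also shows why characters appear "lost": the same UTF-8 bytes decoded as ISO-8859-1 turn "café" into "cafÃ©".

```csharp
using System;
using System.Text;

public static class BodyTextDecoder
{
    // Hypothetical helper: decodes raw body bytes using the charset name
    // reported by the message (for example "utf-8").
    // Falling back to UTF-8 for unknown names is an assumption, not a rule.
    public static string Decode(byte[] bytesData, string charsetName)
    {
        Encoding encoding;
        try
        {
            // Throws ArgumentException if the name is not a supported encoding
            encoding = Encoding.GetEncoding(charsetName);
        }
        catch (ArgumentException)
        {
            encoding = Encoding.UTF8;
        }
        return encoding.GetString(bytesData);
    }

    public static void Main()
    {
        // "café" encoded as UTF-8: 'é' becomes the two bytes 0xC3 0xA9
        byte[] bytesData = Encoding.UTF8.GetBytes("café");

        // Matching encoding: returns "café"
        string correct = Decode(bytesData, "utf-8");

        // Wrong encoding: each byte of 'é' is read as its own character,
        // so this returns "cafÃ©"
        string garbled = Decode(bytesData, "iso-8859-1");

        Console.WriteLine(correct);
        Console.WriteLine(garbled);
    }
}
```

On .NET Core and .NET 5+, legacy code pages such as windows-1252 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); UTF-8, UTF-16, ASCII and ISO-8859-1 work without it.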

Could you please send us a sample message file in which the characters are lost or incorrect? We will look into it on our side as well.