I need to extract plain text from MSG and EML files in the most efficient way possible: not just the body, but all available text data. The extracted text will be used for indexing, so no formatting is required, just plain text.
Can you please advise on the best way to do this, in terms of reliability and performance?
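To make the goal concrete for the EML case, here is a minimal sketch that assumes Python's standard `email` package rather than any particular mail library. It merges the header values, every `text/*` part, and attachment file names into one string for indexing. The function name `eml_to_index_text` is hypothetical, and MSG files are not covered because the standard library cannot parse them.

```python
from email import policy
from email.parser import BytesParser


def eml_to_index_text(raw: bytes) -> str:
    """Merge header values, text parts and attachment names of an EML message."""
    # policy.default yields EmailMessage objects with decoded headers.
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    chunks = []

    # Header values such as Subject, From and To.
    for _name, value in msg.items():
        chunks.append(str(value))

    # walk() visits every MIME part, including nested ones.
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # get_content() decodes the transfer encoding and applies
            # the charset declared in the part's Content-Type header.
            chunks.append(part.get_content())
        filename = part.get_filename()
        if filename:
            chunks.append(filename)

    return "\n".join(chunks)
```

HTML parts come back with their markup intact in this sketch, so tags would still need stripping before indexing.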
OK, I have visited all attributes and created a merged text, which will do the job for me. However, for some emails the bodyencoding field has a value such as UTF8, and if I read bodytext directly, some characters are lost or incorrect. I know the correct encoding from the body encoding field, but how do I use it? How can I get the body text decoded with the correct encoding, so that the text is completely correct?
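The general idea can be sketched as follows, assuming the raw body is available as bytes and the declared encoding as a string such as UTF8. This is plain Python for illustration, not the API of any specific mail library, and `decode_body` and its parameters are hypothetical names. The point is to decode the bytes with the declared encoding instead of relying on a default.

```python
import codecs
from typing import Optional


def decode_body(raw_body: bytes, declared_encoding: Optional[str],
                fallback: str = "utf-8") -> str:
    """Decode raw body bytes using the encoding declared by the message."""
    encoding = declared_encoding or fallback
    try:
        # Check that the declared name is a codec Python knows.
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    # errors="replace" keeps going on bad bytes instead of raising.
    return raw_body.decode(encoding, errors="replace")


raw = "Café".encode("utf-8")
correct = decode_body(raw, "UTF8")     # decoded with the declared encoding
garbled = decode_body(raw, "latin-1")  # wrong encoding gives mojibake
```

Decoding the same bytes with the wrong encoding is what produces the lost or incorrect characters: here `correct` is `Café`, while `garbled` is `CafÃ©`.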