Email with links saved to plain text with link's source text (C# .NET)

cap.aspose · June 17, 2019, 1:58pm

Hello.
Older versions of Aspose.Email returned the plain-text representation of a message body without the hyperlink targets, as you can see in the picture Capture.PNG (66.7 KB).
I mean that the HTML view (MapiMessage.BodyHtml) can contain a link, but MapiMessage.Body contained only the text, without the link's source URL.

But now MapiMessage.Body contains the whole text representation, including the link URLs, like this: Capture2.PNG (88.9 KB)

Could you clarify how I can get the plain-text view without the link's source URL?
Thank you.

mudassir.fayyaz · June 17, 2019, 3:42pm

@cap.aspose,

I have reviewed the images you shared. Please try using MapiMessage.BodyRtf on your end. If the issue persists, please share the source file that reproduces it.

cap.aspose · June 18, 2019, 7:47am

Hi,
I load the message like this:

MapiMessage msg = MapiMessage.FromFile(emailFile);

I saved the MapiMessage.BodyRtf value to the file MsgRtf.zip (3.6 KB)

mudassir.fayyaz · June 18, 2019, 2:02pm

@cap.aspose,

Could you please share the source file so that we can verify the issue on our end and help you further?

cap.aspose · June 18, 2019, 2:21pm

You can try to load this message Message1.zip (19.4 KB)

mudassir.fayyaz · June 18, 2019, 5:37pm

@cap.aspose,

Please try the following sample code.

    // Requires: using System.IO; using System.Text;
    public static void ReadMessageBody()
    {
        string path = @"C:\Email\Message1\";
        MapiMessage msg = MapiMessage.FromFile(path + "Message1.msg");
        if (msg.BodyType == BodyContentType.Html)
        {
            var body3 = msg.BodyHtml;

            // Write the HTML body to a file; "using" closes the writer even on errors
            using (var sw = new StreamWriter(path + "output_out.html", false, Encoding.UTF8))
            {
                sw.Write(body3);
            }
        }
        else if (msg.BodyType == BodyContentType.PlainText)
        {
            var body2 = msg.Body;
        }
        else if (msg.BodyType == BodyContentType.Rtf)
        {
            var body = msg.BodyRtf;
        }
    }

cap.aspose · June 19, 2019, 11:59am

Ok. The result is this html file output_out.zip (2.4 KB)

mudassir.fayyaz · June 19, 2019, 12:42pm

@cap.aspose,

I hope the result is acceptable, as it is what you requested in your initial posts. It is identical to the body part in the MSG.

cap.aspose · June 19, 2019, 2:22pm

Hi,
As you can see, the result is an HTML file, not plain text.
I asked earlier how I can get plain text from the message without the link description. Aspose.Email returns MapiMessage.Body as shown below:

To file your Periodic Report, go to the entity Summary https://urldefense.proofpoint.com/v2/url?u=http-3A__www.sos.state.co.us_biz_emailEntitySummary.do-3FfileId-3D19871048569-26nameTyp-3DENT-26srchTyp-3DENTITY&d=DwMCAg&c=jI8JFiOO2jK3vEB7766npJVDWQg-IcmhbUBYAWqGg_Y&r=1JmUptcBjdLsPZl2wMg1hzcrGuXRv_BxAYDSiQBJcek&m=-kpC2cs3iLPg9xpz_E3s7PuCmY8H8xMt_fQTuQDRWZQ&s=jadymFt_CeB05sV-GJYmBQRaCOKolBKCiLRMP-6AGkE&e=

I would like MapiMessage.Body to look like this:

To file your Periodic Report, go to the entity Summary

MapiMessage.Body used to include plain text only, without the full links to resources.

Please try this code:

    MapiMessage msg = MapiMessage.FromFile(emailFile);

    var body3 = msg.Body;
    string path = @"C:\Email\Message1\";

    // Write the plain-text body to a file
    using (var sw = new StreamWriter(path + "output_out.txt", false, Encoding.UTF8))
    {
        sw.Write(body3);
    }

With my email, your .txt file will look like this:
image.png (38.9 KB)

But I want to see something like this (as it was in earlier versions):
image.png (14.5 KB)

mudassir.fayyaz · June 19, 2019, 8:21pm

@cap.aspose,

I have looked at your requirements further. As my previous code shows, the body of this message is of HTML type, so reading the body returns the complete HTML part. Please try the following sample code, which converts the HTML to plain text. You may modify it as needed.

    public static string HTMLToText(string HTMLCode)
    {
        // Remove new lines since they are not visible in HTML
        HTMLCode = HTMLCode.Replace("\n", " ");

        // Remove tab spaces
        HTMLCode = HTMLCode.Replace("\t", " ");

        // Remove multiple white spaces from HTML
        HTMLCode = Regex.Replace(HTMLCode, "\\s+", " ");

        // Remove HEAD tag
        HTMLCode = Regex.Replace(HTMLCode, "<head.*?</head>", ""
                            , RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Remove any JavaScript
        HTMLCode = Regex.Replace(HTMLCode, "<script.*?</script>", ""
          , RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace special characters like &, <, >, " etc.
        StringBuilder sbHTML = new StringBuilder(HTMLCode);
        // Note: There are many more special characters, these are just
        // most common. You can add new characters in this arrays if needed
        string[] OldWords = { "&nbsp;", "&amp;", "&quot;", "&lt;",
            "&gt;", "&reg;", "&copy;", "&bull;", "&trade;" };
        string[] NewWords = { " ", "&", "\"", "<", ">", "®", "©", "•", "™" };
        for (int i = 0; i < OldWords.Length; i++)
        {
            sbHTML.Replace(OldWords[i], NewWords[i]);
        }

        // Check if there are line breaks (<br>) or paragraph (<p>)
        sbHTML.Replace("<br>", "\n<br>");
        sbHTML.Replace("<br ", "\n<br ");
        sbHTML.Replace("<p ", "\n<p ");

        // Finally, remove all HTML tags and return plain text
        return System.Text.RegularExpressions.Regex.Replace(
          sbHTML.ToString(), "<[^>]*>", "");
    }
    public static void ReadMessageBody()
    {
        string path = @"C:\Email\Message1\";
        MapiMessage msg = MapiMessage.FromFile(path + "Message1.msg");
        if (msg.BodyType == BodyContentType.Html)
        {
            var body3 = msg.BodyHtml;
            string body4 = HTMLToText(body3);

            // Write the original HTML body and its plain-text conversion to separate files.
            // UTF-8 is used for both so that characters such as ® or ™ are not lost.
            using (var sw = new StreamWriter(path + "output_out.html", false, Encoding.UTF8))
            {
                sw.Write(body3);
            }
            using (var sw2 = new StreamWriter(path + "output_out.txt", false, Encoding.UTF8))
            {
                sw2.Write(body4);
            }
        }
        else if (msg.BodyType == BodyContentType.PlainText)
        {
            var body2 = msg.Body;
        }
        else if (msg.BodyType == BodyContentType.Rtf)
        {
            var body = msg.BodyRtf;
        }
    }
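
As a variation on the conversion above, the sketch below strips the tags first and decodes entities afterwards with the framework's System.Net.WebUtility.HtmlDecode, so that a decoded "<" is not mistaken for a tag and no hand-maintained entity table is needed. The class and method names (HtmlText, ToPlainText) are hypothetical and not part of Aspose.Email, and the regular expressions assume reasonably well-formed HTML; a real HTML parser would be more robust.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Aspose.Email.
public static class HtmlText
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Drop <head>, <script> and <style> blocks together with their content.
        string text = Regex.Replace(html, @"<(head|script|style)\b.*?</\1\s*>", "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Line breaks in the HTML source are not significant: collapse whitespace.
        text = Regex.Replace(text, @"\s+", " ");

        // Turn <br> and common closing block tags into line breaks.
        text = Regex.Replace(text, @"<br\s*/?>|</(p|div|tr|li|h[1-6])\s*>", "\n",
            RegexOptions.IgnoreCase);

        // Remove all remaining tags. An <a href="..."> tag disappears here,
        // so only the visible link text is kept.
        text = Regex.Replace(text, @"<[^>]*>", "");

        // Decode entities last, after the tags are gone.
        text = WebUtility.HtmlDecode(text);

        // Trim spaces around the inserted line breaks and at both ends.
        text = Regex.Replace(text, @"[ \t]*\n[ \t]*", "\n");
        return text.Trim();
    }
}
```

It could be called as HtmlText.ToPlainText(msg.BodyHtml) in place of HTMLToText(body3).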

cap.aspose · June 20, 2019, 5:46am

Hi,
Are you suggesting this as a workaround?
Are you going to include that parsing in your library in an upcoming patch?

Adnan.Ahmad · June 20, 2019, 6:14pm

@cap.aspose,

Please use this workaround to meet your requirements. If there is still an issue, then share your feedback with us.

mudassir.fayyaz · June 20, 2019, 6:53pm

@cap.aspose,

We have internally verified the MSG file body on our end. Aspose.Email cannot produce the desired output (plain text without links) because it reads the text/plain body of a message file from the PidTagBody property.

Only if that property is empty is the plain-text body converted from RTF/HTML.

If we compare the output of

Console.WriteLine(msg.BodyHtml);
//and 
Console.WriteLine(msg.Body);

we can see that the data is not the same: msg.Body contains the text/plain body with links, not the HTML body. The plain-text output is also the same as in old releases. The source message “Message1.msg” contains the PidTagBody property ([MS-OXPROPS] section 2.618), which holds the unformatted text/plain body. When this property exists, it is used for the text/plain representation, msg.Body.

Therefore, please use the workaround approach on your end.
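
Since msg.Body here already is plain text with the URLs written inline, another possible workaround is to remove the URLs from that text directly instead of converting the HTML body. The following is only a minimal sketch: the helper name (PlainTextLinks.StripUrls) is hypothetical, and it assumes that links appear as http(s) URLs, optionally wrapped in angle brackets, after the link text. It will also remove URLs that the sender typed into the message deliberately.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Aspose.Email.
public static class PlainTextLinks
{
    // An http(s) URL, optionally wrapped in <...>, together with the
    // spaces or tabs that separate it from the preceding link text.
    private static readonly Regex UrlPattern =
        new Regex(@"[ \t]*<?https?://[^\s<>]+>?", RegexOptions.IgnoreCase);

    public static string StripUrls(string body)
    {
        if (string.IsNullOrEmpty(body)) return string.Empty;
        return UrlPattern.Replace(body, "");
    }
}
```

For example, PlainTextLinks.StripUrls(msg.Body) would reduce the line quoted earlier in this thread to "To file your Periodic Report, go to the entity Summary".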