Free Support Forum - aspose.com

Problem transforming HTML to DOCX with <style>

Hi,

I use Aspose.Words to get the text part of emails that do not have a text body defined. I open the HtmlBody with Aspose.Words and then save the document as a text file. This works well in general, but I am experiencing some problems.

Find attached an email. Take its HtmlBody, open it in Aspose.Words as if it were an HTML file, and then save the document as text (I obtain the attached text file). As you can see, the <style> tag appears in the text, but it should not.

Can you do something?

Another question: look at how I encode (normalize) the HtmlBody to create the MemoryStream that I open with Aspose.Words. Is that the right way? And afterwards, when I read the text back from the MemoryStream written by Aspose.Words, I use UTF-8 encoding. Is that the right encoding?

Best regards,

Here is the code I use:

MemoryStream ms = new MemoryStream();
UnicodeEncoding uniEncoding = new UnicodeEncoding();
byte[] html = uniEncoding.GetBytes(email.Message.HtmlBody.Normalize());
ms.Write(html, 0, html.Length);
ms.Position = 0; // rewind so the Document constructor reads from the start of the stream

LoadOptions lo = new LoadOptions();
lo.LoadFormat = LoadFormat.Html;
Document doc = new Document(ms, lo);

MemoryStream msOut = new MemoryStream();
doc.Save("d:\\test.docx", Aspose.Words.SaveFormat.Docx);
doc.Save("d:\\test.txt", Aspose.Words.SaveFormat.Text);
doc.Save(msOut, Aspose.Words.SaveFormat.Text);

byte[] txt = msOut.ToArray();
email.EmailTextBody = Encoding.UTF8.GetString(txt);
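A side note on reading the text back: Encoding.UTF8.GetString does not strip a byte order mark, so if the saved text starts with the UTF-8 BOM (EF BB BF), the resulting string begins with an invisible U+FEFF character. Whether Aspose.Words writes a BOM in its text output is an assumption here, not something this thread confirms. A minimal sketch, using only .NET, that decodes the bytes with StreamReader, which skips a BOM when present and otherwise falls back to UTF-8 (TextDecoding and DecodeSavedText are hypothetical names):

```csharp
using System;
using System.IO;
using System.Text;

public static class TextDecoding
{
    // Hypothetical helper: decode bytes produced by saving a document as text.
    // StreamReader skips a leading byte order mark (and uses the encoding it
    // indicates); without a BOM it falls back to UTF-8.
    public static string DecodeSavedText(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, new UTF8Encoding(false), detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] bom = Encoding.UTF8.GetPreamble();          // EF BB BF
        byte[] body = Encoding.UTF8.GetBytes("Hello");
        byte[] withBom = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, withBom, 0, bom.Length);
        Buffer.BlockCopy(body, 0, withBom, bom.Length, body.Length);

        Console.WriteLine(Encoding.UTF8.GetString(withBom).Length); // 6: U+FEFF is kept
        Console.WriteLine(DecodeSavedText(withBom).Length);         // 5: BOM removed
    }
}
```

In the code above, this would replace the last line with something like email.EmailTextBody = TextDecoding.DecodeSavedText(txt);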

Hi

Thanks for your request. Please try using the following code:

// Load the EML file using Aspose.Network for .NET
MailMessage msg = MailMessage.Load(@"Test001\Test.eml", MessageFormat.Eml);

// Convert the message to MHTML and save it to a stream
MemoryStream msgStream = new MemoryStream();
msg.Save(msgStream, MailMessageSaveType.MHtmlFromat);
msgStream.Position = 0;

// Load the MHTML stream using Aspose.Words for .NET
Document msgDocument = new Document(msgStream);
msgDocument.Save(@"Test001\out.docx");

This code produces the correct output on my side.

Best regards,

Hi,

Thank you for your answer, but when I do what you suggest, I get a lot of header lines I don't want (From, To, Subject, etc.). I just need the text inside the body of the email.

I need to show about 25 emails in a GridView, so I need a very compact view with no images and no header lines: only the text in the body.

Regards,

Hi

Thank you for additional information. In this case, you should use code like the following:

// Load the EML file using Aspose.Network for .NET
MailMessage msg = MailMessage.Load(@"Test001\Test.eml", MessageFormat.Eml);

// Take the HTML body of the message and convert it to bytes
string bodyHtml = msg.HtmlBody;
byte[] bodyHtmlBytes = Encoding.UTF8.GetBytes(bodyHtml);

using (MemoryStream bodyHtmlStream = new MemoryStream(bodyHtmlBytes))
{
    // Open the HTML document using Aspose.Words.
    Document doc = new Document(bodyHtmlStream);
    // Save the document.
    doc.Save(@"Test001\out.docx");
}

Hope this helps.

Best regards,

Hi,

Thank you for your answer, it is working this way.

The only thing I had to modify is the line

byte[] bodyHtmlBytes = Encoding.UTF8.GetBytes(bodyHtml);

Indeed, with UTF-8 encoding I got some strange characters, so I had to use this line instead:

byte[] bodyHtmlBytes = Encoding.Default.GetBytes(bodyHtml);

I don't really understand why. Do you have an explanation?

Thank you

Hi

Thank you for additional information. It is perfect that you managed to achieve what you need.

Unfortunately, I also do not have an explanation for why you have to change the encoding. Maybe the content of your message is not in UTF-8 encoding.
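To illustrate what "strange characters" usually means: they typically appear when text is turned into bytes with one encoding and read back with another. The sketch below uses plain .NET only and shows UTF-8 bytes being decoded as ISO-8859-1. Which encoding Aspose.Words assumes for an HTML stream that has no BOM or charset declaration is not stated in this thread, so the specific pairing is only an illustration (EncodingMismatch and RoundTrip are hypothetical names):

```csharp
using System;
using System.Text;

public static class EncodingMismatch
{
    // Encode with one encoding and decode with another to show the resulting mojibake.
    public static string RoundTrip(string text, Encoding writeAs, Encoding readAs)
    {
        return readAs.GetString(writeAs.GetBytes(text));
    }

    public static void Main()
    {
        // ISO-8859-1 is available by name on both .NET Framework and .NET Core.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string original = "Caf\u00E9"; // "Café"

        // Matching encodings: the text survives unchanged (result: Café).
        Console.WriteLine(RoundTrip(original, Encoding.UTF8, Encoding.UTF8));
        // UTF-8 bytes read as ISO-8859-1: "é" (bytes C3 A9) becomes two characters (result: CafÃ©).
        Console.WriteLine(RoundTrip(original, Encoding.UTF8, latin1));
    }
}
```

The general rule the sketch shows is that the encoding used to produce the bytes must be the one the reader assumes; Encoding.Default presumably worked here because it happened to match what the loader expected for this message.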

Best regards,

Hi,

One last question: I don't understand why the DOCX file contains carriage returns but the text file does not. In the text file, the carriage returns appear as strange characters (see both attached files).

In the end, I use aspose.MailMessage.PreferredTextEncoding to create the stream from the HtmlBody of the email. Is that right?

Regards,

Hi

Thanks for your request. The problem occurs because manual line breaks are used in your document (in HTML this is <br>; in a Word document it can be inserted by pressing Shift+Enter). To resolve the problem, you should replace the line break characters in your text document with carriage return/line feed pairs. For instance, see the following code:

string txt = File.ReadAllText(@"Test001\test.txt");

// Replace manual line breaks (\v) with paragraph breaks (CR LF).
txt = txt.Replace("\v", "\r\n");

using (FileStream fs = new FileStream(@"Test001\out.txt", FileMode.Create))
using (StreamWriter writer = new StreamWriter(fs))
{
    writer.Write(txt);
}
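The same replacement can also be applied to a string that is already in memory, for example the text read back from the MemoryStream in the first post, without writing a temporary file. A minimal sketch; the class and method names are hypothetical, and it handles only the \v manual line break described above:

```csharp
using System;

public static class LineBreaks
{
    // Hypothetical helper: replace manual line breaks (vertical tab, U+000B)
    // with CR LF, the same substitution as txt.Replace("\v", "\r\n") above.
    public static string NormalizeManualLineBreaks(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return text.Replace("\v", "\r\n");
    }

    public static void Main()
    {
        string fromDocument = "First line\vSecond line\r\nNext paragraph";
        Console.WriteLine(NormalizeManualLineBreaks(fromDocument));
    }
}
```

In the code from the first post, this could be used as email.EmailTextBody = LineBreaks.NormalizeManualLineBreaks(Encoding.UTF8.GetString(txt));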

Hope this helps.

Best regards,

Hi,

Thank you for your answer and the detailed information.

It is working perfectly now.

Thank you very much for your help.

Hi again,

A problem remains, but I'm not sure that Aspose.Words is responsible for it. I think Aspose.Network may be responsible.

Look at the attached email. If I want to open the HtmlBody of the email with Aspose.Words, I first need to decode the HtmlBody. The Aspose.Network email object has information about the body encoding: the property is Message.BodyEncoding. If this property is null, there is another property, Message.PreferredTextEncoding.

So, to decode the body, here is what I do to find the correct encoding to use:

Encoding en = Encoding.Default;
if (email.Message.PreferredTextEncoding != null) { en = email.Message.PreferredTextEncoding; }
if (email.Message.BodyEncoding != null) { en = email.Message.BodyEncoding; }

byte[] html = en.GetBytes(email.Message.HtmlBody);
MemoryStream ms = new MemoryStream(html);

LoadOptions lo = new LoadOptions();
lo.LoadFormat = LoadFormat.Html;
Document doc = new Document(ms, lo);

For the attached email, BodyEncoding is null and PreferredTextEncoding is the one detailed in the attached picture. But if I use PreferredTextEncoding, the DOCX file created with Aspose.Words contains strange characters.

So here is my question: how do I know which encoding to use to get the correct text in Aspose.Words?

Regards,

Hi

Thanks for your request. I suppose it would be better to ask this question in Aspose.Network forum. My colleagues from Aspose.Network team will answer you shortly.
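One possible reason the message's declared encoding and the working encoding differ is that the HTML body may carry its own <meta ... charset=...> declaration; if the HTML loader honours that declaration, the bytes would have to be produced in the declared charset, whatever BodyEncoding or PreferredTextEncoding says. Whether Aspose.Words reads the meta tag is an assumption not confirmed in this thread. The sketch below uses plain .NET (HtmlCharset and GetDeclaredEncoding are hypothetical names), looks for such a declaration with a simple regular expression (not a full HTML parser), and falls back to a caller-supplied encoding when none is found or the name is not recognized:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlCharset
{
    // Matches charset=... inside a <meta> tag, for example
    //   <meta charset="utf-8">
    //   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*charset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: return the encoding declared in the HTML, or the fallback.
    public static Encoding GetDeclaredEncoding(string html, Encoding fallback)
    {
        Match m = MetaCharset.Match(html ?? string.Empty);
        if (!m.Success) return fallback;
        try
        {
            return Encoding.GetEncoding(m.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name (for example a Windows code page
            // on .NET Core without CodePagesEncodingProvider registered).
            return fallback;
        }
    }

    public static void Main()
    {
        string html = "<html><head><meta http-equiv=\"Content-Type\" " +
                      "content=\"text/html; charset=iso-8859-1\"></head><body>Caf\u00E9</body></html>";
        Encoding enc = GetDeclaredEncoding(html, Encoding.UTF8);
        Console.WriteLine(enc.WebName); // iso-8859-1
        byte[] bytes = enc.GetBytes(html); // bytes now match what the HTML declares
    }
}
```

In the code from the previous post, this would be called after the BodyEncoding/PreferredTextEncoding selection, e.g. en = HtmlCharset.GetDeclaredEncoding(email.Message.HtmlBody, en); so that a charset declared inside the HTML takes precedence.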

Best regards,

Ok, thank you!