HTML formatting is lost when opening from a stream

Hi,

I have a problem reading HTML data into a Document from a stream. The following code illustrates this:

String html = "Large text: <font size=\"5\">Text.</font>";

LoadOptions loadOptions;
Document document;

using(StreamWriter wrt = new StreamWriter(@"C:\temp\tmpdata.html"))
{
    wrt.Write(html);
}

loadOptions = new LoadOptions();
loadOptions.LoadFormat = LoadFormat.Html;
document = new Document(@"C:\temp\tmpdata.html", loadOptions);
document.Save(@"C:\temp\format_ok.docx");

using(Stream s = new MemoryStream(System.Text.Encoding.UTF32.GetBytes(html)))
{
    loadOptions = new LoadOptions();
    loadOptions.LoadFormat = LoadFormat.Html;
    document = new Document(s, loadOptions);
    document.Save(@"C:\temp\format_not_ok.docx");
}

I have a bit of HTML code (the variable html) that contains some large text (between <font size="5"> and </font>). When I first save this HTML to a file, create a document from that file and save it as DOCX, the text looks fine (i.e. "Text" is large); see file C:\temp\format_ok.docx.

However, when I read the HTML directly from a stream into a document and save that as DOCX, the text isn't large anymore. It seems that all formatting is lost; see file C:\temp\format_not_ok.docx.

Am I doing something wrong?

Steven.

Hi

Thanks for your inquiry. Why do you use UTF-32 to encode the HTML into the stream? You should use UTF-8; then both output documents look the same.

using System;
using System.IO;
using Aspose.Words;

String html = "Large text: <font size=\"5\">Text</font>.";
LoadOptions loadOptions;
Document document;
using(StreamWriter wrt = new StreamWriter(@"C:\temp\tmpdata.html"))
    wrt.Write(html);

loadOptions = new LoadOptions();
loadOptions.LoadFormat = LoadFormat.Html;
document = new Document(@"C:\temp\tmpdata.html", loadOptions);
document.Save(@"C:\temp\format_ok.docx");

// Encode with UTF-8, the same encoding StreamWriter used for the temp file above
using(Stream s = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
    loadOptions = new LoadOptions();
    loadOptions.LoadFormat = LoadFormat.Html;
    document = new Document(s, loadOptions);
    document.Save(@"C:\temp\format_not_ok.docx");
}
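A likely reason the encoding matters (an assumption; the thread does not spell out how the loader detects encodings): `new StreamWriter(path)` writes UTF-8 without a byte order mark, which is the same byte sequence `Encoding.UTF8.GetBytes` produces, whereas `Encoding.UTF32.GetBytes` emits four bytes per character and no BOM, so presumably nothing in the stream tells the HTML loader it is UTF-32. The standalone sketch below uses only System.Text (no Aspose.Words) to dump the bytes for part of the HTML and to show what a reader assuming UTF-8 would make of the UTF-32 bytes. `Hex` is a hypothetical helper introduced for illustration.

```csharp
using System;
using System.Text;

// Compare the bytes each encoding produces for the same fragment of the HTML.
// Only System.Text is used here; Aspose.Words is not involved.
string fragment = "<font size=\"5\">";

byte[] utf8 = Encoding.UTF8.GetBytes(fragment);
byte[] utf32 = Encoding.UTF32.GetBytes(fragment);

// Hypothetical helper: BitConverter gives "3C-66-..."; swap the dashes for spaces.
static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

// UTF-8: each ASCII character is a single byte.
Console.WriteLine($"UTF-8  ({utf8.Length} bytes): {Hex(utf8)}");
// UTF-32 (little-endian in .NET): four bytes per character, three of them zero for ASCII.
Console.WriteLine($"UTF-32 ({utf32.Length} bytes): {Hex(utf32)}");

// GetBytes does not prepend the byte order mark that GetPreamble() describes,
// so the stream carries no marker saying it is UTF-32.
byte[] bom = Encoding.UTF32.GetPreamble();
bool startsWithBom = utf32.Length >= bom.Length && utf32[0] == bom[0] && utf32[1] == bom[1];
Console.WriteLine($"UTF-32 bytes start with a BOM: {startsWithBom}");

// A reader that assumes UTF-8 decodes a NUL character after every real character.
string misread = Encoding.UTF8.GetString(utf32);
Console.WriteLine($"Decoded as UTF-8: {misread.Length} chars, contains NUL: {misread.Contains('\0')}");
```

This uses top-level statements, so it needs a .NET 5 or later console project. The UTF-8 dump is byte-for-byte what the temp file contains, while the UTF-32 dump has three zero bytes after every character, which would break tag recognition if the loader does not detect UTF-32.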

Best regards,

OK, thanks for the solution. It does indeed solve the problem. I don't know why I used UTF-32; normally I'd use UTF-8. Thanks again!

Best regards,
Steven.