
Free Support Forum - aspose.com

Html formatting is lost when open from stream

Hi,


I have a problem with reading HTML data in a Document from a stream. The following code illustrates this:

using System.IO;
using Aspose.Words;

String html = "Large text: <font size=\"5\">Text.";

LoadOptions loadOptions;
Document document;

// Save the HTML to a file first, then load the document from that file.
using (StreamWriter wrt = new StreamWriter(@"C:\temp\tmpdata.html"))
{
    wrt.Write(html);
}

loadOptions = new LoadOptions();
loadOptions.LoadFormat = LoadFormat.Html;
document = new Document(@"C:\temp\tmpdata.html", loadOptions);
document.Save(@"C:\temp\format_ok.docx");

// Load the same HTML directly from a memory stream.
using (Stream s = new MemoryStream(System.Text.Encoding.UTF32.GetBytes(html)))
{
    loadOptions = new LoadOptions();
    loadOptions.LoadFormat = LoadFormat.Html;
    document = new Document(s, loadOptions);
    document.Save(@"C:\temp\format_not_ok.docx");
}

I have a bit of HTML code (the variable html) that contains some large text (the text inside the <font size="5"> tag). When I first save this HTML to a file, create a document from that file and save it as DOCX, the text looks fine (i.e. it is large): C:\temp\format_ok.docx.

However, when I read the HTML directly from a stream into a document and save it as DOCX, the text isn't large anymore. It seems that all formatting is lost: C:\temp\format_not_ok.docx.

Am I doing something wrong?

Steven.

Hi


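Thanks for your inquiry. Why do you use UTF32 encoding to build the stream? The StreamWriter in your first block writes the file as UTF-8 (its default encoding), so the stream should use UTF8 as well. In that case both output documents look the same: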

String html = "Large text: <font size=\"5\">Text.";

LoadOptions loadOptions;
Document document;

using (StreamWriter wrt = new StreamWriter(@"C:\temp\tmpdata.html"))
{
    wrt.Write(html);
}

loadOptions = new LoadOptions();
loadOptions.LoadFormat = LoadFormat.Html;
document = new Document(@"C:\temp\tmpdata.html", loadOptions);
document.Save(@"C:\temp\format_ok.docx");

// UTF-8 instead of UTF-32, matching what StreamWriter wrote to the file.
using (Stream s = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
    loadOptions = new LoadOptions();
    loadOptions.LoadFormat = LoadFormat.Html;
    document = new Document(s, loadOptions);
    document.Save(@"C:\temp\format_not_ok.docx");
}
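A minimal C# sketch of why the encoding matters. It uses only the .NET base class library (no Aspose.Words calls). It shows that a StreamWriter with its default encoding produces exactly the same bytes as Encoding.UTF8.GetBytes, so the UTF-8 stream feeds the loader what the file-based path did. It also shows what the UTF-32 bytes look like if read as UTF-8. That the HTML loader reads an unmarked stream this way is an assumption for illustration only, since the thread does not say how Aspose.Words detects the encoding. The class name EncodingCheck is hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

class EncodingCheck
{
    static void Main()
    {
        // Same markup as in the question.
        string html = "Large text: <font size=\"5\">Text.";

        // Bytes a StreamWriter writes with its default encoding (UTF-8, no BOM),
        // i.e. what ends up in tmpdata.html in the file-based code path.
        var buffer = new MemoryStream();
        using (var writer = new StreamWriter(buffer))
        {
            writer.Write(html);
        }
        byte[] fromWriter = buffer.ToArray(); // ToArray still works after the stream is closed

        byte[] utf8 = Encoding.UTF8.GetBytes(html);
        byte[] utf32 = Encoding.UTF32.GetBytes(html);

        Console.WriteLine(fromWriter.SequenceEqual(utf8)); // True
        Console.WriteLine(utf8.Length);                    // 32: one byte per ASCII character
        Console.WriteLine(utf32.Length);                   // 128: four bytes per character
        Console.WriteLine(BitConverter.ToString(utf32, 0, 4)); // 4C-00-00-00 ('L', little-endian)

        // Assumption for illustration: a reader that treats the bytes as UTF-8.
        // The NUL bytes between the letters hide the literal "<font" sequence.
        string misread = Encoding.UTF8.GetString(utf32);
        Console.WriteLine(misread.Contains("<font"));      // False
    }
}
```

With UTF-8 on both paths, the tag bytes reach the loader unchanged, which matches the observation in the reply that both output documents look the same.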

Best regards,


OK, thanks for the solution. This does seem to solve the problem. I don't know why I used UTF32; normally I'd use UTF8. Thanks again!


Best regards,
Steven.