BOM is causing trouble: it would be good to have a way to include or omit it during export

The issue we are having is that we are taking a document and trying to save it as HTML.

Then, when we decode it with the UTF-8 encoding, we get the byte order mark (BOM) at the beginning of the string.

MemoryStream outputStream = new MemoryStream();
Aspose.Words.Document documentToConvert = new Aspose.Words.Document(fileNameAndPath);
documentToConvert.Save(outputStream, Aspose.Words.SaveFormat.Html);
string htmlDocument = System.Text.Encoding.UTF8.GetString(outputStream.GetBuffer(),
    0, Convert.ToInt32(outputStream.Length));

So when we try to send the HTML document over the web or save it to a database, the BOM goes along with it. We don't want it there.
So we have to remove it manually by adding the check at the end of the following code:

MemoryStream outputStream = new MemoryStream();
Aspose.Words.Document documentToConvert = new Aspose.Words.Document(fileNameAndPath);
documentToConvert.Save(outputStream, Aspose.Words.SaveFormat.Html);
string htmlDocument = System.Text.Encoding.UTF8.GetString(outputStream.GetBuffer(),
    0, Convert.ToInt32(outputStream.Length));
string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
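// Ordinal comparison: a culture-sensitive StartsWith ignores U+FEFF and can match even when no BOM is present.
if (htmlDocument.StartsWith(_byteOrderMarkUtf8, StringComparison.Ordinal))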
{
    htmlDocument = htmlDocument.Remove(0, _byteOrderMarkUtf8.Length);
}

Is there any way we can make this an option, so we don't have to remove it manually?
–Clayton Meisman

Hello

Thanks for your inquiry. I'm afraid there is no option to omit the BOM (byte order mark). You can use the following code to remove the BOM from the string instead:

public string ConvertDocumentToHtml(Document doc)
{
    string html = string.Empty;
    // Save document to MemoryStream in Html format
    using(MemoryStream htmlStream = new MemoryStream())
    {
        doc.Save(htmlStream, SaveFormat.Html);
        // Get Html string
        html = Encoding.UTF8.GetString(htmlStream.GetBuffer(), 0, (int) htmlStream.Length);
    }
    // There could be a BOM at the beginning of the string.
    // Strip leading characters until the first tag starts.
    while (html.Length > 0 && html[0] != '<')
        html = html.Substring(1);
    return html;
}

Best regards,

Hi Clayton,

There is an option to save HTML without a BOM at the beginning:

HtmlSaveOptions so = new HtmlSaveOptions();
so.Encoding = new UTF8Encoding(false);
doc.Save("test.html", so);

It was available in 2010 too. Sorry for the misleading answer.

We will make this encoding the default in the next release.
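
For context on why passing `false` to the `UTF8Encoding` constructor avoids the BOM: that argument controls whether the encoding reports a preamble, and writers emit the preamble at the start of their output. Here is a minimal sketch using only the .NET base class library, with no Aspose calls. The class name `BomDemo` and the helper `Write` are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

class BomDemo
{
    // Hypothetical helper: writes text to an in-memory stream with the given encoding
    // and returns the raw bytes, including any preamble the writer emitted.
    static byte[] Write(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            }
            // MemoryStream.ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    static void Main()
    {
        // Encoding.UTF8 reports the 3-byte preamble EF BB BF.
        Console.WriteLine(Encoding.UTF8.GetPreamble().Length);            // 3
        // UTF8Encoding(false) reports an empty preamble.
        Console.WriteLine(new UTF8Encoding(false).GetPreamble().Length);  // 0

        byte[] withBom = Write("<html></html>", Encoding.UTF8);
        byte[] withoutBom = Write("<html></html>", new UTF8Encoding(false));

        Console.WriteLine(withBom[0] == 0xEF);          // True: output starts with the BOM
        Console.WriteLine(withoutBom[0] == (byte)'<');  // True: output starts with the markup
    }
}
```

Note that `Encoding.UTF8.GetString` does not strip a BOM that is already in the bytes, which is why the decoded string in the original question started with U+FEFF.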


The issues you have found earlier (filed as WORDSNET-3087) have been fixed in this .NET update and this Java update.


The issues you have found earlier (filed as ) have been fixed in this update.