Error saving HTML from DOC

I was testing an idea I read about in the forums: changing the absolute paths of images linked in the HTML file to relative paths. But in the process of reading and writing the HTML, a new character, À, is added at the beginning, and the last > of the closing tag is truncated.

I send the result as a byte array, and the ‘client application’ regenerates the file from this byte array:

Code:

Aspose.Words.Document docWord = new Aspose.Words.Document(strFilePath);
System.IO.MemoryStream htmlStream = new System.IO.MemoryStream();
docWord.SaveOptions.HtmlExportHeadersFooters = true;
docWord.SaveOptions.ExportPrettyFormat = true;
docWord.Save(htmlStream, SaveFormat.Html);
BinaryReader brHTML = new BinaryReader(htmlStream, System.Text.Encoding.UTF8);
htmlStream.Seek(0, SeekOrigin.Begin);
string strTextoHTML = new string(brHTML.ReadChars((int)htmlStream.Length));
brHTML.Close();
htmlStream.Close();
/*
StreamReader srHTML = new StreamReader(htmlStream);
htmlStream.Seek(0,SeekOrigin.Begin);
string strTextoHTML = srHTML.ReadToEnd();
srHTML.Close();
htmlStream.Close();
*/

htmlStream = new System.IO.MemoryStream();
BinaryWriter bwHTML = new BinaryWriter(htmlStream, System.Text.Encoding.UTF8);
bwHTML.Write(strTextoHTML);
byte[] bytesFile = htmlStream.ToArray();
bwHTML.Close();
htmlStream.Close();
// Client Application
FileStream streamFile = new FileStream(System.IO.Path.GetTempFileName(), FileMode.OpenOrCreate, FileAccess.Write);
streamFile.Write(bytesFile, 0, bytesFile.GetUpperBound(0));
streamFile.Close();

I have attached the original document and the files produced by the conversion.

Thanks for your attention.

Hi,

  1. Use StreamReader (commented out in your code) and StreamWriter instead of BinaryReader and BinaryWriter. This ensures that the UTF-8 preamble is read and written correctly.
  2. Don’t forget to flush the writer before getting the bytes from the memory stream.
  3. Array.GetUpperBound(0) returns the index of the last element (Length - 1), not the number of elements, so passing it as the byte count drops the last byte. Use the Length property instead.
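Point 1 can be checked at the byte level. BinaryWriter also differs from StreamWriter in another way: per the .NET documentation, BinaryWriter.Write(string) writes a length-prefixed string, so the output starts with the 7-bit encoded byte count rather than with the HTML. That prefix is a plausible source of the stray À: a byte count of 192, for example, gives a first prefix byte of 0xC0, which Windows-1252 displays as À. The following is a minimal sketch that compares the two writers; the class and method names are illustrative only and nothing here depends on Aspose.Words.

```csharp
using System;
using System.IO;
using System.Text;

public static class WriterBytes
{
    // BinaryWriter.Write(string): a 7-bit encoded length prefix (the UTF-8
    // byte count), then the UTF-8 bytes. BinaryWriter never writes a BOM.
    public static byte[] ViaBinaryWriter(string text)
    {
        var ms = new MemoryStream();
        using (var bw = new BinaryWriter(ms, Encoding.UTF8))
        {
            bw.Write(text);
        } // Dispose flushes the writer and closes ms
        return ms.ToArray(); // MemoryStream.ToArray still works after Close
    }

    // StreamWriter: the encoding's preamble (EF BB BF for Encoding.UTF8),
    // then only the text bytes, with no length prefix.
    public static byte[] ViaStreamWriter(string text)
    {
        var ms = new MemoryStream();
        using (var sw = new StreamWriter(ms, Encoding.UTF8))
        {
            sw.Write(text);
        } // Dispose flushes the buffered characters before closing
        return ms.ToArray();
    }
}
```

For "<html></html>" (13 bytes), ViaBinaryWriter returns 14 bytes starting with 13, while ViaStreamWriter returns 16 bytes starting with EF BB BF.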

Thanks for your attention.

  1. In my first version I used StreamReader and StreamWriter, but I had a problem with the latter:
    StreamWriter.Write seems to truncate my string.

  2. I don’t think I understand this point.

  3. I had already changed it. Thanks anyway.

Here is the code whose output gets truncated. Why?

Aspose.Words.Document docWord = new Aspose.Words.Document(strFilePath);
System.IO.MemoryStream htmlStream = new System.IO.MemoryStream();
docWord.SaveOptions.HtmlExportHeadersFooters = true;
docWord.SaveOptions.ExportPrettyFormat = true;
docWord.Save(htmlStream, SaveFormat.Html);

StreamReader srHTML = new StreamReader(htmlStream);
htmlStream.Seek(0, SeekOrigin.Begin);
string strTextoHTML = srHTML.ReadToEnd();

srHTML.Close();
htmlStream.Close();

htmlStream = new System.IO.MemoryStream();
StreamWriter swHTML = new StreamWriter(htmlStream, System.Text.Encoding.UTF8);
swHTML.Write(strTextoHTML);
byte[] bytesFile = htmlStream.ToArray();

swHTML.Close();

// Client Application

FileStream streamFile = new FileStream(System.IO.Path.GetTempFileName(), FileMode.OpenOrCreate, FileAccess.Write);
streamFile.Write(bytesFile, 0, bytesFile.Length);
streamFile.Close();

I’ve attached the truncated result.

Sorry for being unclear about point #2, but that is what truncates the output. You should flush the StreamWriter’s internal buffer before you retrieve the bytes from the memory stream:

swHTML.Write(strTextoHTML);
swHTML.Flush();
byte[] bytesFile = htmlStream.ToArray();
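Putting the three suggestions together, the whole round trip might look like the minimal sketch below. It rests on assumptions: any stream holding UTF-8 HTML stands in for the one filled by docWord.Save, and HtmlRoundTrip, Rewrite, SaveToFile and the edit callback (where the absolute-to-relative image path replacement would go) are hypothetical names, not part of Aspose.Words.

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlRoundTrip
{
    // Hypothetical helper: decodes the HTML in `source`, applies `edit`
    // (e.g. the image path replacement) and returns the result as UTF-8 bytes.
    public static byte[] Rewrite(Stream source, Func<string, string> edit)
    {
        source.Seek(0, SeekOrigin.Begin);
        string html;
        // StreamReader skips a UTF-8 preamble (BOM) if present.
        // Disposing the reader also closes `source`.
        using (var reader = new StreamReader(source, Encoding.UTF8))
        {
            html = reader.ReadToEnd();
        }

        var output = new MemoryStream();
        var writer = new StreamWriter(output, Encoding.UTF8); // emits a BOM on first flush
        writer.Write(edit(html));
        writer.Flush();                  // push buffered characters into `output`
        byte[] bytes = output.ToArray(); // now holds the complete text
        writer.Close();                  // also closes `output`
        return bytes;
    }

    // Hypothetical client side: write every byte of the array.
    public static void SaveToFile(byte[] bytes, string path)
    {
        // FileMode.Create truncates an existing file; OpenOrCreate would not.
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            file.Write(bytes, 0, bytes.Length); // Length, not GetUpperBound(0)
        }
    }
}
```

In the thread’s code, Rewrite would be called right after docWord.Save(htmlStream, SaveFormat.Html), and the client would pass the returned array to SaveToFile.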

It works. Thanks a lot!

Have a nice weekend.