I am using Aspose.Words for .NET 4.2.1 to save a Word document as an HTML file. The resulting file contains the Â character. Could you please help me find the reason? Here is my code and a zip file.
Aspose.Words.Document wordDoc = new Aspose.Words.Document(inputFile);
// Save the document in html format into a memory stream.
System.IO.MemoryStream stream = new System.IO.MemoryStream();
wordDoc.Save(stream, Aspose.Words.SaveFormat.Html);
stream.Position = 0;
// Copy the memory stream to the output file.
System.IO.StreamWriter writer = new System.IO.StreamWriter(outputFile);
System.IO.StreamReader reader = new System.IO.StreamReader(stream);
writer.Write(reader.ReadToEnd());
Sorry, I don’t see any such character in the attached HTML file. Maybe I am missing something. Please post an excerpt from the resulting HTML file that you consider to be incorrect.
You can open the HTML file that I zipped in my last post with WordPad and search for the string “Â«Test_TitleÂ»” or “Â”. You will find several occurrences of “Â” in it.
It is a WordPad glitch. It is generally a bad idea to use WordPad for viewing files that contain symbols outside the 0x20-0x7F range. It is better to use a conventional text editor for this, such as Notepad. I usually use editors that can show file contents in both textual and hexadecimal form. That way you can check the actual codes of the symbols.
Hope this helps,
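To illustrate how a stray Â can appear, here is a minimal sketch that encodes «Test_Title» as UTF-8 and then decodes the same bytes with a single-byte Western encoding. It uses ISO-8859-1 as a stand-in for windows-1252 (an assumption made so the sample runs without registering extra code pages; the two encodings agree on the bytes involved here). The class and method names are hypothetical and not part of Aspose.Words.

```csharp
using System;
using System.Text;

public class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which is what a viewer does when it assumes the wrong charset.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }

    static void Main()
    {
        string original = "\u00ABTest_Title\u00BB"; // «Test_Title»

        // In UTF-8 the character « (U+00AB) is stored as the two bytes C2 AB.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(original)));

        // Read one byte at a time, C2 becomes Â and AB stays «.
        Console.WriteLine(MisreadUtf8AsLatin1(original));
    }
}
```

If the hexadecimal view of the file shows C2 in front of each affected symbol, the file itself is valid UTF-8 and the viewer is decoding it with the wrong charset.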
The Â characters are gone in Notepad. But the lines at the end of the second page (in the additional information section), where people fill in information, are gone as well, which is not right.
Also, when I use org.eclipse.swt.browser.Browser to view the HTML file in my Java program, the Â characters show up in the SWT browser, which is unacceptable to us.
I attached the zip file.
Sorry, I hit post too fast and forgot the attachment. Here it is.
Another problem with your HTML file is that it is incomplete, which means it was not fully written. Try to flush and close the stream after writing. Maybe then your problems will be gone.
Please note that you can set SaveOptions.ExportPrettyFormat property to get a nice indented HTML in the output. It can be used while developing to make it easier for you to check the resulting HTML.
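The advice above about flushing and closing can be sketched as follows. This is only an illustration under stated assumptions: the memory stream is filled with a hard-coded HTML string instead of real Aspose.Words output, and the class, method, and file names are hypothetical. The using blocks dispose the reader and the writer, which flushes any buffered text to disk and closes the file.

```csharp
using System;
using System.IO;
using System.Text;

public class FlushDemo
{
    // Copies UTF-8 text from a stream to a file. Leaving the using blocks
    // disposes the writer, which flushes its buffer and closes the file.
    public static void CopyToFile(Stream source, string path)
    {
        source.Position = 0;
        using (StreamReader reader = new StreamReader(source, Encoding.UTF8))
        using (StreamWriter writer = new StreamWriter(path, false, Encoding.UTF8))
        {
            writer.Write(reader.ReadToEnd());
        }
    }

    static void Main()
    {
        // Stand-in for the HTML that would be saved into the memory stream.
        byte[] html = Encoding.UTF8.GetBytes("<html><body>\u00ABTest_Title\u00BB</body></html>");
        string path = Path.Combine(Path.GetTempPath(), "flush-demo.html");

        CopyToFile(new MemoryStream(html), path);

        // The file is complete at this point because the writer was disposed.
        Console.WriteLine(new FileInfo(path).Length);
    }
}
```

Without the using blocks (or explicit Flush and Close calls), text still held in the writer's buffer may never reach the file, which can leave the HTML truncated.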
The problem still exists after I added flush and close. I noticed that Aspose uses charset=utf-8. If I open the Word document in Word and save it as HTML, I get charset=windows-1252. Can I set the charset in StreamWriter?
In StreamWriter you can set the encoding of the data being written itself, but not the charset specified in HTML. You can try it of course:
StreamWriter writer = new StreamWriter(stream, Encoding.GetEncoding(1252));
But to set “charset=…” in the HTML appropriately, we would have to make changes in our code. You are correct: at the moment the only charset there is UTF-8. If you think it will help, we can consider supporting an alternative charset.
As Dmitry pointed out, you cannot control the output charset for HTML export through the Aspose.Words API. Aspose.Words always produces files in UTF-8 encoding.
But you can easily change the encoding in your code with the following:
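// Save the document in HTML format into a memory stream.
// (doc is the loaded Aspose.Words.Document.)
MemoryStream stream = new MemoryStream();
doc.Save(stream, SaveFormat.Html);
stream.Position = 0;
// Read the UTF-8 output back into a string.
string html;
using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
    html = reader.ReadToEnd();
// Make the declared charset match the encoding used for the file.
html = html.Replace("charset=utf-8", "charset=windows-1252");
// Write the text as windows-1252; disposing the writer flushes and closes the file.
using (StreamWriter writer = new StreamWriter(new FileStream("test Out.html", FileMode.Create), Encoding.GetEncoding(1252)))
    writer.Write(html);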
I have tested this code and your html shows up correctly in WordPad after the encoding change. No annoying extra characters appear.
Hope this helps,
This did solve my problem. Thanks.
The issues you have found earlier (filed as 1552) have been fixed in this update.