How to fix HTML to Word conversion generating special characters
Sample data is below.
@ravibaburali Could you please attach your input and output documents here for testing? We will check the issue and provide you with more information.
Hi Noskov,
Thanks for your response,
Please find attached the input HTML file and the output DOCX file. We have purchased an Aspose.Words license.
Below is the code snippet to generate this file.
Aspose.Words.Document doc = new Aspose.Words.Document(new System.IO.MemoryStream(Encoding.UTF8.GetBytes(htmlinput)));
doc.Save(@"M:\tesst.docx", Aspose.Words.SaveFormat.Docx);
This is very critical for us; your help is highly appreciated.
Thanks
Ravi Babu
(Attachment testing.html is missing)
tesst.docx (17.7 KB)
@ravibaburali Unfortunately, I do not see your source HTML document. Could you please zip and attach it here for testing? We will check the issue and provide you with more information.
@ravibaburali I cannot reproduce the problem on my side. If you load your HTML directly from the file, Aspose.Words converts it to DOCX properly:
Document doc = new Document(@"C:\Temp\in.html");
doc.Save(@"C:\Temp\out.docx");
I also tried loading from a stream, as in your code, and the output DOCX still looks correct:
string html = File.ReadAllText(@"C:\Temp\in.html");
MemoryStream htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html));
Document doc = new Document(htmlStream);
doc.Save(@"C:\Temp\out.docx");
Most likely, the problem occurs at the point where the HTML string is retrieved.
out.docx (8.2 KB)
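One common way a retrieval step produces special characters is an encoding mismatch: the HTML bytes are UTF-8, but they get decoded into a string using a different code page before they ever reach Aspose.Words. The sketch below is only an illustration of that mechanism and a way to inspect what a string really contains. The class name `EncodingCheck` and its methods are hypothetical helpers, and ISO-8859-1 is just an assumed example of a wrong code page, not a diagnosis of your data.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration; not part of Aspose.Words.
public static class EncodingCheck
{
    // Correct path: UTF-8 bytes decoded as UTF-8.
    public static string DecodeAsUtf8(byte[] utf8Bytes)
    {
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    // Simulates a faulty retrieval step that decodes UTF-8 bytes as ISO-8859-1.
    // Every multi-byte character turns into two or more unrelated characters.
    public static string DecodeAsLatin1(byte[] utf8Bytes)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    // Dumps the UTF-8 bytes of a string as hex so the real content can be inspected.
    public static string ToUtf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }
}
```

For example, with `byte[] raw = Encoding.UTF8.GetBytes("caf\u00E9")`, `DecodeAsUtf8(raw)` returns the original text, while `DecodeAsLatin1(raw)` returns a string in which the single accented letter has become two characters (U+00C3 followed by U+00A9). Calling `ToUtf8Hex` on your `htmlinput` string just before creating the `Document` shows whether it is already corrupted at that point. If the bytes are known to be UTF-8, it may also help to state this explicitly through the `Encoding` property of the load options passed to the `Document` constructor.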