Encoding issue with DocumentBuilder.InsertHtml

I’m using:
MailMessage message = Aspose.Email.MailMessage.Load(strFileToInsert);
message.Save(strHtmlFileName, Aspose.Email.SaveOptions.DefaultHtml);
to export the body of an e-mail into HTML.
The body of the e-mail has special characters like ä, ö, ü, é. In the generated HTML those characters look fine.
Now, I’m using:
documentBuilder.InsertHtml(fileContent, true);
to insert that HTML into a word document.
But there the special characters are shown as a questionmark on a black diagonal rectangle.
Any ideas how to fix this encoding issue?
Thanks

@tpalmie,

Please ZIP and attach the following resources here for testing:

  • Your simplified input email message file
  • The HTML file showing the desired output
  • The output document generated by Aspose.Words 20.11 that shows the undesired behavior
  • A screenshot highlighting the problematic area in the output document generated by Aspose.Words
  • A simple standalone Console application (source code without compilation errors) that lets us reproduce the problem on our end. Please do not include the Aspose.Words DLL/JAR files, to keep the file size small.

As soon as you have these resources ready, we will start investigating your scenario and provide you with more information.

SetContentControlTest.zip (226.3 KB)
Hi Awais
In the attached sample I’m converting a .msg e-mail file into HTML and inserting that HTML into a Word document. The e-mail contains a French accented character (Palmié). This character is still visible in the exported HTML, but in the Word document it is missing.
What would be your proposed way to insert the content of the e-mail into the Word document?
Best regards,
thomas

@tpalmie,

Please see these documents (Docs and Screenshot.zip (156.4 KB)) and try running the following simple code:

Document doc = new Document("C:\\Temp\\SetContentControlTest\\Template.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

builder.MoveToDocumentEnd();
builder.InsertHtml(File.ReadAllText("C:\\Temp\\SetContentControlTest\\_Matter x - details.htm"));

doc.Save("C:\\Temp\\SetContentControlTest\\20.11.docx");

It produces the same encoding issue. We have logged this problem in our issue tracking system with ID WORDSNET-21512. We will look further into the details of this problem and will keep you updated on the status of the linked issue. We apologize for the inconvenience.

@tpalmie,

Regarding WORDSNET-21512, we have completed the analysis of this issue and closed it with “not a bug” status. The File.ReadAllText overload that takes a single argument does not accept an encoding. It can only detect the encoding from a byte order mark (for example UTF-8 or UTF-32) and otherwise decodes the file as UTF-8. However, the HTML document is encoded as Windows-1252, so the bytes for the accented characters are not valid UTF-8 and are replaced during decoding. The following example reads the HTML document using the right encoding:

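// Requires: using System.IO; using System.Text; using Aspose.Words;
Document doc = new Document("C:\\Temp\\SetContentControlTest\\Template.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
builder.MoveToDocumentEnd();
// Decode the file explicitly as Windows-1252 instead of relying on byte order mark detection.
// On .NET Core / .NET 5+, call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
builder.InsertHtml(File.ReadAllText("C:\\Temp\\SetContentControlTest\\_Matter x - details.htm", Encoding.GetEncoding(1252)));
doc.Save("C:\\Temp\\SetContentControlTest\\21.2.docx");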
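
To see why the one-argument overload fails here, the following is a minimal sketch that does not use Aspose.Words at all. The class name EncodingDemo and the six sample bytes are made up for illustration. The RegisterProvider call assumes .NET Core 3.0 or later; on .NET Framework, code page 1252 is available without it.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // "Palmié" as it is stored in a Windows-1252 file: é is the single byte 0xE9.
    private static readonly byte[] Sample = { 0x50, 0x61, 0x6C, 0x6D, 0x69, 0xE9 };

    // Writes the sample to a temp file and reads it back without naming an encoding.
    public static string ReadWithoutEncoding()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, Sample);
            // No byte order mark is present, so the bytes are decoded as UTF-8.
            return File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    // Same round trip, but the file is decoded explicitly as Windows-1252.
    public static string ReadAsWindows1252()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where code pages must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, Sample);
            return File.ReadAllText(path, Encoding.GetEncoding(1252));
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // 0xE9 is not valid UTF-8 on its own, so it becomes U+FFFD (the replacement character).
        Console.WriteLine(ReadWithoutEncoding().IndexOf('\uFFFD') >= 0); // True
        Console.WriteLine(ReadAsWindows1252() == "Palmi\u00E9");         // True
    }
}
```

The replacement happens while the string is being read, before InsertHtml is called, which is why the accented characters are already lost by the time the HTML reaches the Word document.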

In addition to the above, you might not know which encoding a file uses. In this case you can use the DocumentBuilder.InsertDocument method instead of DocumentBuilder.InsertHtml, so that the HTML file is loaded as a document and inserted with the right encoding:

Document doc = new Document("C:\\Temp\\SetContentControlTest\\Template.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
builder.MoveToDocumentEnd();
builder.InsertDocument(new Document("C:\\Temp\\SetContentControlTest\\_Matter x - details.htm"), ImportFormatMode.KeepSourceFormatting);
doc.Save("C:\\Temp\\SetContentControlTest\\21.2.docx");
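
If you would rather keep using InsertHtml when the encoding is unknown, one common heuristic is to try strict UTF-8 first and fall back to Windows-1252 only when the bytes are not valid UTF-8. The sketch below is an assumption-laden illustration, not how Aspose.Words detects encodings: the class and method names are hypothetical, the Windows-1252 fallback is a guess that suits Western European e-mail, and the RegisterProvider call assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Text;

public static class HtmlDecoding
{
    // Hypothetical helper: decodes bytes as UTF-8 if they are valid, else as Windows-1252.
    public static string DecodeUtf8OrWindows1252(byte[] bytes)
    {
        try
        {
            // throwOnInvalidBytes: true raises DecoderFallbackException
            // instead of silently inserting U+FFFD.
            var strictUtf8 = new UTF8Encoding(false, true);
            // GetString keeps a leading byte order mark as U+FEFF, so strip it.
            return strictUtf8.GetString(bytes).TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            // Assumption: .NET Core 3.0+ / .NET 5+, where code pages must be registered first.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(1252).GetString(bytes);
        }
    }

    public static void Main()
    {
        byte[] asWindows1252 = { 0x50, 0x61, 0x6C, 0x6D, 0x69, 0xE9 };   // "Palmié", é = 0xE9
        byte[] asUtf8 = { 0x50, 0x61, 0x6C, 0x6D, 0x69, 0xC3, 0xA9 };    // "Palmié", é = 0xC3 0xA9

        Console.WriteLine(DecodeUtf8OrWindows1252(asWindows1252) == "Palmi\u00E9"); // True
        Console.WriteLine(DecodeUtf8OrWindows1252(asUtf8) == "Palmi\u00E9");        // True
    }
}
```

The decoded string could then be passed to InsertHtml, for example builder.InsertHtml(HtmlDecoding.DecodeUtf8OrWindows1252(File.ReadAllBytes(path))). This heuristic cannot tell Windows-1252 apart from other single-byte code pages, so it is only suitable when the fallback encoding is known to be a reasonable guess.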

So, please load the HTML document from the file using the correct encoding.