Free Support Forum - aspose.com

HTML to DOCX conversion problem with German umlauts

We are converting HTML to DOCX with Aspose.Words for .NET. In the resulting DOCX, the German umlauts (ö, ä, ü) are not rendered correctly.

If we delete part of the HTML file (see Image.png), the conversion works.

Original HTML file that does not work:
TestHtmlToDocx.zip (941 Bytes)

Image:
Image.png (104.4 KB)

The code:
// HTML to DOCX
Words.Document doc = new Words.Document("C:/SVN_Incite/Prototypes/AsposePdfConversion/Aspose/Files/TestHtmlToDocx.html");
var opt = new Aspose.Words.Saving.HtmlSaveOptions();
opt.Encoding = Encoding.UTF8;
FontInfoCollection fontInfos = doc.FontInfos;
fontInfos.EmbedTrueTypeFonts = true;
fontInfos.EmbedSystemFonts = true;
fontInfos.SaveSubsetFonts = true;
doc.Save("C:/SVN_Incite/Prototypes/AsposePdfConversion/Aspose/Files/TestHtmlToDocx.docx");

@marchuber,

Thanks for your inquiry. We tested the scenario and reproduced the issue on our side. We have logged it in our issue tracking system as WORDSNET-16671 and will notify you via this forum thread once it is resolved.

We apologize for the inconvenience.

How long will it take until a corrected version is available?

HTML to TXT conversion is also not working. Is this the same issue? Please have a look at the attached files.

Code:
Words.Document doc = new Words.Document("C:/SVN_Incite/Prototypes/AsposePdfConversion/Aspose/Files/Test.html");
Words.Saving.TxtSaveOptions opts = new Words.Saving.TxtSaveOptions();
var opt = new Aspose.Words.Saving.HtmlSaveOptions();
opt.Encoding = Encoding.UTF8;
FontInfoCollection fontInfos = doc.FontInfos;
fontInfos.EmbedTrueTypeFonts = true;
fontInfos.EmbedSystemFonts = true;
fontInfos.SaveSubsetFonts = true;
doc.Save("C:/SVN_Incite/Prototypes/AsposePdfConversion/Aspose/Files/TestHtmlTotxt.txt", opts);

Test.zip (1.6 KB)

Image.png (14.5 KB)

@marchuber,

Regarding WORDSNET-16671, this issue is currently in the queue and pending analysis. Once the analysis is complete, we may be able to share an estimate with you. Rest assured, we will inform you via this thread as soon as the issue is resolved. We apologize for the inconvenience.

Regarding your newly reported “HTML to TXT” problem, we are checking this scenario and will get back to you soon.

@marchuber,

We have logged the HTML to TXT problem in our issue tracking system as WORDSNET-16673. You will be notified via this forum thread once this issue is resolved. We apologize for the inconvenience.

@marchuber,

Thanks for your patience. The behavior you are seeing is not a bug in Aspose.Words, so we have closed both issues (WORDSNET-16671 and WORDSNET-16673) as ‘Not a Bug’.

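Please use HtmlLoadOptions.Encoding as shown below to get the desired output. The encoding has to be specified when the HTML file is loaded; an encoding set on the save options does not affect how the input file is read.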

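// Tell the loader that the HTML file is UTF-8 encoded.
Aspose.Words.HtmlLoadOptions lo = new Aspose.Words.HtmlLoadOptions();
lo.Encoding = Encoding.UTF8;
// MyDir is the directory that contains the input file.
Document doc = new Document(MyDir + @"TestHtmlToDocx.html", lo);
doc.Save(MyDir + @"18.4.docx", SaveFormat.Docx);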
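
As a rough illustration of why the load encoding matters, here is a minimal sketch that uses only System.Text and makes no Aspose.Words calls. It assumes, purely for illustration, that the UTF-8 bytes of the file end up being decoded with a single-byte encoding (ISO-8859-1 here); this thread does not state which encoding was actually chosen for the test file. The names UmlautEncodingDemo and Decode are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of Aspose.Words.
public static class UmlautEncodingDemo
{
    // Decodes raw bytes with the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // The umlauts as they would be stored in a UTF-8 encoded HTML file.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("ö ä ü");

        // Assumption: the reader falls back to a single-byte encoding.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        string wrong = Decode(utf8Bytes, latin1);        // each umlaut turns into two characters
        string right = Decode(utf8Bytes, Encoding.UTF8); // the umlauts survive

        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("Decoded as ISO-8859-1: " + wrong);
        Console.WriteLine("Decoded as UTF-8:      " + right);
    }
}
```

Each umlaut takes two bytes in UTF-8, so decoding with a single-byte encoding yields two unrelated characters per umlaut. Setting the encoding explicitly at load time avoids relying on whatever encoding would otherwise be picked.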

Works perfectly. Thanks a lot!