Conversion of HTML to DOCX with Hebrew and images results in Hebrew characters turning into unrecognized characters

Version - 23.2.0

After upgrading from 22.2.0 to 23.2.0, I noticed that in some cases, when I convert HTML content to a DOCX file, the Hebrew characters turn into unrecognizable characters. I have attached the HTML content to this topic.
content.zip (59.4 KB)

@dimager
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25453

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@dimager We have completed the analysis of the issue. Aspose.Words detects the encoding of the document as “Windows-1252”, while it is in fact “UTF-8”. The HTML document contains only a little text alongside images encoded in base64. If I remove the images, Aspose.Words correctly reports that the document is encoded in UTF-8.
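
To illustrate why a wrong encoding guess produces unreadable characters, here is a minimal sketch in plain .NET that does not use Aspose.Words at all. It assumes .NET Core 3.0 or later (or .NET 5+), where the code-pages provider ships with the framework; the sample word is only an illustration and is not taken from the attached file.

```csharp
using System;
using System.Text;

// Windows-1252 is not available by default on .NET Core / .NET 5+,
// so register the code-pages provider first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Illustrative sample text (an assumption, not from the attached HTML).
string hebrew = "שלום";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(hebrew);

// Correct: decode the UTF-8 bytes as UTF-8.
string decodedAsUtf8 = Encoding.UTF8.GetString(utf8Bytes);

// Wrong: decode the same bytes as Windows-1252, as a mis-detection would.
string decodedAs1252 = Encoding.GetEncoding(1252).GetString(utf8Bytes);

Console.WriteLine(decodedAsUtf8 == hebrew);  // True: the text round-trips
Console.WriteLine(decodedAs1252 == hebrew);  // False: the text is garbled
Console.WriteLine($"{hebrew.Length} letters became {decodedAs1252.Length} characters");
```

Each Hebrew letter occupies two bytes in UTF-8, and a single-byte encoding such as Windows-1252 turns every byte into its own unrelated character, which matches the symptom described in the original post.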

As a workaround, you can explicitly specify the correct encoding in HtmlLoadOptions:

using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;
HtmlLoadOptions options = new HtmlLoadOptions { Encoding = Encoding.UTF8 };
Document doc = new Document("in.html", options);
doc.Save("out.docx");
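
If not every input file is known to be UTF-8, the workaround above could be applied conditionally. The following is only a sketch under stated assumptions: `IsValidUtf8` is a hypothetical helper (not part of Aspose.Words) that relies on .NET's strict UTF-8 decoder, and the file names are the same placeholders used above.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;

byte[] bytes = File.ReadAllBytes("in.html");

HtmlLoadOptions options = new HtmlLoadOptions();
if (IsValidUtf8(bytes))
{
    // The bytes decode cleanly as UTF-8, so override auto-detection.
    options.Encoding = Encoding.UTF8;
}
// Otherwise leave Encoding unset and let Aspose.Words detect it.

Document doc = new Document("in.html", options);
doc.Save("out.docx");

// Hypothetical helper: returns true when the bytes form valid UTF-8.
static bool IsValidUtf8(byte[] data)
{
    try
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting replacement characters.
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                         throwOnInvalidBytes: true).GetString(data);
        return true;
    }
    catch (DecoderFallbackException)
    {
        return false;
    }
}
```

Pure ASCII content also passes this check, which is harmless because ASCII is a subset of UTF-8. Text in a legacy single-byte encoding that contains non-ASCII characters will usually fail it, so such files would still go through automatic detection.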

The issue you reported earlier (filed as WORDSNET-25453) has been fixed in the Aspose.Words for .NET 23.11 update, which is also available on NuGet.