Aspose.Words: converting HTML with images and Hebrew text to DOCX

Version: 24.10.0 (the problem may have started in an earlier version)
In the uploaded file you will find the HTML I am trying to convert to DOCX (HtmlForAsposeTicket.txt) and two files that are the result of the conversion using the Aspose.Words library:
ConversionResult_23-1-0.docx - result of the conversion using 23.1.0, which worked correctly
ConversionResult_24-10-0.docx - result of the conversion using 24.10.0, which did not work: instead of the Hebrew text we see unrelated characters.

The code used for the conversion is:

// Read the HTML input (assumption: the file is read as UTF-8 text)
var content = File.ReadAllText("HtmlForAsposeTicket.txt", Encoding.UTF8);
using (var srcStream = new MemoryStream(Encoding.UTF8.GetBytes(content)))
using (var dstFileStream = new MemoryStream())
{
    // Load the HTML; Aspose.Words detects the input format from the stream
    var wrdf = new Aspose.Words.Document(srcStream);
    var savingOptions = new Aspose.Words.Saving.OoxmlSaveOptions
    {
        SaveFormat = Aspose.Words.SaveFormat.Docx,
    };

    // Save as DOCX into memory and return the resulting bytes
    wrdf.Save(dstFileStream, savingOptions);
    return dstFileStream.ToArray();
}
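If the unrelated characters come from the HTML bytes being decoded with the wrong encoding (an assumption; the thread does not state the root cause of the problem), one thing worth trying is to tell Aspose.Words the input encoding explicitly. The sketch below passes an HtmlLoadOptions whose Encoding property (inherited from LoadOptions, and used only when the HTML does not declare its own encoding) is set to UTF-8. The class HtmlToDocx and the method ConvertHtmlToDocxUtf8 are hypothetical names.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;
using Aspose.Words.Saving;

public static class HtmlToDocx
{
    // Hypothetical helper: converts an HTML string to DOCX bytes while
    // telling Aspose.Words explicitly that the input bytes are UTF-8.
    public static byte[] ConvertHtmlToDocxUtf8(string html)
    {
        var loadOptions = new HtmlLoadOptions
        {
            // Applied only if the HTML itself does not declare a charset.
            Encoding = Encoding.UTF8,
        };

        using (var srcStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        using (var dstStream = new MemoryStream())
        {
            var doc = new Document(srcStream, loadOptions);
            doc.Save(dstStream, new OoxmlSaveOptions(SaveFormat.Docx));
            return dstStream.ToArray();
        }
    }
}
```

Adding a `<meta charset="utf-8">` element to the HTML itself would be an alternative way to state the encoding, without changing the conversion code.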

Aspose.Words version 23.1.0 creates this:

Aspose.Words version 24.10.0 creates this:

Thank you :slight_smile:
ticket.zip (39.4 KB)

@dimager

It seems you are experiencing issues with Hebrew text rendering when converting HTML to DOCX using Aspose.Words version 24.10.0. This problem may be related to font handling or the support for complex scripts in the newer version.

Here are a few suggestions to troubleshoot and potentially resolve the issue:

  1. Font Availability: Ensure that the fonts used in your HTML content are available in the environment where the conversion is taking place. If the required fonts are not installed, Aspose.Words may substitute them, which can lead to incorrect rendering of characters.

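  2. Enable Advanced Typography: For complex scripts like Hebrew, you may need to enable advanced typography features. In Aspose.Words this is done by assigning a text shaper factory to the document's LayoutOptions.TextShaperFactory property, for example the HarfBuzz-based factory from the separate Aspose.Words.Shaping.HarfBuzz package. This helps with the correct positioning of diacritics and ligatures, but it applies to page layout and rendering (for example to PDF, XPS, or images), so it is unlikely to affect an HTML to DOCX conversion.

    Here's a code snippet to enable advanced typography:

    doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;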
    
  3. Check for Updates: Since you mentioned that the previous version (23.1.0) worked correctly, consider checking the release notes for version 24.10.0 to see if there are any known issues or changes related to HTML to DOCX conversion or Hebrew text handling.

  4. Testing with Different HTML: If possible, try converting a simpler HTML file with Hebrew text to see if the issue persists. This can help determine if the problem is specific to the content of your HTML file.

  5. Revert to Previous Version: If the issue cannot be resolved, you might consider reverting to version 23.1.0 for your conversion tasks until a fix is provided in a future release.
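A minimal sketch of suggestion 4: build a small HTML document containing a Hebrew phrase, convert it to DOCX, reload the DOCX, and check whether the phrase survives. Running it once with and once without a charset declaration may show whether encoding detection is involved; this is a suggested experiment, not a confirmed diagnosis. The class HebrewRoundTripCheck, the method HebrewRoundTripSurvives, and the sample phrase are hypothetical. Because Document.GetText() also returns control characters, the check uses Contains rather than equality.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Saving;

public static class HebrewRoundTripCheck
{
    // Hypothetical diagnostic: returns true if the Hebrew sample text is
    // still present after an HTML -> DOCX -> reload round trip.
    public static bool HebrewRoundTripSurvives(string hebrewSample, bool declareCharset)
    {
        // Optionally declare the encoding inside the HTML itself.
        string head = declareCharset ? "<head><meta charset=\"utf-8\"></head>" : "<head></head>";
        string html = "<html>" + head + "<body dir=\"rtl\"><p>" + hebrewSample + "</p></body></html>";

        using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        using (var docxStream = new MemoryStream())
        {
            var doc = new Document(htmlStream);
            doc.Save(docxStream, new OoxmlSaveOptions(SaveFormat.Docx));

            // Reload the DOCX just produced and look for the sample text.
            docxStream.Position = 0;
            var reloaded = new Document(docxStream);
            return reloaded.GetText().Contains(hebrewSample);
        }
    }
}
```

For example, calling HebrewRoundTripSurvives("שלום עולם", false) and then with true on the affected runtime would compare the two cases.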

If these suggestions do not resolve the issue, I recommend reaching out to the Aspose support forum with your specific HTML content and the problematic DOCX file for further assistance.

Feel free to ask if you have any more questions or need further clarification!


@dimager Could you please attach your input document here for testing? We will check the issue and provide you with more information.

After some research, I also found that it may be related to the .NET version: on .NET 7 the conversion works correctly with both Aspose.Words versions, while on .NET 8 and .NET 9 it fails with the latest version of Aspose.Words.
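Since the behaviour differs between runtimes, it can help to record the exact runtime and library versions next to each test result. A small sketch, assuming the standard System.Runtime.InteropServices.RuntimeInformation API and Aspose.Words' BuildVersionInfo class; the class EnvironmentReport and its Describe method are hypothetical names.

```csharp
using System.Runtime.InteropServices;
using Aspose.Words;

public static class EnvironmentReport
{
    // Hypothetical helper: builds a one-line description of the .NET runtime,
    // the Aspose.Words build, and the OS, e.g. for attaching to a support ticket.
    public static string Describe()
    {
        return $"{RuntimeInformation.FrameworkDescription} | " +
               $"{BuildVersionInfo.Product} {BuildVersionInfo.Version} | " +
               $"{RuntimeInformation.OSDescription}";
    }
}
```

Printing EnvironmentReport.Describe() from the same process that runs the conversion ensures the reported versions are the ones actually loaded.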

I uploaded an archive with the HTML input and the conversion results.

@dimager
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27863

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

The issues you found earlier (filed as WORDSNET-27863) have been fixed in the Aspose.Words for .NET 25.3 update, which is also available on NuGet.