.docx to .html table formatting error

Hi,

When converting a .docx file to an .html file using Aspose.Words, I noticed a formatting issue in how my EMF table is converted (see the attached screenshot).

I had the same error with PDF conversion, which was solved with
convertedDocument.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
But this does not solve the issue with HTML conversion.

I am converting as follows:

Document convertedDocument = DocumentHelper.OpenReadOnly(m_sourceDocument);
convertedDocument.LayoutOptions.RevisionOptions.ShowRevisionBars = false;
convertedDocument.LayoutOptions.RevisionOptions.ShowRevisionMarks = false;
convertedDocument.AcceptAllRevisions();
convertedDocument.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
convertedDocument.Save(m_output, CreateResponsiveHtmlSaveOptions(subfolderFullPath, m_isResponsiveHtml, m_htmlImageSavingCallback));


private HtmlSaveOptions CreateResponsiveHtmlSaveOptions(string subfolderFullPath, bool convertMetaFileToSvg, IImageSavingCallback htmlImageSavingCallback)
{
    string subfolderName = new DirectoryInfo(subfolderFullPath).Name;

    var htmlSaveOptions = new HtmlSaveOptions
    {
        CssStyleSheetType = CssStyleSheetType.Inline,
        ImagesFolder = subfolderFullPath,
        ImagesFolderAlias = subfolderName, // We set the images folder as relative to the .htm 
        ExportListLabels = ExportListLabels.AsInlineText,
        ImageResolution = 192,
        ImageSavingCallback = htmlImageSavingCallback,
        ExportHeadersFootersMode = ExportHeadersFootersMode.None,
        ExportShapesAsSvg = m_isResponsiveHtml,
        MetafileFormat = convertMetaFileToSvg ? HtmlMetafileFormat.Svg : HtmlMetafileFormat.Png,
        AllowNegativeIndent = false,
        ScaleImageToShapeSize = false,
        ExportTextInputFormFieldAsText = true,
        ExportDropDownFormFieldAsText = true
    };

    return htmlSaveOptions;
}

The output has the formatting error shown in the attached screenshot.

I am using Aspose.Words 25.1.0

Please find attached the problematic docx file
emf problematic table .docx (18.8 KB)

Implementing the AI-generated suggestions also did not work.

Thanks

@ServerSide527

Hi,

Thank you for providing detailed information about your issue. I see that you’re experiencing a formatting problem when converting a .docx file to .html using Aspose.Words, particularly with the EMF table.

Here are a few steps you can take to troubleshoot this issue:

  1. Update Aspose.Words: Ensure you are using the latest version of Aspose.Words as updates often include bug fixes and performance improvements.

  2. Handle Metafiles: Ensure that metafiles are correctly handled by setting MetafileFormat to HtmlMetafileFormat.Svg. You’ve already done this in your function.

  3. Check HTML Save Options: Review the HtmlSaveOptions you are using as you can customize various properties to handle specific formatting challenges.

Let’s enhance the HtmlSaveOptions a bit more based on your setup:

private HtmlSaveOptions CreateResponsiveHtmlSaveOptions(string subfolderFullPath, bool convertMetaFileToSvg, IImageSavingCallback htmlImageSavingCallback)
{
    string subfolderName = new DirectoryInfo(subfolderFullPath).Name;

    var htmlSaveOptions = new HtmlSaveOptions
    {
        CssStyleSheetType = CssStyleSheetType.Inline,
        ImagesFolder = subfolderFullPath,
        ImagesFolderAlias = subfolderName, // We set the images folder as relative to the .htm
        ExportListLabels = ExportListLabels.AsInlineText,
        ImageResolution = 192,
        ImageSavingCallback = htmlImageSavingCallback,
        ExportHeadersFootersMode = ExportHeadersFootersMode.None,
        ExportShapesAsSvg = m_isResponsiveHtml,
        MetafileFormat = convertMetaFileToSvg ? HtmlMetafileFormat.Svg : HtmlMetafileFormat.Png,
        AllowNegativeIndent = false,
        ScaleImageToShapeSize = false,
        ExportTextInputFormFieldAsText = true,
        ExportDropDownFormFieldAsText = true,
        ExportImagesAsBase64 = false,  // Ensure images are saved in their specified format, not embedded as base64
        ExportTextBoxAsSvg = true,   // Optionally, export text boxes as SVG if relevant
        ExportFontResources = true   // If font issues exist, exporting font resources might help (ExportEmbeddedFonts belongs to HtmlFixedSaveOptions, not HtmlSaveOptions)
    };

    return htmlSaveOptions;
}

Additional Suggestions:

  1. Test with Simpler File: Create a simpler .docx file with just the EMF table to isolate the problem and see if the conversion error persists.

  2. Ensure Document Integrity: Make sure the .docx file is not corrupted and can be opened and edited without issues.

  3. Review Image Options: Sometimes images in the HTML output might not render correctly due to size or resolution settings.

  4. Use Different Save Format: As a temporary workaround, you can try converting the .docx file to another intermediate format (like .rtf or .pdf) and then to .html to check if the issue persists.
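For the isolation step, one option is to pull the metafiles out of the document and test them on their own. The sketch below lists every EMF/WMF image in the document and writes each one to disk. It is a minimal sketch, not an official Aspose recipe: the input path is a placeholder, and the `metafile_N.emf` output names are hypothetical.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;

class ListMetafiles
{
    static void Main()
    {
        // Placeholder input path; adjust to your environment.
        var doc = new Document(@"C:\path\emf problematic table .docx");
        int index = 0;

        // Visit every Shape node in the document tree (deep search).
        foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
        {
            if (!shape.HasImage)
                continue;

            // Keep only vector metafiles, which are what MetafileFormat controls in HTML output.
            ImageType type = shape.ImageData.ImageType;
            if (type != ImageType.Emf && type != ImageType.Wmf)
                continue;

            // Report the shape size (points) and save the raw metafile for separate testing.
            string ext = type == ImageType.Emf ? ".emf" : ".wmf";
            Console.WriteLine($"Shape {index}: {type}, {shape.Width} x {shape.Height} pt");
            shape.ImageData.Save($@"C:\path\metafile_{index}{ext}");
            index++;
        }
    }
}
```

Each extracted file can then be inserted into an otherwise empty .docx to check whether the SVG output is still wrong with nothing else in the document.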

Please apply these suggestions and let me know if the issue is resolved. If the issue persists, consider sharing a minimal example of the docx file that consistently reproduces the problem so that we can further investigate.

If you encounter any specific error messages, please include those as well for more precise troubleshooting.

@ServerSide527 The question is already answered here:
https://forum.aspose.com/t/docx-to-pdf-and-html-table-formatting-error/316509

Hi,

The issue still persists with docx to html conversion.

This seems to arise only when metafiles are converted to SVG.

using Aspose.Words;
using Aspose.Words.Saving;

var doc = new Document(@"C:\path\emf problematic table .docx");
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;

var htmlOptions = new HtmlSaveOptions
{
    MetafileFormat = HtmlMetafileFormat.Svg
};

doc.Save(@"C:\outputConversion.html", htmlOptions);
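A quick way to confirm that the problem is tied to SVG export is to save the same document twice, changing only `MetafileFormat`. This is a sketch under assumptions: the input path is a placeholder and the `outputConversion_Svg.html` / `outputConversion_Png.html` names are hypothetical. If only the SVG output is broken, exporting metafiles as PNG may serve as a temporary workaround, at the cost of rasterized (non-scalable) tables.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

class MetafileFormatComparison
{
    static void Main()
    {
        // Placeholder input path; adjust to your environment.
        var doc = new Document(@"C:\path\emf problematic table .docx");

        // Only MetafileFormat differs between the two saves.
        foreach (var format in new[] { HtmlMetafileFormat.Svg, HtmlMetafileFormat.Png })
        {
            var options = new HtmlSaveOptions { MetafileFormat = format };

            // Produces outputConversion_Svg.html and outputConversion_Png.html.
            doc.Save($@"C:\outputConversion_{format}.html", options);
        }
    }
}
```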

@ServerSide527 As mentioned in the documentation, text shaping is only performed when exporting to PDF or XPS formats.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-28531

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.