Aspose convert DOCX to MD: table content conversion exception

I am converting a DOCX file to Markdown. The DOCX contains a table (real table content, not an image), but after conversion the table shows up as an image in the Markdown output.
code:

import aspose.words as aw

document = aw.Document(file_path)
for page in range(document.page_count):
    # Extract one page at a time and save it as its own DOCX.
    extracted_page = document.extract_pages(page, 1)
    name = f"output2/page{page + 1}.docx"
    extracted_page.save(name)

    document1 = aw.Document(name)
    document1.unlink_fields()
    # Remove PAGE fields from the primary footer of every section.
    for section in document1.sections:
        section = section.as_section()
        footer = section.headers_footers[aw.HeaderFooterType.FOOTER_PRIMARY]
        if footer:
            fields = footer.get_child_nodes(aw.NodeType.FIELD_START, True)
            for field_start in fields:
                field_start = field_start.as_field_start()
                field = field_start.get_field()
                if field.type == aw.fields.FieldType.FIELD_PAGE:
                    field.remove()
    document1.update_fields()

    # Save the single-page document as Markdown.
    save_options = aw.saving.MarkdownSaveOptions()
    save_options.image_resolution = 300
    document1.save(f"{name}.md", save_options)

docx:
1ACE2023030701 客服活动报告.docx (21.5 KB)
result:

@David_Matin

Cause

The issue you’re experiencing, where table content in a DOCX file is being converted to an image in the Markdown output, is likely due to how the Aspose.Words library handles tables during the conversion process. When tables are complex or contain certain formatting, they may be rendered as images instead of maintaining their original structure.

Solution

To address this, you can adjust the MarkdownSaveOptions settings. Specifically, ensure that the save_options.export_table_as_image property is set to false. This will instruct the library to attempt to preserve the table structure in the Markdown output instead of converting it to an image.

Code

Here’s an updated version of your code snippet with the necessary adjustment:

document = aw.Document(file_path)
for page in range(0, document.page_count):
    extractedPage = document.extract_pages(page, 1)
    name = f"output2/page{page + 1}.docx"
    extractedPage.save(name)
    document1 = aw.Document(name)
    document1.unlink_fields()
    for section in document1.sections:
        section = section.as_section()
        footer = section.headers_footers[aw.HeaderFooterType.FOOTER_PRIMARY]
        if footer:
            fields = footer.get_child_nodes(aw.NodeType.FIELD_START, True)
            for field_start in fields:
                field_start = field_start.as_field_start()
                field = field_start.get_field()
                if field.type is aw.fields.FieldType.FIELD_PAGE:
                    field.remove()
    document1.update_fields()
    
    save_options = aw.saving.MarkdownSaveOptions()
    save_options.image_resolution = 300
    save_options.export_table_as_image = False  # Ensure tables are not exported as images
    document1.save(f"{name}.md", save_options)

Next Steps

  1. Run the updated code and verify if the tables are preserved in the Markdown output.
  2. If the issue persists, simplify the table formatting in the original DOCX file, as complex styles may still lead to conversion issues.
  3. Review the Aspose.Words documentation for any additional options related to Markdown conversion that may assist further.

Please let me know if this resolves your issue!

@David_Matin This is expected behavior, because the table is inside a floating shape in your document.
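
For readers who want to check their own file for this situation without opening Word, here is a minimal sketch using only the Python standard library. It assumes the shape is a text box, whose content WordprocessingML stores under a `w:txbxContent` element, and it only looks at the main document part (`word/document.xml`), not headers or footers. The function name `has_table_in_text_box` is made up for this illustration and is not part of Aspose.Words.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def has_table_in_text_box(docx):
    """Return True if any table (w:tbl) is nested inside text box content.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # A text box usually appears twice (DrawingML plus VML fallback),
    # so report a yes/no answer rather than a count.
    return any(
        box.find(f".//{{{W_NS}}}tbl") is not None
        for box in root.iter(f"{{{W_NS}}}txbxContent")
    )
```

If this returns True for a document, at least one table sits inside a text box and can be expected to come out as an image in the Markdown output.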

@alexey.noskov Sorry, I don’t understand. How do I convert to Markdown and get this table as text?

@David_Matin The table should not be inside the shape. On conversion to MD or HTML, Aspose.Words renders shapes, so they are represented as images in the output documents. To get the table converted to pure Markdown, you should move it out of the shape and into the main document content.
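
Moving the table can be done by hand in Word, or programmatically before saving. The following is only a sketch of the programmatic route, not code taken from this thread. It assumes each table is a direct child of a shape node and that placing it right after the paragraph that anchors the shape is acceptable. The helper name `move_tables_out_of_shapes` is hypothetical, and nested shapes and group shapes are not handled.

```python
def move_tables_out_of_shapes(doc):
    """Move tables that sit directly inside shapes into the surrounding text.

    `doc` is expected to be an aspose.words.Document; it is modified in place.
    Returns the number of tables moved.
    """
    # Imported lazily so this module still loads where the package is absent.
    import aspose.words as aw

    # Copy to a list first: the node collection is live and the tree is edited below.
    shapes = [n.as_shape() for n in doc.get_child_nodes(aw.NodeType.SHAPE, True)]
    moved = 0
    for shape in shapes:
        anchor = shape.parent_paragraph
        if anchor is None:
            continue
        # Direct children only, so tables nested in other tables stay put.
        tables = [n.as_table() for n in shape.get_child_nodes(aw.NodeType.TABLE, False)]
        ref = anchor
        for table in tables:
            # Inserting an attached node moves it out of its old parent.
            anchor.parent_node.insert_after(table, ref)
            ref = table
            moved += 1
        # Drop the shape only if it held tables and has no text left.
        if tables and not shape.get_text().strip():
            shape.remove()
    return moved
```

In the loop from the original question this would be called as `move_tables_out_of_shapes(document1)` just before `document1.save(...)`. Any plain paragraphs left inside a shape stay there and are still rendered as an image.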