Header position differs between the source HTML and the Word document converted from an HTML string

Hello, when I convert an HTML string to Word, the header position in the resulting Word document does not match the source HTML document. How can I handle this inconsistency?

// Load the HTML file
// Document document = new Document(Paths.get("E:\\Temp\\测试.html").toString()); // path to the HTML file
Document document = new Document(); // create an empty document
DocumentBuilder builder = new DocumentBuilder(document);
builder.insertHtml(fileToString("E:\\Temp\\source.html"));
// Save as a Word document
document.save("E:\\Temp\\output.docx"); // output Word file path and name

source.zip (8.3 KB)

@Mikeykiss
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26920

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@Mikeykiss We have completed analyzing the issue and concluded this is not a bug. The issue is caused by limitations of the Aspose.Words’ document model.

The source HTML document uses the “display” CSS property to change rendering of <figure> and <figcaption> elements. It has the following structure:

<figure style="display:table">
    <table>
        <!-- ... -->
    </table>
    <figcaption style="display:table-caption;caption-side:top">...</figcaption>
</figure>

Aspose.Words recognizes such structures and supports them to a certain extent. However, this support is limited and in this case the <figure> element is not imported as a table, because it doesn’t have any cells. Consequently, the <figcaption> element is not imported as a table caption.

It’s not a good idea for Aspose.Words to import <figure> as a table in this scenario, because HTML browsers don’t create a table for this element (they only change rendering of the element’s block). If we create a table, we’ll introduce more semantic differences into the document and may make it look worse. So we prefer not to change the current behavior.

We recommend changing the HTML document to make it compliant with the HTML Standard. You should remove the non-standard “display” styles and move the <figcaption> element before the table:

<figure>
    <figcaption>...</figcaption>
    <table>
        <!-- ... -->
    </table>
</figure>

Alternatively, you could use the <caption> element instead of <figcaption>:

<figure>
    <table>
        <!-- ... -->
        <caption style="caption-side:top">...</caption>
    </table>
</figure>
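
If the HTML cannot be edited by hand, the first suggestion (moving <figcaption> before the table) could be applied to the HTML string before it is passed to insertHtml. The following is only a rough sketch, not part of Aspose.Words: the class FigureCaptionFixer and the method moveCaptionBeforeTable are hypothetical names, and the regular expression assumes exactly the structure shown above (a <figure> that contains one <table> followed by one <figcaption>). It also drops every attribute on <figure> and <figcaption>, not just the “display” styles. For anything more varied, a real HTML parser would be a safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an Aspose.Words API.
final class FigureCaptionFixer {

    // Matches: <figure ...> <table>...</table> <figcaption ...>...</figcaption> </figure>
    // group(1) = the whole <table> element, group(2) = the caption content.
    private static final Pattern FIGURE = Pattern.compile(
            "<figure\\b[^>]*>\\s*(<table\\b.*?</table>)\\s*"
                    + "<figcaption\\b[^>]*>(.*?)</figcaption>\\s*</figure>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Rewrites each matching figure so that the caption comes first.
    // Attributes on <figure> and <figcaption> (including the "display"
    // styles) are discarded; the <table> element is kept unchanged.
    static String moveCaptionBeforeTable(String html) {
        Matcher m = FIGURE.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String fixed = "<figure><figcaption>" + m.group(2) + "</figcaption>"
                    + m.group(1) + "</figure>";
            // quoteReplacement keeps '$' and '\' in the HTML from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the code from the original post, the call would then look like builder.insertHtml(FigureCaptionFixer.moveCaptionBeforeTable(fileToString("E:\\Temp\\source.html"))). HTML that does not match the pattern is returned unchanged.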

@alexey.noskov Okay, thank you
