Numbered list styling is broken when converting DOCX to HTML

Hello.

If you convert the following DOCX file to HTML, the numbering style of the list is broken.

This occurs in Aspose Words (Java) v24.2 but not in v23.6. The earliest version where it occurs is v23.7.

My understanding is that ExportListLabels.AS_INLINE_TEXT should be used in this case, but ExportListLabels.BY_HTML_TAGS is being used instead. If I set this manually, regular bullet list formatting breaks.

Sample file:
numberedList.docx (14.2 KB)

Below is the expected and actual HTML produced, including screenshots.

Expected: (screenshot omitted)

Actual: (screenshot omitted)

Expected:

<html>
<body>
<div style="line-height:116%; font-family:Aptos; font-size:12pt">
<div>
<p style="margin-top:0pt; margin-bottom:8pt">
<span>Below is a numbered list.</span>
</p>
<p style="margin-top:0pt; margin-left:36pt; margin-bottom:0pt; text-indent:-18pt">
<span>1)</span>
<span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
<span>First item</span>
</p>
<p style="margin-top:0pt; margin-left:36pt; margin-bottom:0pt; text-indent:-18pt">
<span>2)</span>
<span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
<span>Second</span>
</p>
<p style="margin-top:0pt; margin-left:36pt; margin-bottom:0pt; text-indent:-18pt">
<span>3)</span>
<span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
<span>Third</span>
</p>
<p style="margin-top:0pt; margin-left:36pt; margin-bottom:0pt; text-indent:-18pt">
<span>4)</span>
<span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
<span>----</span>
</p>
<p style="margin-top:0pt; margin-left:36pt; margin-bottom:0pt; text-indent:-18pt">
<span>5)</span>
<span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
<span>……………………</span>
</p>
<p style="margin-top:0pt; margin-left:36pt; margin-bottom:0pt; text-indent:-18pt">
<span>6)</span>
<span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
<span>123456789</span>
</p>
<p style="margin-top:0pt; margin-left:36pt; margin-bottom:8pt; text-indent:-18pt">
<span>7)</span>
<span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
<span> </span>
</p>
</div>
</div>
</body>
</html>

Actual:

<html>
<body>
<div style="line-height:116%; font-family:Aptos; font-size:12pt">
    <div>
        <p style="margin-top:0pt; margin-bottom:8pt">
            <span>Below is a numbered list.</span>
        </p>
        <ol type="1" class="awlist1" style="margin:0pt; padding-left:0pt">
            <li style="margin-left:36pt; text-indent:-18pt">
                <span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
                <span>First item</span>
            </li>
            <li style="margin-left:36pt; text-indent:-18pt">
                <span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
                <span>Second</span>
            </li>
            <li style="margin-left:36pt; text-indent:-18pt">
                <span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
                <span>Third</span>
            </li>
            <li style="margin-left:36pt; text-indent:-18pt">
                <span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
                <span>----</span>
            </li>
            <li style="margin-left:36pt; text-indent:-18pt">
                <span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
                <span>……………………</span>
            </li>
            <li style="margin-left:36pt; text-indent:-18pt">
                <span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
                <span>123456789</span>
            </li>
            <li style="margin-left:36pt; margin-bottom:8pt; text-indent:-18pt">
                <span style="width:7.33pt; font:7pt 'Times New Roman'; display:inline-block">     </span>
                <span> </span>
            </li>
        </ol>
    </div>
</div>
</body>
</html>
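
The difference between the two outputs above can be detected automatically, for example in a regression test that runs against several Aspose.Words versions. Below is a minimal sketch that uses only the Java standard library and does not call Aspose.Words at all. The class name ListExportCheck and the method usesNativeListTags are hypothetical names chosen for illustration. It reports whether an exported HTML string contains native <ol> or <ul> elements (as in the "Actual" output) or not (as in the "Expected" output, where labels such as "1)" are plain <span> text).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
public class ListExportCheck {
    // Matches an opening <ol> or <ul> tag, with or without attributes.
    private static final Pattern NATIVE_LIST =
            Pattern.compile("<(ol|ul)(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    /** Returns true if the HTML contains native list markup (<ol> or <ul>). */
    public static boolean usesNativeListTags(String html) {
        return NATIVE_LIST.matcher(html).find();
    }

    public static void main(String[] args) {
        // Shortened fragments modelled on the expected and actual output above.
        String inline = "<p style=\"margin-left:36pt\"><span>1)</span><span>First item</span></p>";
        String tagged = "<ol type=\"1\" class=\"awlist1\"><li><span>First item</span></li></ol>";
        System.out.println(usesNativeListTags(inline)); // false
        System.out.println(usesNativeListTags(tagged)); // true
    }
}
```

This is only a coarse check: it says which export style was used somewhere in the document, not whether every list was rendered correctly.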

@digi0 To get the expected output you should specify ExportListLabels.AS_INLINE_TEXT. Please see the following code:

Document doc = new Document("C:\\Temp\\in.docx");
HtmlSaveOptions opt = new HtmlSaveOptions();
opt.setExportListLabels(ExportListLabels.AS_INLINE_TEXT);
opt.setPrettyFormat(true);
doc.save("C:\\Temp\\out.html", opt);

Thanks for your reply.

As I mentioned above:

My understanding is that ExportListLabels.AS_INLINE_TEXT should be used in this case, but ExportListLabels.BY_HTML_TAGS is being used instead. If I set this manually, regular bullet list formatting breaks.

We don’t know ahead of time what type of list we are getting so we need to cater for all types of lists. In fact, we could have a file that contains both list types. Therefore we need to use ExportListLabels.AUTO.

The lists were converted correctly in Aspose Words (Java) v23.6, but now do not work in v24.2. Do you agree that this is something that should be fixed on the Aspose side? It seems like it could be a bug to me.

EDIT: Sharing my reply to another question in case the way we are saving documents matters for troubleshooting on your end. Does the OutputStream save method work as expected?

@digi0 List label export was changed in version 24.1 of Aspose.Words.
Previously, when ExportListLabels.ByHtmlTags value was specified for HtmlSaveOptions.ExportListLabels save option, some lists could nevertheless be exported as inline text using <p> tags.

Now, when ExportListLabels.ByHtmlTags value is specified for HtmlSaveOptions.ExportListLabels save option, all lists are exported as HTML native elements using <ul>, <ol> and <li> tags.

A few points worth noting about the new behavior when the ExportListLabels.ByHtmlTags value is specified for the HtmlSaveOptions.ExportListLabels save option:

  • Previously lists with Heading styles were exported as inline text using <h1>, <h2>, <h3>, <h4>, <h5> and <h6> tags. Now they are exported as HTML native elements using <ul>, <ol> and <li> tags and their styles won’t be preserved after DOCX->HTML->DOCX round-trip.
  • Previously lists with delete revision were exported as inline text using <p> tags. Now they are exported as HTML native elements using <ul>, <ol> and <li> tags and some decrease in the quality of such lists is possible.
  • When a document is exported to MHTML, strikethrough and underline formatting is no longer applied to list markers.

If these changes in behavior are critical, you can use ExportListLabels.Auto value instead of ExportListLabels.ByHtmlTags value for HtmlSaveOptions.ExportListLabels save option, because previously their behavior was quite the same.

Also, I have tested conversion of your document to HTML with ExportListLabels.Auto and the problem is not reproducible. The list is exported fine. Here is the output produced by the following code:

Document doc = new Document("C:\\Temp\\in.docx");
HtmlSaveOptions opt = new HtmlSaveOptions();
opt.setPrettyFormat(true);
doc.save("C:\\Temp\\out.html", opt);

out.zip (719 Bytes)

Yes, saving to stream works correctly on my side.

Thanks for all this info! We will see if we can change something on our end to get around this then. Thanks again!
