Word ordered lists and nested lists are not parsed correctly

JLong_Chen · July 21, 2025, 8:39am

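I am currently using Aspose.Words for Java 23.10 and have run into three list parsing problems:
1. Lists are not parsed as nested lists.
2. Lists are not parsed as nested lists, and the indentation is replaced with space characters.
3. Different numbering types are not parsed as list numbering; everything becomes a p tag.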

See the attachment:
xuhao.docx (12.1 KB)

Professionalize.Discourse · July 21, 2025, 8:39am

Cause

The issues you’re experiencing with nested lists and numbering in Aspose.Words for Java are likely due to the document’s structure or how the library interprets certain formatting elements. Specifically, it appears that the library is not correctly recognizing the nested list structure, which results in incorrect parsing and formatting.

Solution

To resolve these issues, please follow these steps:

Check Document Structure: Ensure that your Word document is formatted correctly with proper list styles. If the document is not structured correctly, Aspose.Words may struggle to parse it accurately.
Update to Latest Version: If you are currently using version 23.10, update to the latest version of Aspose.Words for Java. Newer releases may contain bug fixes or improvements related to list parsing.
Use List Formatting: When creating or modifying lists programmatically, ensure you are using the appropriate methods to define list styles and levels. This will help maintain the intended structure.

Code

Here’s a simple example of how to create a nested list programmatically:

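import com.aspose.words.*;

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Create a bulleted list. The type is fully qualified to avoid a clash with java.util.List.
com.aspose.words.List list = doc.getLists().add(ListTemplate.BULLET_DEFAULT);

// Apply the list before writing so that every item belongs to it
builder.getListFormat().setList(list);
builder.writeln("Item 1");
builder.writeln("Item 2");

// Indent one level to create a nested item, then return to the top level
builder.getListFormat().listIndent();
builder.writeln("Subitem 1");
builder.getListFormat().listOutdent();
builder.writeln("Item 3");

// Stop list formatting for any paragraphs that follow
builder.getListFormat().removeNumbers();

doc.save("output.docx");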

Next Steps

Review the structure of your Word document to ensure it follows standard list formatting.
Test the provided code snippet to determine if it resolves your issue with nested lists.
If problems persist, reach out to Aspose support with your specific document for further assistance.

Please let me know if you need any more help!

JLong_Chen · July 21, 2025, 9:38am

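The document structure is fine; it was edited in Word.
The problem is that list parsing goes wrong when the Word document (a docx file) is converted to HTML.
The sample code cannot reproduce this problem, so I would appreciate some guidance.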

vyacheslav.deryushev · July 21, 2025, 3:59pm

@JLong_Chen You can use the following code to read all list label values:

Document doc = new Document("xuhao.docx");
doc.updateListLabels();
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (paragraph.isListItem()) {
        System.out.println(paragraph.getListLabel().getLabelString());
    }
}
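
To make clear what kind of string a list label is, here is a minimal, library-free sketch of how multi-level decimal labels such as "2.1." can be derived from nesting levels. The class `ListLabelSketch` and its `labels` method are hypothetical names introduced for illustration; this is not how Aspose.Words computes labels internally, and it assumes plain decimal numbering at every level with levels in the range 0 to 8.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Aspose.Words: builds decimal multi-level labels. */
class ListLabelSketch {

    /**
     * levels[i] is the zero-based nesting level of the i-th list paragraph.
     * Returns one label per paragraph, for example "2.1." for the first
     * level-1 item under the second level-0 item.
     */
    static List<String> labels(int[] levels) {
        int[] counters = new int[9]; // Word lists have up to nine levels
        List<String> out = new ArrayList<>();
        for (int level : levels) {
            counters[level]++;
            // A new item at this level restarts numbering on all deeper levels.
            for (int deeper = level + 1; deeper < counters.length; deeper++) {
                counters[deeper] = 0;
            }
            // Join the counters from the top level down to the current level.
            StringBuilder label = new StringBuilder();
            for (int i = 0; i <= level; i++) {
                label.append(counters[i]).append('.');
            }
            out.add(label.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(labels(new int[] {0, 0, 1, 1, 0}));
    }
}
```

For the levels 0, 0, 1, 1, 0 the sketch yields the labels 1., 2., 2.1., 2.2., 3.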

If you need HTML output, you can use ExportListLabels.BY_HTML_TAGS to make sure <ol><li> elements are always used.

HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.setExportListLabels(ExportListLabels.BY_HTML_TAGS);
doc.save("output.html", saveOptions);

JLong_Chen · July 22, 2025, 1:40am

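Thanks, ExportListLabels.BY_HTML_TAGS does make the output always use <ol><li> elements.
Three questions remain:
1. How can the type attribute of ol be kept as in the original? Many lists are currently output with the default type="1", which does not match the original value in the document.
2. The ol carries class="awlistn". How can I obtain this class?
3. Something is added to every li. How can this be kept consistent with the document? Some lists in the document have no TAB.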

vyacheslav.deryushev · July 22, 2025, 6:18am

@JLong_Chen

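The type attribute in HTML specifies the kind of marker used in the list (letters or numbers). The possible values are <ol type="1|a|A|i|I">. If you need the exact list label values, use ExportListLabels.AS_INLINE_TEXT.
The awlist class is saved in the HTML document. It is used to round-trip correctly between DOCX-HTML-DOCX.
All tab characters in the document are replaced with spaces. This is expected behavior, but getting the correct result should not be a problem.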
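
The limit on the ol type attribute can be illustrated with a small, library-free sketch. The class `OlTypeSketch`, the method `olType`, and the style names used as keys are all hypothetical and chosen for this example; they are not Aspose.Words constants. The fallback to "1" for any other style is an assumption of the sketch, made only to show why a numbering style outside the five HTML values cannot be represented exactly.

```java
/** Hypothetical helper, not part of Aspose.Words: maps a numbering style to an ol type. */
class OlTypeSketch {

    /**
     * Returns the value for the HTML ol type attribute.
     * HTML only defines five values, so every other style falls back to "1" here.
     */
    static String olType(String numberStyle) {
        switch (numberStyle) {
            case "LOWERCASE_LETTER": return "a";
            case "UPPERCASE_LETTER": return "A";
            case "LOWERCASE_ROMAN":  return "i";
            case "UPPERCASE_ROMAN":  return "I";
            case "DECIMAL":          return "1";
            default:                 return "1"; // assumed fallback for unsupported styles
        }
    }

    public static void main(String[] args) {
        System.out.println(olType("LOWERCASE_LETTER"));
        System.out.println(olType("CHINESE_COUNTING")); // illustrative style with no HTML equivalent
    }
}
```

When the exact label text matters more than the <ol><li> structure, ExportListLabels.AS_INLINE_TEXT, as mentioned above, writes the labels as text instead.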

JLong_Chen · July 22, 2025, 7:23am

Aspose.Words for Java 23.10
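1. With ExportListLabels.AS_INLINE_TEXT, the <ol><li> elements are lost and the output becomes plain content.
ExportListLabels.BY_HTML_TAGS keeps the <ol><li> elements, but the type attribute can only specify the kind of marker used in the list (letters or numbers), with the possible values <ol type="1|a|A|i|I">, so the markers cannot be reproduced exactly.
2. All tab characters in the document are replaced with spaces, but some lists have no indentation. For example, the following content (see also the attachment in the first post):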
a.Dasjdljds
b.Dsadjl
1.Sadasdj
2.Saldjlasj
2.1.Sdjl
After parsing it becomes:
a.Dasjdljds
b.Dsadjl
1.Sadasdj
2.Saldjlasj
2.1.Sdjl

vyacheslav.deryushev · July 22, 2025, 10:01am

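@JLong_Chen Could you please provide more information about your case?
Please note that the layout of MS Word and HTML documents is different. In an MS Word document, some list items are positioned by setting margins. So if you need the list to be left-aligned, you have to configure it with the style you need.

I can see the issue with the numbering, which is not supported at the moment.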