How to extract paragraph content that is in green font

hhh1111 · October 18, 2024, 2:04am

How can I extract the parts of a paragraph that are in green font?

alexey.noskov · October 18, 2024, 5:09am

@hhh1111 Could you please attach your input document here for testing?

hhh1111 · October 18, 2024, 5:33am

import aspose.words as aw


def aw_extract_headings_and_contents_table_dict_id(file):
    doc = aw.Document(file)
    current_level = 0
    data = {}
    # Populate list labels so that list_label.label_string is available.
    doc.update_list_labels()
    stack = []

    for s in doc.sections:
        sect = s.as_section()
        for node in sect.body.get_child_nodes(aw.NodeType.ANY, True):
            if node.node_type == aw.NodeType.PARAGRAPH:
                para = node.as_paragraph()
                text = para.get_text().strip()
                # List label (e.g. "1.1"); empty if the paragraph is not a list item.
                label = ''
                if para.list_format.is_list_item:
                    label = para.list_label.label_string

                if para.paragraph_format.outline_level in [0, 1, 2, 3, 4, 5]:
                    # Heading paragraph: open or close nesting levels as needed.
                    level = int(para.paragraph_format.outline_level) + 1
                    if level > current_level:
                        stack.append((current_level, data))
                        data = {}
                        current_level = level
                    elif level < current_level:
                        while stack and stack[-1][0] >= level:
                            old_level, old_data = stack.pop()
                            data = {**old_data, **data}
                            current_level = old_level
                    current_key = label + text

                    # Debug output: print the text of each run in the heading.
                    for run in para.runs:
                        print(run.get_text())

                    if current_key not in data:
                        data[current_key] = ""
                else:
                    # Body paragraph: append to the most recent heading,
                    # skipping empty paragraphs and paragraphs inside tables.
                    if text and not para.get_ancestor(
                            aw.NodeType.TABLE) and not para.get_ancestor(aw.NodeType.FIELD_START) and data:
                        last_key = list(data.keys())[-1]
                        # Always end the entry with a newline, with or without a list label.
                        data[last_key] += label + text + "\n"

            elif node.node_type == aw.NodeType.TABLE:
                table = node.as_table()
                # aw_read_table_as_markdown is a helper defined elsewhere (not shown in this post).
                table_content = aw_read_table_as_markdown(table)

                if data:
                    last_key = list(data.keys())[-1]
                    data[last_key] += table_content + "\n"

    while stack:
        old_level, old_data = stack.pop()
        data = {**old_data, **data}

    return data

hhh1111 · October 18, 2024, 5:33am

倫理.docx (16.4 KB)

hhh1111 · October 18, 2024, 5:34am

For example, a paragraph contains some green text. I want to extract it and add a tag before and after that text.
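A minimal sketch of that tagging step, written in plain Python so it runs without Aspose.Words. It treats a paragraph as a list of `(text, (r, g, b))` pairs, one per run, and wraps every consecutive stretch of green runs in a tag. The `<green>` tag name, the `is_green` threshold rule, and the `tag_green_runs` helper are all assumptions made for illustration; with Aspose.Words the pairs would presumably be built from each run's text and its `font.color`.

```python
from itertools import groupby


def is_green(rgb):
    """Heuristic (an assumption): the green channel clearly dominates red and blue."""
    r, g, b = rgb
    return g >= 128 and g > 2 * r and g > 2 * b


def tag_green_runs(runs, open_tag="<green>", close_tag="</green>"):
    """Join run texts, wrapping each consecutive group of green runs in one tag pair.

    `runs` is a list of (text, (r, g, b)) tuples in document order.
    """
    parts = []
    # groupby merges neighbouring runs that share the same green/non-green status,
    # so two adjacent green runs get a single pair of tags.
    for green, group in groupby(runs, key=lambda run: is_green(run[1])):
        text = "".join(run_text for run_text, _ in group)
        parts.append(f"{open_tag}{text}{close_tag}" if green else text)
    return "".join(parts)


# Hypothetical paragraph: black text with two adjacent green runs in the middle.
runs = [
    ("This is ", (0, 0, 0)),
    ("green ", (0, 176, 80)),
    ("text", (0, 128, 0)),
    (" here.", (0, 0, 0)),
]
print(tag_green_runs(runs))  # This is <green>green text</green> here.
```

Grouping neighbouring runs matters because Word often splits one visually uniform phrase into several runs; tagging each run separately would produce a pair of tags per fragment.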

hhh1111 · October 18, 2024, 5:46am

color = node.runs[0].font.color.to_rgb()
color_hex = '#{0:02x}{1:02x}{2:02x}'.format(color.r, color.g, color.b)

I tried this code, but it doesn't work.
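For reference, the hex formatting itself can be checked in isolation with plain integers, as in the sketch below. The `rgb_to_hex` helper name is hypothetical. As far as I know, `font.color` in Aspose.Words for Python already returns a color object with `r`, `g`, and `b` values, so the extra `to_rgb()` call may be the part that fails, but that is an assumption worth checking against the API reference.

```python
def rgb_to_hex(r, g, b):
    """Format three 0-255 channel values as a lowercase '#rrggbb' string."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel value out of range: {channel}")
    return '#{0:02x}{1:02x}{2:02x}'.format(r, g, b)


# The channel values would come from the run's color in a real document;
# here they are hard-coded for illustration.
print(rgb_to_hex(0, 176, 80))  # #00b050
```

Also note that `node.runs[0]` only inspects the first run, so green text that starts later in the paragraph would be missed; checking every run avoids that.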

hhh1111 · October 18, 2024, 6:42am

Never mind, I've already solved it.

vyacheslav.deryushev · October 18, 2024, 8:07am

@hhh1111 OK, it looks like the answer here is the same as in "怎么把table的node节点转换为html格式" (How to convert a table node to HTML format) - #12 by vyacheslav.deryushev