Python DocumentVisitor

I am trying to evaluate using aspose.words for a project at work. I am using the temporary free license with aspose-words v25.9.0 in Python. I am trying to understand how to best iterate over the nodes inside a document, which feels like a simple task. However, I don’t seem to be able to subclass aspose.words.DocumentVisitor as per the instructions in the documentation here: DocumentVisitor class | Aspose.Words for Python. Instead I am provided with this TypeError when trying to do a very simple test in the interpreter:

class CustomVisitor(aw.DocumentVisitor):
    def __init__(self):
        super().__init__()
    def visit_paragraph_start(self, paragraph: aw.Paragraph) -> int:
        paragraph.to_string(aw.SaveFormat.TEXT).strip()
        return aw.VisitorAction.CONTINUE

Traceback (most recent call last):
File "", line 1, in
TypeError: type 'aspose.words.DocumentVisitor' is not an acceptable base type

I also tried simply iterating over the nodes in a section with section.body.get_child_nodes(aw.NodeType.PARAGRAPH, False). That returned nodes, but only as the aspose.words.Node type, and I could not iterate into their children because aspose.words.Node does not have any functions for finding children:

for element in section.body.get_child_nodes(aw.NodeType.ANY, False):
    element
    if element.node_type == aw.NodeType.PARAGRAPH:
        paragraph = element
        paragraph.get_text()
        for child in paragraph.get_child_nodes(aw.NodeType.ANY, False):
            print(' ' + str(child.node_type))
    else:
        print('$' + str(element.to_string(save_format=aw.SaveFormat.TEXT)))

<aspose.words.Node object at 0x000002345155B0D0>
'Guidance for the Example Presentation\r'
Traceback (most recent call last):
File "", line 6, in
AttributeError: 'aspose.words.Node' object has no attribute 'get_child_nodes'

Any help would be appreciated.

Thanks,
Sean

@seanwhitsitt

It seems you’re encountering a TypeError when trying to subclass aspose.words.DocumentVisitor. This issue typically arises because the DocumentVisitor class is not designed to be subclassed directly in Python. Instead, you can use the Document class’s accept method to visit nodes.

Here’s a simple example of how to iterate over paragraphs in a document without subclassing DocumentVisitor:

import aspose.words as aw

# Load your document
doc = aw.Document("your_document.docx")

# Iterate through all paragraphs in the document
for paragraph in doc.get_child_nodes(aw.NodeType.PARAGRAPH, True):
    print(paragraph.get_text().strip())

In this code, get_child_nodes is called with True as its second argument, so it retrieves every paragraph in the document at any depth, allowing you to process them as needed.

If you need to perform more complex operations, consider using the DocumentVisitor pattern correctly by implementing a visitor class and passing it to the accept method of the document or specific nodes.

If you have further questions or need additional examples, feel free to ask!

I need to be able to distinguish node types and further visit children of nodes. I haven’t been able to get code similar to that with get_child_nodes() to work since everything is provided back as an aspose.words.Node type without access to further call get_child_nodes().

I’d prefer using the visitor pattern anyway, but I was of the understanding that I’d need to subclass DocumentVisitor in order to use it properly. Can I just provide a class that isn’t subclassing DocumentVisitor for that?

@seanwhitsitt
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSPYTHON-105

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@seanwhitsitt Unfortunately, it is currently impossible to use DocumentVisitor in the Python version of Aspose.Words. The feature request has been logged as WORDSPYTHON-105. We will keep you updated and let you know once it is resolved.

You can recursively visit the nodes in your document using a function like this:

import aspose.words as aw

def visit_node(node, depth):
    # Print the node type, indented by its depth in the tree.
    print(" " * depth + str(node.node_type.name))
    if node.is_composite:
        # Child access lives on CompositeNode, so cast the plain Node first.
        node = node.as_composite_node()
        for child in node.get_child_nodes(aw.NodeType.ANY, False):
            visit_node(child, depth + 1)

doc = aw.Document("C:\\Temp\\in.docx")
visit_node(doc, 0)
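
Building on the same recursion, a home-made visitor could be sketched roughly as below. This is only an illustration and not part of Aspose.Words: the NodeWalker and TextCollector classes and the visit_<type> method naming are hypothetical. It also assumes that the first_child and next_sibling properties are available on the Python nodes as they are in the .NET API.

```python
class NodeWalker:
    """Hypothetical stand-in for DocumentVisitor, driven by plain recursion."""

    def walk(self, node, depth=0):
        # Dispatch on the node type name, e.g. PARAGRAPH -> visit_paragraph.
        method_name = "visit_" + str(node.node_type.name).lower()
        handler = getattr(self, method_name, self.visit_default)
        handler(node, depth)
        if node.is_composite:
            # Child access lives on CompositeNode, so cast the plain Node first.
            composite = node.as_composite_node()
            # Assumption: first_child / next_sibling exist as in the .NET API.
            child = composite.first_child
            while child is not None:
                self.walk(child, depth + 1)
                child = child.next_sibling

    def visit_default(self, node, depth):
        # Node types without a dedicated handler are ignored.
        pass


class TextCollector(NodeWalker):
    """Example subclass: collects the stripped text of every paragraph."""

    def __init__(self):
        self.paragraphs = []

    def visit_paragraph(self, node, depth):
        self.paragraphs.append(node.get_text().strip())


# Usage with a real document (requires the aspose-words package):
#   import aspose.words as aw
#   collector = TextCollector()
#   collector.walk(aw.Document("C:\\Temp\\in.docx"))
#   print(collector.paragraphs)
```

A handler that needs type-specific members would still have to cast its node with the matching as_XXX method first, just as the traversal does with as_composite_node().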

Thank you! I think the as_XXX methods will let me build my own “visitor,” but the official visitor class would be better to use. Hopefully that’ll let me finish evaluating if I can do everything I need with aspose. Thanks!
