Name 'ExtractContentHelper' is not defined

SoumyaJ · April 18, 2022, 8:04am

Trying to Extract Content Between Paragraphs

The following snippet gives an error:

# Extract the content between these nodes in the document. Include these markers in the extraction.
extractedNodes = helper.ExtractContentHelper.extract_content(startPara, endPara, True)

Error message:
NameError: name 'helper' is not defined

Which library needs to be imported to resolve this?

alexey.noskov · April 18, 2022, 8:08am

@SoumyaJ It looks like you are using the Python version of Aspose.Words. If so, you can use the ExtractContentHelper class from our GitHub repository.
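For context, the `NameError` only means that nothing named `helper` exists in your script's namespace. `ExtractContentHelper` appears to be sample code rather than part of the installed package, so it has to be copied into your project (for example into a module you name `helper.py`, then `import helper`) before that call can work. The module name `helper.py` is an assumption; any name works as long as the import matches. As a rough illustration of what `extract_content(startPara, endPara, True)` does conceptually, the sketch below works on a flat Python list of paragraph strings, not on real Aspose.Words nodes. The hypothetical `extract_between` function ignores nested content such as tables and runs, which the real helper has to handle.

```python
from typing import List, TypeVar

T = TypeVar("T")


def extract_between(nodes: List[T], start: T, end: T, is_inclusive: bool) -> List[T]:
    """Return the items between `start` and `end` in document order.

    Hypothetical, simplified stand-in for ExtractContentHelper.extract_content:
    it works on a flat list, not on an Aspose.Words node tree.
    """
    # list.index finds the first matching item and raises ValueError if missing.
    start_index = nodes.index(start)
    end_index = nodes.index(end)
    if start_index > end_index:
        raise ValueError("start marker must come before end marker")
    if is_inclusive:
        # Keep both markers, like passing True as the third argument.
        return nodes[start_index:end_index + 1]
    # Drop the markers and keep only what lies strictly between them.
    return nodes[start_index + 1:end_index]


paragraphs = ["Title", "Intro", "Body 1", "Body 2", "Summary"]
print(extract_between(paragraphs, "Intro", "Body 2", True))   # ['Intro', 'Body 1', 'Body 2']
print(extract_between(paragraphs, "Intro", "Body 2", False))  # ['Body 1']
```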

SoumyaJ · April 18, 2022, 9:13am

@alexey.noskov Thank you for your prompt reply.

However, I have a concern about the following code:

startPara = doc.first_section.body.get_child(aw.NodeType.PARAGRAPH, 1, True).as_paragraph()
endPara = doc.last_section.body.get_child(aw.NodeType.PARAGRAPH, 1, True).as_paragraph()

This picks only the first line of the source document. However, if I change the index from 1 to 10, it picks the 12th line of the document.

How can I view the structure of the entire document flow, including embedded images?
Based on that structure, I want to access every node in the document iteratively.

alexey.noskov · April 18, 2022, 12:31pm

@SoumyaJ First of all, the get_child method accepts a zero-based node index, so the following code returns the second paragraph:

startPara = doc.first_section.body.get_child(aw.NodeType.PARAGRAPH, 1, True).as_paragraph()

Also, a paragraph is not a line of text in an MS Word document; a paragraph can occupy any number of lines. Please see our documentation to learn more about the Aspose.Words Document Object Model.
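To see how zero-based paragraph indices relate to visible lines, here is a small arithmetic sketch in plain Python (no Aspose.Words calls). The `line_counts` values and the `first_line_of_paragraph` helper are hypothetical: each entry is the number of rendered lines a paragraph occupies, which in a real document depends on page width, fonts, and layout. If one of the first ten paragraphs wraps onto two lines, the paragraph at index 10 (the 11th paragraph) starts on line 12, which would be consistent with what you observed. Note too that with the last argument set to True, get_child searches all descendants, so paragraphs inside tables are counted as well.

```python
from typing import List


def first_line_of_paragraph(line_counts: List[int], index: int) -> int:
    """Return the 1-based line on which the paragraph at zero-based `index` starts,
    given how many rendered lines each paragraph occupies (hypothetical data)."""
    if not 0 <= index < len(line_counts):
        raise IndexError("paragraph index out of range")
    # Lines used by all earlier paragraphs, plus one to move onto the next line.
    return sum(line_counts[:index]) + 1


# Hypothetical layout: 11 paragraphs, the 4th of which wraps onto two lines.
line_counts = [1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1]

print(first_line_of_paragraph(line_counts, 0))   # 1  -> first paragraph
print(first_line_of_paragraph(line_counts, 1))   # 2  -> index 1 is the second paragraph
print(first_line_of_paragraph(line_counts, 10))  # 12 -> the 11th paragraph starts on line 12
```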
You can recursively loop through the child nodes of the document to see its structure. For example, see the following simple code:

import aspose.words as aw

class TestDocumentStructure:

    @staticmethod
    def node_type_to_string(node_type : int) -> str:
        # Map Aspose.Words NodeType values to readable names.
        names = {
            1 : "Document",
            2 : "Section",
            3 : "Body",
            4 : "HeaderFooter",
            5 : "Table",
            6 : "Row",
            7 : "Cell",
            8 : "Paragraph",
            9 : "BookmarkStart",
            10 : "BookmarkEnd",
            11 : "EditableRangeStart",
            12 : "EditableRangeEnd",
            13 : "MoveFromRangeStart",
            14 : "MoveFromRangeEnd",
            15 : "MoveToRangeStart",
            16 : "MoveToRangeEnd",
            17 : "GroupShape",
            18 : "Shape",
            19 : "Comment",
            20 : "Footnote",
            21 : "Run",
            22 : "FieldStart",
            23 : "FieldSeparator",
            24 : "FieldEnd",
            25 : "FormField",
            26 : "SpecialChar",
            27 : "SmartTag",
            28 : "StructuredDocumentTag",
            29 : "StructuredDocumentTagRangeStart",
            30 : "StructuredDocumentTagRangeEnd",
            31 : "GlossaryDocument",
            32 : "BuildingBlock",
            33 : "CommentRangeStart",
            34 : "CommentRangeEnd",
            35 : "OfficeMath",
            36 : "SubDocument"
        }
        # Fall back to the raw number for any node type not listed above.
        return names.get(node_type, str(node_type))

    @staticmethod
    def print_node_structure(node : aw.Node, level : int) :
        # Indent four spaces per nesting level.
        tabs = "    " * level
        print(tabs + TestDocumentStructure.node_type_to_string(node.node_type))

        # Recurse into nodes that can contain children (sections, paragraphs, tables, ...).
        if node.is_composite:
            for child in node.as_composite_node().child_nodes:
                TestDocumentStructure.print_node_structure(child, level + 1)


# Replace the path with the location of your own license file.
lic = aw.License()
lic.set_license("X:\\awnet\\TestData\\Licenses\\Aspose.Words.Python.NET.lic")

doc = aw.Document("C:\\Temp\\in.docx")
TestDocumentStructure.print_node_structure(doc, 0)
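Since you mentioned wanting to access every node iteratively, the same walk can also be done without recursion by keeping an explicit stack. The sketch below is plain Python with a hypothetical `FakeNode` class standing in for Aspose.Words nodes, so it runs anywhere; with the real library you would read `node.node_type`, `node.is_composite` and the child collection as in the code above. Children are pushed in reverse so they are popped in document order, which reproduces the pre-order output of `print_node_structure`. It also collects `Shape` nodes, because embedded images are represented as `Shape` nodes in the Aspose.Words DOM (on real shapes, checking `has_image` should tell pictures apart from other drawing objects).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeNode:
    """Hypothetical stand-in for an Aspose.Words node: a type name plus children."""
    node_type: str
    children: List["FakeNode"] = field(default_factory=list)


def walk(root: FakeNode) -> List[Tuple[int, FakeNode]]:
    """Iterative pre-order traversal returning (depth, node) pairs in document order."""
    result: List[Tuple[int, FakeNode]] = []
    stack: List[Tuple[int, FakeNode]] = [(0, root)]
    while stack:
        depth, node = stack.pop()
        result.append((depth, node))
        # Push children reversed so the first child is popped (visited) next.
        for child in reversed(node.children):
            stack.append((depth + 1, child))
    return result


doc = FakeNode("Document", [
    FakeNode("Section", [
        FakeNode("Body", [
            FakeNode("Paragraph", [FakeNode("Run"), FakeNode("Shape")]),
            FakeNode("Paragraph", [FakeNode("Run")]),
        ]),
    ]),
])

# Print the tree with the same indentation scheme as print_node_structure.
for depth, node in walk(doc):
    print("    " * depth + node.node_type)

# Collect the nodes that would hold embedded images in a real document.
shapes = [node for _, node in walk(doc) if node.node_type == "Shape"]
print(len(shapes))  # 1
```

If you only need a flat, document-order sequence rather than the tree layout, calling `get_child_nodes(aw.NodeType.ANY, True)` on the document should return all descendant nodes directly; please confirm the exact signature in the API reference for your Aspose.Words version.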