Extracting Text from Textboxes and Shapes with Aspose.Words for Python via .NET

Hello,

How can I extract the text from shapes and text boxes in a Word document and insert it as paragraphs next to those shapes/text boxes?

I would like to use Aspose.Words for Python via .NET.

@ln22

To extract text from shapes and text boxes within a Word document using Aspose.Words for Python via .NET, you can follow these steps:

  1. Load the Document: Start by loading your Word document using the Document class.
  2. Iterate Through Shapes: Use the GetChildNodes method to retrieve all shapes in the document. You can filter these shapes to find text boxes and other relevant shapes.
  3. Extract Text: For each shape, you can access its text content and then insert it as a new paragraph in the document.

Here is a sample code snippet demonstrating this process:

import aspose.words as aw

# Load the document
doc = aw.Document("path/to/your/document.docx")

# Iterate through all shapes in the document
for shape in doc.get_child_nodes(aw.NodeType.SHAPE, True):
    if shape.has_text_frame:
        # Extract text from the shape
        shape_text = shape.text_frame.text
        # Create a new paragraph with the extracted text
        new_paragraph = aw.Paragraph(doc)
        new_paragraph.append_child(aw.Run(doc, shape_text))
        # Insert the new paragraph after the shape
        shape.parent_node.insert_after(new_paragraph, shape)

# Save the modified document
doc.save("path/to/your/modified_document.docx")

Explanation:

  • The code loads a Word document and iterates through all shapes.
  • It checks if the shape has a text frame, extracts the text, and creates a new paragraph with that text.
  • Finally, it inserts the new paragraph right after the shape in the document.

This approach allows you to effectively extract text from shapes and text boxes and place it as paragraphs in your document. Make sure to adjust the file paths as necessary for your environment.

For more detailed examples and documentation, you can refer to the official Aspose documentation on working with shapes and text frames.

If you have any further questions or need additional assistance, feel free to ask!

The code above raises the following error:

'aspose.words.Node' object has no attribute 'has_text_frame'

Can someone please help me create this code properly?

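@ln22 The code produced by AI is not quite correct: get_child_nodes returns generic Node objects, so each one has to be cast with as_shape() before shape-specific members can be used. You can use the following code to extract the content from textbox shapes and insert it after them: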

import aspose.words as aw

doc = aw.Document("C:\\Temp\\in.docx")
builder = aw.DocumentBuilder(doc)
# Take a snapshot of the shapes first, because textboxes are removed inside the loop.
for s in list(doc.get_child_nodes(aw.NodeType.SHAPE, True)):
    # get_child_nodes yields generic Node objects; cast to Shape to reach shape members.
    s = s.as_shape()
    if s.shape_type == aw.drawing.ShapeType.TEXT_BOX:
        # A shape is an inline node, while the textbox content is block-level, so the
        # content cannot be placed directly after the shape. Move the DocumentBuilder
        # to the shape, insert a paragraph break, then put the content after it.
        builder.move_to(s)
        builder.writeln()
        # Move each child of the textbox in front of the builder's current paragraph.
        while s.has_child_nodes:
            builder.current_paragraph.parent_node.insert_before(s.first_child, builder.current_paragraph)
        # Remove the now-empty textbox.
        s.remove()

doc.save("C:\\Temp\\out.docx")
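
If the textbox should stay in the document and only a plain-text copy of its content is needed next to it, a variation along these lines might work. This is a rough sketch, not tested against every document layout. The names textbox_lines and insert_textbox_text are made up for illustration. It assumes that get_text() ends each textbox paragraph with "\r" and that parent_paragraph is available on the shape. It copies text only, so formatting is lost.

```python
def textbox_lines(shape_nodes, text_box_type):
    """Yield (shape, lines) for every textbox among the given nodes.

    Hypothetical helper: it relies only on as_shape(), shape_type and
    get_text(), so it does not import the library itself.
    """
    for node in shape_nodes:
        # Cast the generic Node to a Shape before reading shape members.
        shape = node.as_shape()
        if shape.shape_type != text_box_type:
            continue
        # Assumption: get_text() ends each textbox paragraph with "\r".
        lines = [line.strip() for line in shape.get_text().split("\r")]
        # Drop empty entries, such as the one after the final "\r".
        yield shape, [line for line in lines if line]


def insert_textbox_text(in_path, out_path):
    """Copy textbox text into new paragraphs after each hosting paragraph."""
    # Imported here so the helper above can be exercised without the library.
    import aspose.words as aw

    doc = aw.Document(in_path)
    # Snapshot the collection, because the document is modified in the loop.
    shapes = list(doc.get_child_nodes(aw.NodeType.SHAPE, True))
    for shape, lines in textbox_lines(shapes, aw.drawing.ShapeType.TEXT_BOX):
        host = shape.parent_paragraph
        if host is None:
            # No paragraph to anchor the copied text to, so skip this shape.
            continue
        anchor = host
        for line in lines:
            para = aw.Paragraph(doc)
            para.append_child(aw.Run(doc, line))
            # Insert after the previous copy so the lines keep their order.
            host.parent_node.insert_after(para, anchor)
            anchor = para
    doc.save(out_path)
```

Calling insert_textbox_text("C:\\Temp\\in.docx", "C:\\Temp\\out.docx") would then leave every textbox where it is and add one paragraph per textbox line after the paragraph that hosts the shape. Shapes that report no parent paragraph are skipped rather than guessed at.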