Hello,
How can I extract text from shapes and text boxes within a Word document and insert it as paragraphs next to those shapes/text boxes in the document?
I would like to use Aspose.Words for Python via .NET.
To extract text from shapes and text boxes within a Word document using Aspose.Words for Python via .NET, you can follow these steps:

Use the `get_child_nodes` method of the `Document` class to retrieve all shapes in the document. You can then filter these shapes to find text boxes and other relevant shapes.

Here is a sample code snippet demonstrating this process:
```python
import aspose.words as aw

# Load the document
doc = aw.Document("path/to/your/document.docx")

# Iterate through all shapes in the document
for shape in doc.get_child_nodes(aw.NodeType.SHAPE, True):
    if shape.has_text_frame:
        # Extract text from the shape
        shape_text = shape.text_frame.text
        # Create a new paragraph with the extracted text
        new_paragraph = aw.Paragraph(doc)
        new_paragraph.append_child(aw.Run(doc, shape_text))
        # Insert the new paragraph after the shape
        shape.parent_node.insert_after(new_paragraph, shape)

# Save the modified document
doc.save("path/to/your/modified_document.docx")
```
This approach allows you to effectively extract text from shapes and text boxes and place it as paragraphs in your document. Make sure to adjust the file paths as necessary for your environment.
For more detailed examples and documentation, you can refer to the official Aspose documentation on working with shapes and text frames.
If you have any further questions or need additional assistance, feel free to ask!
The code above raises the following error:
'aspose.words.Node' object has no attribute 'has_text_frame'
Can someone please help me create this code properly?
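@ln22 The code produced by the AI is not correct. `get_child_nodes` returns generic `Node` objects, so each one must be cast with `as_shape()` before shape properties are available, and Aspose.Words shapes have no `has_text_frame` property in any case. You can use the following code to extract the content of text box shapes and insert it after them: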
```python
import aspose.words as aw

doc = aw.Document("C:\\Temp\\in.docx")
builder = aw.DocumentBuilder(doc)

# Take a snapshot of the shapes first: the loop removes nodes, and
# get_child_nodes returns a live collection.
for s in list(doc.get_child_nodes(aw.NodeType.SHAPE, True)):
    s = s.as_shape()
    if s.shape_type == aw.drawing.ShapeType.TEXT_BOX:
        # The shape is an inline node, but the content of a text box is block-level,
        # so it cannot be placed right after the shape. Instead, move the DocumentBuilder
        # to the shape, insert a paragraph break, and then insert the content
        # before the builder's current paragraph.
        builder.move_to(s)
        builder.writeln()
        while s.has_child_nodes:
            builder.current_paragraph.parent_node.insert_before(s.first_child, builder.current_paragraph)
        # Remove the now-empty text box.
        s.remove()

doc.save("C:\\Temp\\out.docx")
```