Hi Team,
We are extracting images from documents using Aspose.Words. We have come across a new scenario: in this document, the image and its caption are combined in a single text frame. How can we extract the image in this case? Please advise.
Input doc: 2021GB007083-file001.docx (4.9 MB)
@Mahesh39 You can use code like this to extract images from your document:
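// Requires: import com.aspose.words.*;
Document doc = new Document("C:\\Temp\\in.docx");
// Collect every Shape node in the document, including nested ones.
Iterable<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
int counter = 0;
for (Shape s : shapes)
{
    // Only shapes that actually hold image data are saved.
    if (s.hasImage())
    {
        // Pick the file extension that matches the stored image type.
        String extension = FileFormatUtil.imageTypeToExtension(s.getImageData().getImageType());
        s.getImageData().save("C:\\Temp\\img_" + (counter++) + extension);
    }
}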
If you would also like to check whether the shape is inside a GroupShape and whether the parent GroupShape contains a caption, you can use code like this:
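// Requires: import com.aspose.words.*;
Document doc = new Document("C:\\Temp\\in.docx");
Iterable<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape s : shapes)
{
    if (s.hasImage())
    {
        // Find the nearest enclosing group, then climb to the topmost one.
        GroupShape parentShape = (GroupShape)s.getAncestor(NodeType.GROUP_SHAPE);
        while (parentShape != null && parentShape.getAncestor(NodeType.GROUP_SHAPE) != null)
            parentShape = (GroupShape)parentShape.getAncestor(NodeType.GROUP_SHAPE);

        // The image is not inside any group, so there is no caption to look for.
        if (parentShape == null)
            continue;

        // Print the text of every paragraph inside the topmost group (the caption candidates).
        Iterable<Paragraph> paragraphs = parentShape.getChildNodes(NodeType.PARAGRAPH, true);
        for (Paragraph p : paragraphs)
        {
            System.out.println(p.toString(SaveFormat.TEXT));
        }
    }
}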
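To make the lookup logic in the second snippet easier to follow, here is a minimal, library-free sketch of the same idea: climb from the image to the topmost enclosing group, then collect the paragraph text found inside that group. Please note that `DemoNode` and `CaptionLookupDemo` are hypothetical stand-ins invented for this illustration. They are not Aspose.Words classes, and the string kinds "GROUP", "SHAPE", and "PARAGRAPH" only mimic the node types used above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document node; NOT the Aspose.Words Node class.
final class DemoNode {
    final String kind;   // "GROUP", "SHAPE", or "PARAGRAPH" (illustrative labels only)
    final String text;   // paragraph text; empty for other kinds
    DemoNode parent;
    final List<DemoNode> children = new ArrayList<>();

    DemoNode(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    // Attaches a child and returns this node so calls can be chained.
    DemoNode add(DemoNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

final class CaptionLookupDemo {
    // Returns the nearest ancestor of the given kind, or null if there is none.
    static DemoNode ancestor(DemoNode node, String kind) {
        for (DemoNode p = node.parent; p != null; p = p.parent) {
            if (p.kind.equals(kind)) {
                return p;
            }
        }
        return null;
    }

    // Climbs from the shape to the outermost enclosing group; null if the shape is ungrouped.
    static DemoNode topmostGroup(DemoNode shape) {
        DemoNode group = ancestor(shape, "GROUP");
        while (group != null && ancestor(group, "GROUP") != null) {
            group = ancestor(group, "GROUP");
        }
        return group;
    }

    // Collects the text of every paragraph under the topmost group; empty list if ungrouped.
    static List<String> captions(DemoNode shape) {
        List<String> out = new ArrayList<>();
        DemoNode top = topmostGroup(shape);
        if (top != null) {
            collect(top, out);
        }
        return out;
    }

    private static void collect(DemoNode node, List<String> out) {
        if (node.kind.equals("PARAGRAPH")) {
            out.add(node.text);
        }
        for (DemoNode child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        DemoNode image = new DemoNode("SHAPE", "");
        DemoNode inner = new DemoNode("GROUP", "").add(image);
        new DemoNode("GROUP", "")
            .add(inner)
            .add(new DemoNode("PARAGRAPH", "Figure 1. Sample caption"));

        System.out.println(captions(image));                    // image nested in two groups
        System.out.println(captions(new DemoNode("SHAPE", ""))); // image outside any group
    }
}
```

The point of the sketch is the null handling: an image that is not inside any group yields an empty caption list rather than an exception, which is the case to watch for when applying the same approach to a real document.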