Chapter09.docx (468.6 KB)
Cause
To extract images from a Word document while skipping any that sit inside tables, and to retrieve each image's size and caption, you need to walk the document's node tree. In Aspose.Words, inline and floating images are both stored as Shape nodes. A Word caption is normally a separate paragraph next to the image, so it is not stored on the shape itself.
Solution
You can use Aspose.Words for Java to achieve this. The process is to collect the document's Shape nodes, keep only those that contain image data, save each image, and then read its size and any descriptive text stored on the shape.
Code
Here’s a sample code snippet to help you get started:
import com.aspose.words.*;

public class ImageExtractor {
    public static void main(String[] args) throws Exception {
        Document doc = new Document("Chapter09.docx");
        // Inline and floating images are both NodeType.SHAPE nodes;
        // Aspose.Words for Java has no NodeType.INLINE_SHAPE or InlineShape class.
        NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
        int imageIndex = 0;
        for (Shape shape : (Iterable<Shape>) shapes) {
            // Skip shapes without image data and shapes placed inside tables
            if (!shape.hasImage() || shape.getAncestor(NodeType.TABLE) != null) {
                continue;
            }
            // Name files by index (shape names can be empty or repeated) and
            // keep the stored image format instead of assuming PNG
            String extension = FileFormatUtil.imageTypeToExtension(shape.getImageData().getImageType());
            String imageFileName = "Image_" + imageIndex++ + extension;
            shape.getImageData().save(imageFileName);
            System.out.println("Saved " + (shape.isInline() ? "inline" : "floating") + " image: " + imageFileName);
            // Size of the shape as laid out in the document, in points
            System.out.println("Size: " + shape.getWidth() + " x " + shape.getHeight() + " pt");
            // The title is descriptive text set on the shape; a Word caption
            // is a separate paragraph and is not returned by getTitle()
            String title = shape.getTitle();
            if (title != null && !title.isEmpty()) {
                System.out.println("Title: " + title);
            }
        }
    }
}
Next Steps
- Ensure you have the Aspose.Words library added to your project.
- Run the code with your document to extract images and captions.
- Modify the code as needed to filter out any unwanted elements or to customize the output format.
This approach extracts the images together with any title text stored on the shapes. Captions added with Word's Insert Caption feature are separate paragraphs, so they have to be located by their position next to the image.
INLINE_SHAPE cannot be resolved or is not a field
@mohamed.sathakathullah You can use LayoutCollector and LayoutEnumerator to calculate coordinates of shapes.
Document doc = new Document("C:\\Temp\\in.docx");
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
for (Shape s : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true))
{
    // LayoutCollector and LayoutEnumerator work only with nodes in the main document body.
    // Only top-level shapes can be processed.
    if (!s.isTopLevel() || s.getAncestor(NodeType.HEADER_FOOTER) != null)
        continue;
    // Move the enumerator to the layout entity that corresponds to the shape
    enumerator.setCurrent(collector.getEntity(s));
    System.out.println("Page " + enumerator.getPageIndex() + "; Shape Rect : " + enumerator.getRectangle());
}
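The rectangle printed above should be in points, going by how the Aspose.Words documentation describes LayoutEnumerator.getRectangle(). If you need the size in pixels or millimeters, the conversion is plain arithmetic (1 point = 1/72 inch). The following is a minimal sketch of that conversion using only the JDK. PointUnits and its methods are hypothetical helper names, not part of Aspose.Words, and the 96 DPI value is only an assumed example resolution.

```java
import java.awt.geom.Rectangle2D;
import java.util.Locale;

// Hypothetical helper (not an Aspose.Words class): converts layout
// measurements given in points into pixels or millimeters.
final class PointUnits {
    // By definition, 1 point = 1/72 inch
    static final double POINTS_PER_INCH = 72.0;
    static final double MILLIMETERS_PER_INCH = 25.4;

    // Converts a length in points to pixels at the given resolution (dots per inch)
    static double pointsToPixels(double points, double dpi) {
        return points * dpi / POINTS_PER_INCH;
    }

    // Converts a length in points to millimeters
    static double pointsToMillimeters(double points) {
        return points / POINTS_PER_INCH * MILLIMETERS_PER_INCH;
    }

    // Formats a rectangle given in points as a pixel size and position
    static String describe(Rectangle2D rect, double dpi) {
        return String.format(Locale.ROOT, "%.0f x %.0f px at (%.0f, %.0f)",
                pointsToPixels(rect.getWidth(), dpi),
                pointsToPixels(rect.getHeight(), dpi),
                pointsToPixels(rect.getX(), dpi),
                pointsToPixels(rect.getY(), dpi));
    }

    public static void main(String[] args) {
        // Example values only: a 3 x 1.5 inch shape placed 1 inch from the
        // left edge and 2 inches from the top edge of the page
        Rectangle2D rect = new Rectangle2D.Float(72f, 144f, 216f, 108f);
        System.out.println(describe(rect, 96.0)); // prints "288 x 144 px at (96, 192)"
    }
}
```

In the loop above you could pass enumerator.getRectangle() to describe(...), assuming it returns a java.awt.geom.Rectangle2D as it does in the Java API I am familiar with. Aspose.Words also ships its own ConvertUtil class for unit conversions, which may be preferable if you already depend on the library.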