Extract drawing images that are not inside tables and get their size with and without caption

Chapter09.docx (468.6 KB)

@mohamed.sathakathullah

Cause

To extract images from a Word document without including tables and to retrieve their sizes along with captions, you need to navigate through the document’s structure effectively. The images are typically stored in the document’s shapes or inline shapes, and captions are associated with these images.

Solution

You can use Aspose.Words for Java to achieve this. The process involves iterating through the document’s nodes, checking for shapes or inline shapes, and then extracting the images and their associated captions.

Code

Here’s a sample code snippet to help you get started:

import com.aspose.words.*;

public class ImageExtractor {
    public static void main(String[] args) throws Exception {
        Document doc = new Document("Chapter09.docx");
        NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
        NodeCollection inlineShapes = doc.getChildNodes(NodeType.INLINE_SHAPE, true);

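        int imageIndex = 0;
        for (Shape shape : (Iterable<Shape>) shapes) {
            // hasImage() is the reliable check; a non-null getImageData() does not prove image bytes exist
            if (shape.hasImage()) {
                // Name files by index (shape names are often empty) and keep the image's real format
                String imageFileName = "Image_" + (imageIndex++)
                        + FileFormatUtil.imageTypeToExtension(shape.getImageData().getImageType());
                shape.getImageData().save(imageFileName);
                System.out.println("Saved image: " + imageFileName);

                // getTitle() returns the shape's alt-text title, not a caption paragraph
                if (shape.getTitle() != null && !shape.getTitle().isEmpty()) {
                    System.out.println("Alt-text title: " + shape.getTitle());
                }
            }
        }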

        for (InlineShape inlineShape : (Iterable<InlineShape>) inlineShapes) {
            if (inlineShape.getImageData() != null) {
                // Extract image
                String imageFileName = "InlineImage_" + inlineShape.getName() + ".png";
                inlineShape.getImageData().save(imageFileName);
                System.out.println("Saved inline image: " + imageFileName);
                
                // Get caption if exists
                if (inlineShape.getTitle() != null) {
                    System.out.println("Caption: " + inlineShape.getTitle());
                }
            }
        }
    }
}

Next Steps

  1. Ensure you have the Aspose.Words library added to your project.
  2. Run the code with your document to extract images and captions.
  3. Modify the code as needed to filter out any unwanted elements or to customize the output format.

This approach will effectively extract images and their captions from your Word document. If you have any further questions or need additional assistance, feel free to ask!

INLINE_SHAPE cannot be resolved or is not a field

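@mohamed.sathakathullah Aspose.Words has no NodeType.INLINE_SHAPE; inline and floating images are both Shape nodes, so NodeType.SHAPE covers them. To get the position and size of shapes on the page, you can use LayoutCollector and LayoutEnumerator: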

Document doc = new Document("C:\\Temp\\in.docx");
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
for (Shape s : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true))
{
    // LayoutCollector and LayoutEnumerator work only with nodes in the main document body.
    // Only top-level shapes can be processed.
    if (!s.isTopLevel() || s.getAncestor(NodeType.HEADER_FOOTER) != null)
        continue;

    // Move the enumerator to the shape's layout entity; the rectangle is in points, relative to the page
    enumerator.setCurrent(collector.getEntity(s));
    System.out.println("Page " + enumerator.getPageIndex() + "; Shape Rect : " + enumerator.getRectangle());
}
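
For the "size with and without caption" part, one possible approach is to take the shape's rectangle and a rectangle for its caption and union the two. The sketch below uses only java.awt.geom and assumes you have already obtained both rectangles in points (for example from enumerator.getRectangle()). How you locate the caption paragraph depends on your document and is not shown. CaptionBounds, withCaption, describe, and the sample coordinates are hypothetical names and values for illustration, not part of Aspose.Words.

```java
import java.awt.geom.Rectangle2D;
import java.util.Locale;

// Hypothetical helper, not part of Aspose.Words.
final class CaptionBounds {
    private CaptionBounds() {}

    /** Returns the smallest rectangle that covers both the image and its caption. */
    static Rectangle2D withCaption(Rectangle2D image, Rectangle2D caption) {
        return image.createUnion(caption);
    }

    /** Formats a rectangle's size, given in points, as "W x H pt". */
    static String describe(Rectangle2D r) {
        return String.format(Locale.ROOT, "%.1f x %.1f pt", r.getWidth(), r.getHeight());
    }

    public static void main(String[] args) {
        // Made-up coordinates in points; real values would come from the layout.
        Rectangle2D image = new Rectangle2D.Float(72f, 100f, 300f, 200f);
        Rectangle2D caption = new Rectangle2D.Float(72f, 305f, 300f, 14f);

        System.out.println("Without caption: " + describe(image));
        System.out.println("With caption:    " + describe(withCaption(image, caption)));
    }
}
```

The union also includes any gap between the image and the caption, which is usually what you want when measuring the space the figure occupies on the page.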