Extraction of images from document

Hi Team,

My requirement is to extract images based on their figure captions. Please help me extract the images from the Word document.

Source: sample.zip (1.9 MB)

Expected Output: Expected Output.zip (594.1 KB)

Thanks & regards,
priyanga G

@priyanga,

Sorry for the delay. We are working on this inquiry. We will update you soon about our findings.

@priyanga,

Please try the following code. Hope this helps.

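// Requires Aspose.Words for Java on the classpath.
import com.aspose.words.*;
import java.io.ByteArrayOutputStream;

Document doc = new Document("D:\\temp\\sample\\sample.doc");

int i = 0;
// Visit every shape in the document (the 'true' flag searches all descendants).
for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true)) {
    // Render the shape to an in-memory JPEG.
    ShapeRenderer renderer = shape.getShapeRenderer();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    renderer.save(baos, new ImageSaveOptions(SaveFormat.JPEG));

    // Insert the rendered image into a new, empty document.
    Document temp = new Document();
    DocumentBuilder builder = new DocumentBuilder(temp);
    builder.insertImage(baos.toByteArray());

    // Save one document per shape: fig-0.docx, fig-1.docx, ...
    temp.save("D:\\Temp\\sample\\fig-" + i + ".docx");
    i++;
}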
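
Note that the code above exports every shape and numbers the files by position; it does not look at the figure captions. If the output files should be named after the captions, the caption text has to be recognised first. Below is a minimal sketch of that step only, using plain Java and no Aspose.Words calls. It assumes captions begin with "Figure N" or "Fig. N" (for example "Fig. 2.1"), which may not match the actual documents. The class FigureCaptions and its methods are hypothetical helpers, not part of any library, and reading the paragraph text out of the document is left out here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Aspose.Words): recognises figure captions
// in paragraph text and derives an output file name from the figure number.
class FigureCaptions {
    // Assumption: a caption starts with "Figure 3", "Fig. 2.1", "FIGURE 4-2", etc.
    private static final Pattern CAPTION = Pattern.compile(
            "^\\s*fig(?:ure|\\.)?\\s*(\\d+(?:[.-]\\d+)*)",
            Pattern.CASE_INSENSITIVE);

    // Returns the figure number (such as "2.1") if the text starts like a caption.
    static Optional<String> figureNumber(String paragraphText) {
        if (paragraphText == null) {
            return Optional.empty();
        }
        Matcher m = CAPTION.matcher(paragraphText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Builds a file name such as "fig-2.1.docx" from a caption paragraph.
    static Optional<String> outputName(String paragraphText) {
        return figureNumber(paragraphText).map(n -> "fig-" + n + ".docx");
    }

    public static void main(String[] args) {
        System.out.println(outputName("Figure 1: System overview"));
        System.out.println(outputName("Fig. 2.1 Results"));
        // Body text that only mentions a figure is not treated as a caption.
        System.out.println(outputName("This paragraph mentions Figure 3 in passing."));
    }
}
```

In the loop above, the result of outputName could replace the counter-based file name once the caption paragraph that belongs to each shape has been located; how to locate it depends on how the captions are laid out in the source document.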