Issues with retrieving embedded images from a Word document

We encountered an issue while retrieving a .bmp image from a Word document (*.doc).


The image is inserted into the Word document as an object, not as a picture (see the attached sample.doc). We are using the Aspose.Words extractors to retrieve the image, but only the icon is retrieved instead of the image itself.

This happens only with the .bmp format; for all other image formats, both the icons and the images are extracted properly.

Could you please look into this issue as a priority and reply?

Thanks

Hi Afia,


Thanks for your inquiry.

The OLE object in your document is actually of type Paint.Picture. The problem occurs because this OLE object is not embedded; instead, it is linked to an external BMP file, so the bitmap data itself is not stored in the document. In this case, you can simply get the path of the source BMP file for the linked OLE object, load it using the ImageIO class, and re-save (extract) it to any other location. Please see the following code snippet:
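```java
import com.aspose.words.*;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Runs inside a method that declares "throws Exception"
Document doc = new Document("C:\\Temp\\in.docx");

int i = 0;
for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true)) {
    OleFormat ole = shape.getOleFormat();
    // Only Paint.Picture objects that are linked to an external file (not embedded)
    if (ole != null && "Paint.Picture".equals(ole.getProgId()) && ole.isLink()) {
        // Load the linked source BMP from disk and save a copy to another location
        BufferedImage image = ImageIO.read(new File(ole.getSourceFullName()));
        if (image != null) { // ImageIO.read returns null if no reader can decode the file
            ImageIO.write(image, "bmp", new File("C:\\temp\\img" + i + ".bmp"));
            i++;
        }
    }
}
```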
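The load-and-re-save step uses only the JDK, so it can be factored into a small helper that is easy to test without a Word document. The sketch below is illustrative only: the class `LinkedBmpExtractor` and the method `copyLinkedImage` are hypothetical names, not part of Aspose.Words. It assumes the path returned by `getSourceFullName()` is still reachable from the machine running the code, and it reports a broken link (missing file), a file ImageIO cannot decode, or an unsupported output format by returning `false` instead of failing later.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of the Aspose.Words API.
public class LinkedBmpExtractor {

    /**
     * Reads the image file a linked OLE object points to and writes a copy
     * to outFile in the given ImageIO format name (for example "bmp" or "png").
     *
     * @return true if the image was read and written; false if the source file
     *         is missing, cannot be decoded by ImageIO, or no writer exists for the format
     */
    public static boolean copyLinkedImage(String sourcePath, File outFile, String format) throws IOException {
        File source = new File(sourcePath);
        if (!source.isFile()) {
            // Broken link, e.g. the BMP was moved or deleted after the document was saved
            return false;
        }
        // ImageIO.read returns null (rather than throwing) when no registered reader can decode the file
        BufferedImage image = ImageIO.read(source);
        if (image == null) {
            return false;
        }
        // Create the target folder if it does not exist yet
        File parent = outFile.getAbsoluteFile().getParentFile();
        if (parent != null) {
            parent.mkdirs();
        }
        // ImageIO.write returns false when no writer is available for the requested format
        return ImageIO.write(image, format, outFile);
    }
}
```

Inside the loop shown earlier, a call such as `LinkedBmpExtractor.copyLinkedImage(ole.getSourceFullName(), new File("C:\\temp\\img" + i + ".bmp"), "bmp")` would take the place of the direct `ImageIO.read`/`ImageIO.write` pair; passing `"png"` instead re-saves the bitmap in a compressed format.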


I hope this helps.

Best regards,