Image Extraction Information

rivera_ericbah · July 19, 2011, 8:51am

I am currently looking into your product to extract images
from Word (.doc and .docx) and PDF documents. I am able to extract
images from .doc and .pdf documents, but I am unable to extract images
from a .docx document. I also need to obtain the number of the page
on which each image is located. Is this possible? Can I get a code
snippet to accomplish this for both Word and PDF documents?

Thanks

alexey.noskov · July 19, 2011, 10:50am

Hi

Thanks for your request. In a DOCX document, images are represented as DrawingML objects, so instead of iterating through Shape nodes, you have to iterate through DrawingML nodes. Otherwise the technique is exactly the same as described here; the only difference is the NodeType:

http://www.aspose.com/documentation/java-components/aspose.words-for-java/howto-extract-images-from-a-document.html
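
As a library-independent cross-check, the sketch below pulls images out of a .docx file using only the Java standard library. It is not the Aspose.Words node-iteration technique described above. It relies instead on the fact that a .docx file is a ZIP package, and on the assumption that Word stores embedded pictures under the word/media/ folder, which is the usual convention rather than a guarantee. The class name DocxImageExtractor and the method name extractImages are hypothetical, chosen only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Aspose.Words.
public class DocxImageExtractor {

    // Assumption: Word conventionally places embedded pictures in this folder.
    private static final String MEDIA_PREFIX = "word/media/";

    // Reads a .docx package from the stream and returns a map from image
    // file name (e.g. "image1.png") to the raw image bytes, in package order.
    public static Map<String, byte[]> extractImages(InputStream docx) throws IOException {
        Map<String, byte[]> images = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Skip directories and every part that is not a media file.
                if (entry.isDirectory() || !name.startsWith(MEDIA_PREFIX)) {
                    continue;
                }
                // Copy the bytes of the current entry into memory.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int read;
                while ((read = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
                images.put(name.substring(MEDIA_PREFIX.length()), buffer.toByteArray());
            }
        }
        return images;
    }
}
```

To use it, pass a FileInputStream opened on the .docx file and write each returned byte array to disk under its map key. The result says nothing about where an image sits in the document, so it does not help with page numbers, and it does not work for the binary .doc format, which is not a ZIP package.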

Unfortunately, there is no way to get the number of the page where the image is located. An MS Word document is a flow document, i.e. it does not contain any information about its layout into pages.

Best regards,