Extract display images from the document

Hi Team,

My requirement is to extract the display images from the document and save them into a new document. Please help me solve this issue.

source document: 34.zip (397.3 KB)

expected output: expectedoutput.zip (65.5 KB)

Thanks & regards,
priyanga G

@priyanga,

Thanks for your inquiry. Please use the following code example to extract each image from the document and insert it into a new document. The code saves one output document per image.

import com.aspose.words.*;

// MyDir is the folder that holds the input and output files.
Document doc = new Document(MyDir + "34.doc");
int i = 1;

// Visit every shape in the document, including shapes nested in other nodes.
for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true))
{
    // Skip shapes that do not contain an image.
    if (!shape.hasImage())
        continue;

    // Import the shape into a new, empty document and save it under a numbered name.
    Document dstDoc = new Document();
    NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    Node newNode = importer.importNode(shape, true);
    dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
    dstDoc.save(MyDir + "output" + i + ".docx");
    i++;
}

Hi @tahir.manzoor,

Thank you very much.

It’s working fine.

Another issue: I want to extract only the display images, but the previous code also extracts the Figure 1 image along with the display images. Please let me know how to ignore images that have a figure caption.

input: 34.zip (397.3 KB)

actual output: actual output.zip (344.0 KB)

expected output: expectedoutput.zip (293.7 KB)

Thanks & regards,
priyanga G

@priyanga,

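Thanks for your inquiry. You can ignore images that have a figure caption by using the following code example. It skips a shape when the text of its parent paragraph, or of the next sibling node, starts with "Fig". Hope this helps you.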

import com.aspose.words.*;

// MyDir is the folder that holds the input and output files.
Document doc = new Document(MyDir + "34.doc");
int i = 1;

for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true))
{
    // Skip shapes that do not contain an image.
    if (!shape.hasImage())
        continue;

    // Skip captioned figures: the caption text is either in the same
    // paragraph as the shape or in the node that follows that paragraph.
    Paragraph parent = shape.getParentParagraph();
    Node next = parent.getNextSibling();
    if (parent.toString(SaveFormat.TEXT).trim().startsWith("Fig")
            || (next != null && next.toString(SaveFormat.TEXT).trim().startsWith("Fig")))
        continue;

    // Import the remaining (display) image into a new document and save it.
    Document dstDoc = new Document();
    NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    Node newNode = importer.importNode(shape, true);
    dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
    dstDoc.save(MyDir + "output" + i + ".docx");
    i++;
}
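
Note that startsWith("Fig") also matches ordinary text that merely begins with those letters (for example "Figaro"). If that turns out to be a problem with your documents, a stricter check could look something like the sketch below. It is plain Java with no Aspose calls; the class name CaptionFilter, the method looksLikeFigureCaption, and the regular expression are all hypothetical and only assume that captions look like "Fig 1", "Fig. 1", or "Figure 1".

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
class CaptionFilter {
    // Assumed caption shape: "Fig", "Fig." or "Figure" (any letter case),
    // optional whitespace, then a digit.
    private static final Pattern CAPTION =
            Pattern.compile("^(?:fig\\.?|figure)\\s*\\d", Pattern.CASE_INSENSITIVE);

    // Returns true when the trimmed paragraph text begins like a figure caption.
    static boolean looksLikeFigureCaption(String paragraphText) {
        if (paragraphText == null)
            return false;
        return CAPTION.matcher(paragraphText.trim()).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeFigureCaption("Fig. 1 Sample image"));
        System.out.println(looksLikeFigureCaption("Figure 2: Overview"));
        System.out.println(looksLikeFigureCaption("Figaro sang loudly"));
    }
}
```

In the loop above, you would then pass the paragraph text to CaptionFilter.looksLikeFigureCaption(...) instead of calling startsWith("Fig"). Adjust the pattern if your captions use a different wording.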

Hi @tahir.manzoor,

Thank you very much.

I am able to get the exact output now.

Thanks & regards,
Priyanga G