Extract display images from the document

priyanga · March 15, 2018, 10:22am

Hi Team,

My requirement is to extract the display images from a document and save them into a new document. Please help me solve this issue.

source document: 34.zip (397.3 KB)

expected output: expectedoutput.zip (65.5 KB)

Thanks & regards,
priyanga G

tahir.manzoor · March 15, 2018, 3:41pm

@priyanga,

Thanks for your inquiry. Please use the following code example to extract each image from the document and insert it into a new document.

// Requires: import com.aspose.words.*;
// MyDir is the folder that holds the input and output files.
Document doc = new Document(MyDir + "34.doc");
int i = 1;

// Visit every shape in the document, at any depth in the node tree.
for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true))
{
    // Skip shapes that do not hold an image.
    if (!shape.hasImage())
        continue;

    // Import the shape into a fresh document so that each image is saved to its own file.
    Document dstDoc = new Document();
    NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    Node newNode = importer.importNode(shape, true);
    dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
    dstDoc.save(MyDir + "output" + i + ".docx");
    i++;
}

priyanga · March 16, 2018, 4:20am

Hi @tahir.manzoor,

Thank you very much.

It’s working fine.

Another issue: I want to extract only the display images, but the previous code also extracts the Figure 1 image along with the display images. Please let me know how to ignore the images that have a figure caption.

input: 34.zip (397.3 KB)

actual output: actual output.zip (344.0 KB)

expected output: expectedoutput.zip (293.7 KB)

Thanks & regards,
priyanga G

tahir.manzoor · March 16, 2018, 4:39am

@priyanga,

Thanks for your inquiry. You can skip the images that have a "Fig" caption using the following code example. Hope this helps.

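// Requires: import com.aspose.words.*;
// MyDir is the folder that holds the input and output files.
Document doc = new Document(MyDir + "34.doc");
int i = 1;

for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true))
{
    // Skip shapes that do not hold an image.
    if (!shape.hasImage())
        continue;

    // Treat the image as a figure when its own paragraph, or the node right after it,
    // has text that starts with "Fig".
    // getParentParagraph() can return null (for example, for a shape inside a group shape).
    Paragraph para = shape.getParentParagraph();
    if (para != null)
    {
        Node next = para.getNextSibling();
        if (para.toString(SaveFormat.TEXT).trim().startsWith("Fig")
                || (next != null && next.toString(SaveFormat.TEXT).trim().startsWith("Fig")))
            continue;
    }

    // Display image: import it into a fresh document and save it to its own file.
    Document dstDoc = new Document();
    NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    Node newNode = importer.importNode(shape, true);
    dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
    dstDoc.save(MyDir + "output" + i + ".docx");
    i++;
}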
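
The caption test above relies on startsWith("Fig"), which would also match ordinary text such as "Figurative language". If that turns out to be a problem in your documents, a stricter check could be kept in a small helper that works on plain strings, so it can be tried without loading a document. This is only a sketch: the class CaptionFilter, its method names, and the pattern (the word "Fig", "Fig." or "Figure" followed by a number) are assumptions for illustration and are not part of the Aspose.Words API.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words: decides from plain text
// whether a paragraph looks like a figure caption.
class CaptionFilter {
    // Assumed caption shape: "Fig", "Fig." or "Figure" (any case),
    // optional whitespace, then a digit.
    private static final Pattern FIGURE_CAPTION =
            Pattern.compile("^(fig\\.?|figure)\\s*\\d", Pattern.CASE_INSENSITIVE);

    // Returns true when the text starts like a figure caption.
    static boolean isFigureCaption(String paragraphText) {
        if (paragraphText == null)
            return false;
        return FIGURE_CAPTION.matcher(paragraphText.trim()).find();
    }

    // Mirrors the condition in the loop above: skip the image when its own
    // paragraph or the following node carries a figure caption.
    static boolean shouldSkip(String parentText, String nextSiblingText) {
        return isFigureCaption(parentText) || isFigureCaption(nextSiblingText);
    }

    public static void main(String[] args) {
        System.out.println(isFigureCaption("Fig. 1 Sample image"));  // true
        System.out.println(isFigureCaption("Figurative language"));  // false
        System.out.println(shouldSkip("", "Figure 2: Results"));     // true
    }
}
```

In the loop, the strings passed to shouldSkip would be the toString(SaveFormat.TEXT) results for the parent paragraph and its next sibling, with null passed when either node is missing.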

priyanga · April 19, 2018, 8:21am

Hi @tahir.manzoor,

Thank you very much.

I am able to get the exact output.

Thanks & regards,
Priyanga G