Extract shapes/images using paragraph node in Java

priyadharshini · June 22, 2017, 6:31am

Hi team,
I need a solution to extract images using paragraph nodes and save each image into a separate document, with a filter so that only images whose caption starts with “Fig” are extracted.

Thank you.

extract.zip (1.6 MB)
extractprob.zip (1.5 MB)

tahir.manzoor · June 22, 2017, 10:40am

Hi Priya,

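Thanks for your inquiry. Please use the following code example to export each shape into a new document. It looks for paragraphs whose text starts with "Fig" and that directly follow a paragraph containing shapes. I hope this helps.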

// Requires: import com.aspose.words.*;
// MyDir is the folder that holds the input document and receives the output files.
Document doc = new Document(MyDir + "extractproblem.docx");
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    // A caption paragraph starts with "Fig" (this also matches "Figure")
    // and directly follows a paragraph that contains at least one shape.
    Node previous = paragraph.getPreviousSibling();
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")
            && previous != null
            && previous.getNodeType() == NodeType.PARAGRAPH
            && ((Paragraph) previous).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
    {
        NodeCollection shapes = ((Paragraph) previous).getChildNodes(NodeType.SHAPE, true);
        for (Shape shape : (Iterable<Shape>) shapes)
        {
            // Create a fresh document per shape so each output file holds exactly one image.
            Document dstDoc = new Document();
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode(shape, true);
            dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
        }
    }
}
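
The selection rule used above (a caption paragraph that directly follows a paragraph holding an image) can also be tried out on its own, without the library. The following is only a minimal sketch in plain Java: `Para`, `CaptionFilter` and `imageParagraphsWithCaption` are hypothetical names made up for illustration and are not Aspose.Words types. It assumes each paragraph is reduced to its text and an image count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a document paragraph: its text and how many images it holds.
// This is NOT an Aspose.Words type; it only models the data the filter needs.
final class Para {
    final String text;
    final int imageCount;

    Para(String text, int imageCount) {
        this.text = text;
        this.imageCount = imageCount;
    }
}

final class CaptionFilter {
    // Returns the indexes of paragraphs whose images should be exported:
    // paragraphs that hold at least one image and are immediately followed
    // by a paragraph whose trimmed text starts with the given prefix.
    static List<Integer> imageParagraphsWithCaption(List<Para> paragraphs, String prefix) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < paragraphs.size(); i++) {
            Para caption = paragraphs.get(i);
            Para previous = paragraphs.get(i - 1);
            if (caption.text.trim().startsWith(prefix) && previous.imageCount > 0) {
                result.add(i - 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Para> paragraphs = Arrays.asList(
                new Para("", 1),                    // image paragraph
                new Para("Figure 1: Overview", 0),  // caption, so index 0 is selected
                new Para("Some body text", 0),
                new Para("", 2),                    // image paragraph
                new Para("Table 1: Results", 0),    // not a figure caption, so index 3 is skipped
                new Para("", 1),                    // image paragraph
                new Para("  Fig. 2 Details", 0));   // leading spaces are trimmed, so index 5 is selected
        System.out.println(imageParagraphsWithCaption(paragraphs, "Fig")); // prints [0, 5]
    }
}
```

If the captions in your document sit above the images rather than below them, the same check would need to look at the next paragraph instead of the previous one.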

priyadharshini · June 29, 2017, 8:16am

Thank You

Regards
Priya Dharshini J P

tahir.manzoor · June 29, 2017, 11:09am