Extracting images with labels (a), (b)

MikeLak · September 17, 2018, 9:24am

Dear Team

I want to extract images along with their labels. I have attached the
sample document Sample3.zip (1.3 MB).
The expected output is every image extracted separately in JPEG format.
Thanks in advance.

tahir.manzoor · September 17, 2018, 3:37pm

@MikeLak

Thanks for your inquiry. In your case, we suggest the following solution:

Get the paragraph nodes using the Document.getChildNodes method.
Iterate over the paragraphs and get their text using the Node.toString method.
If the text of a paragraph starts with “Fig”, get the previous node using the Node.getPreviousSibling method.
If the previous node is a paragraph that contains a Shape node, extract that paragraph and import it into a new document.
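
The matching logic in these steps can be sketched without Aspose.Words by modelling each paragraph as its text plus a flag saying whether it contains a Shape. This is a minimal sketch under stated assumptions: the `Para` record, the `CaptionMatcher` class and `findFigureParagraphs` are hypothetical names, and the caption test combines the “starts with Fig” rule above with the "(a)"/"(b)" checks used later in the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class CaptionMatcher {
    // Hypothetical stand-in for an Aspose.Words Paragraph:
    // its plain text and whether it contains at least one Shape (image).
    record Para(String text, boolean hasShape) {}

    // Returns the index of the paragraph holding the image for each caption.
    static List<Integer> findFigureParagraphs(List<Para> paras) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < paras.size(); i++) {
            String text = paras.get(i).text();
            boolean isCaption = text.startsWith("Fig") || text.contains("(a)") || text.contains("(b)");
            if (!isCaption) {
                continue;
            }
            if (paras.get(i).hasShape()) {
                result.add(i);       // image and caption share one paragraph
            } else if (i > 0 && paras.get(i - 1).hasShape()) {
                result.add(i - 1);   // image sits in the previous sibling paragraph
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Para> doc = List.of(
                new Para("Introduction text", false),
                new Para("", true),
                new Para("Fig. 1 Overview", false),
                new Para("(a) left panel", true),
                new Para("Conclusion", false));
        System.out.println(findFigureParagraphs(doc)); // prints [1, 3]
    }
}
```

In real code each returned index would correspond to a paragraph that gets imported into its own document (or whose shapes get saved as images).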

Please check the code example shared in your other thread here:

MikeLak · September 18, 2018, 3:59am

tahir.manzoor:

```java
import com.aspose.words.*;

// MyDir is the folder that holds the input and output files.
Document doc = new Document(MyDir + "sample2.docx");
int i = 1;
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    String text = paragraph.toString(SaveFormat.TEXT);
    if (text.contains("Fig") || text.contains("(a)") || text.contains("(b)")) {
        System.out.println(paragraph.getText());

        // Case 1: the caption paragraph itself contains the image.
        if (paragraph.getChildNodes(NodeType.SHAPE, true).getCount() > 0) {
            Document dstDoc = new Document();
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode(paragraph, true);
            dstDoc.getFirstSection().getBody().appendChild(newNode);
            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
        }
        // Case 2: the image is in the paragraph just before the caption.
        else if (paragraph.getPreviousSibling() != null
                && paragraph.getPreviousSibling().getNodeType() == NodeType.PARAGRAPH
                && ((Paragraph) paragraph.getPreviousSibling()).getChildNodes(NodeType.SHAPE, true).getCount() > 0) {
            Document dstDoc = new Document();
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode((Paragraph) paragraph.getPreviousSibling(), true);
            dstDoc.getFirstSection().getBody().appendChild(newNode);
            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
        }
    }
}
```
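
The snippet above saves each figure as a separate .docx file, while the original request was for separate JPEG files. One way to bridge that gap is to take each image's raw bytes and re-encode them as JPEG with the JDK's `javax.imageio`. This is a sketch under assumptions: with Aspose.Words the bytes would presumably come from the shape's image data (for example `Shape.getImageData().getImageBytes()`; please verify against the API reference), and `JpegConverter`/`toJpeg` are hypothetical names. ImageIO cannot decode vector formats such as EMF/WMF, so those are rejected.

```java
import javax.imageio.ImageIO;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class JpegConverter {
    // Re-encodes any raster image ImageIO can read (PNG, BMP, GIF, JPEG) as JPEG bytes.
    static byte[] toJpeg(byte[] imageBytes) throws IOException {
        BufferedImage src = ImageIO.read(new ByteArrayInputStream(imageBytes));
        if (src == null) {
            throw new IOException("Unsupported image format (e.g. EMF/WMF)");
        }
        // JPEG has no alpha channel, so flatten onto a white RGB canvas first.
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
        g.drawImage(src, 0, 0, null);
        g.dispose();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(rgb, "jpg", out)) {
            throw new IOException("No JPEG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Build a small transparent PNG in memory to stand in for an extracted image.
        BufferedImage img = new BufferedImage(4, 3, BufferedImage.TYPE_INT_ARGB);
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        ImageIO.write(img, "png", png);

        byte[] jpeg = toJpeg(png.toByteArray());
        System.out.println(jpeg.length > 0); // prints true
    }
}
```

Writing the result with `java.nio.file.Files.write(path, jpeg)` to a name such as "image" + i + ".jpg" would give one JPEG per figure.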

Hi @tahir.manzoor
Thanks for the reply.