MikeLak
September 17, 2018, 9:24am
Dear Team
I want to extract images together with their labels. I have attached the
sample document: Sample3.zip (1.3 MB)
The expected output is every image extracted separately in JPEG format.
Thanks in advance.
@MikeLak
Thanks for your inquiry. In your case, we suggest the following solution.
Get the paragraph nodes using the Document.GetChildNodes method.
Iterate over the paragraphs and get their text using the Node.toString method.
If the text of a paragraph starts with “Fig”, get the previous node using the Node.PreviousSibling property.
If the previous node is a Shape node, extract that node and import it into a new document.
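The matching logic in these steps can be sketched without the library. This is only a simplified model, not Aspose.Words code: `Node`, `LabeledImage` and `findLabeledImages` are hypothetical names made up for illustration, and the document body is assumed to be a flat list in which each entry is either a picture or a text paragraph.

```java
import java.util.ArrayList;
import java.util.List;

public class CaptionMatcher {
    // Hypothetical stand-in for a body-level node: a picture or a text paragraph.
    public static final class Node {
        public final boolean isShape;
        public final String text;

        public Node(boolean isShape, String text) {
            this.isShape = isShape;
            this.text = text;
        }
    }

    // Index of a picture in the node list, paired with the caption that follows it.
    public static final class LabeledImage {
        public final int shapeIndex;
        public final String caption;

        public LabeledImage(int shapeIndex, String caption) {
            this.shapeIndex = shapeIndex;
            this.caption = caption;
        }
    }

    // For every paragraph whose text starts with "Fig", look at the previous
    // sibling and keep the pair only when that sibling is a picture.
    public static List<LabeledImage> findLabeledImages(List<Node> nodes) {
        List<LabeledImage> result = new ArrayList<>();
        for (int i = 1; i < nodes.size(); i++) {
            Node current = nodes.get(i);
            Node previous = nodes.get(i - 1);
            boolean isCaption = !current.isShape && current.text.trim().startsWith("Fig");
            if (isCaption && previous.isShape) {
                result.add(new LabeledImage(i - 1, current.text.trim()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node(false, "Introduction"),
                new Node(true, ""),
                new Node(false, "Fig. 1 Overview"),
                new Node(false, "Fig. 2 Caption without a picture"),
                new Node(true, ""),
                new Node(false, "Plain text"));
        for (LabeledImage image : findLabeledImages(nodes)) {
            System.out.println(image.shapeIndex + " -> " + image.caption);
        }
    }
}
```

In a real document, pictures are usually inline children of a paragraph rather than siblings of it, so the check on the previous node may need to look inside the previous paragraph for a Shape instead of testing the node itself.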
Please check the code example shared in your other thread here:
@MikeLak
Thanks for your inquiry. Please use the following code example to get the desired output.
Document doc = new Document(MyDir + "sample2.docx");
int i = 1;
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if (paragraph.toString(SaveFormat.TEXT).contains("Fig")
        || paragraph.toString(SaveFormat.TEXT).contains("(a)")
        || paragraph.toString(SaveFormat.TEXT).contains("(b)"))
    { System.out.println(paragraph.get…
MikeLak
September 18, 2018, 3:59am
tahir.manzoor:
Document doc = new Document(MyDir + "sample2.docx");
int i = 1;
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if (paragraph.toString(SaveFormat.TEXT).contains("Fig")
        || paragraph.toString(SaveFormat.TEXT).contains("(a)")
        || paragraph.toString(SaveFormat.TEXT).contains("(b)"))
    {
        System.out.println(paragraph.getText());
        if (paragraph.getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            Document dstDoc = new Document();
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode(paragraph, true);
            dstDoc.getFirstSection().getBody().appendChild(newNode);
            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
        }
        else if (paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0
            && paragraph.getPreviousSibling() != null
            && paragraph.getPreviousSibling().getNodeType() == NodeType.PARAGRAPH
            && ((Paragraph) paragraph.getPreviousSibling()).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            Document dstDoc = new Document();
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode((Paragraph) paragraph.getPreviousSibling(), true);
            dstDoc.getFirstSection().getBody().appendChild(newNode);
            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
        }
    }
}
Hi @tahir.manzoor
Thanks for the reply.