Extract an image using the figure caption beside it

Dear Team,

We need to extract images from the source document, using the figure caption beside each image to identify them.

I’ve attached a sample document for your reference.

Sample : Capture.zip (95.7 KB)

Input : Sample.zip (1.3 MB)

Thank you.

@ssvel

Thanks for your inquiry. In your case, we suggest the following solution.

  1. Iterate over all paragraphs. You can get the paragraph nodes using Document.GetChildNodes.
  2. If the text of a paragraph starts with “Fig” and its parent node is a Shape, get the parent node of that Shape, which is a Paragraph node (call it paragraph1). You can get the paragraph’s text using the Node.ToString method and get an ancestor node using the Node.GetAncestor method.
  3. Get the child nodes of type Shape from paragraph1 whose text does not start with “Fig”.
  4. Use the NodeImporter.ImportNode method to import those child nodes (shapes) into a new document.
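
To make the traversal in steps 1 to 3 concrete, here is a minimal sketch on a toy tree. ToyNode and CaptionedImageFinder are hypothetical stand-ins written for illustration only; they are not Aspose.Words classes, and step 4 (importing into a new document) is not modelled. The sketch also assumes the image shapes are direct children of paragraph1.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a document node; NOT an Aspose.Words class.
final class ToyNode {
    final String kind;   // "BODY", "PARAGRAPH" or "SHAPE"
    final String text;   // the node's own text; empty for pure containers
    ToyNode parent;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    // Attaches a child and returns this node so calls can be chained.
    ToyNode add(ToyNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // Text of this node followed by the text of all its descendants.
    String fullText() {
        StringBuilder sb = new StringBuilder(text);
        for (ToyNode child : children) {
            sb.append(child.fullText());
        }
        return sb.toString();
    }
}

// Hypothetical helper that mirrors steps 1 to 3 on the toy tree.
final class CaptionedImageFinder {
    static List<ToyNode> findImages(ToyNode root) {
        List<ToyNode> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(ToyNode node, List<ToyNode> result) {
        // Step 2: a caption is a paragraph starting with "Fig" whose parent is a shape.
        boolean isCaption = node.kind.equals("PARAGRAPH")
                && node.text.trim().startsWith("Fig")
                && node.parent != null
                && node.parent.kind.equals("SHAPE")
                && node.parent.parent != null;
        if (isCaption) {
            ToyNode paragraph1 = node.parent.parent;
            // Step 3: keep the sibling shapes whose text does not start with "Fig".
            for (ToyNode sibling : paragraph1.children) {
                if (sibling.kind.equals("SHAPE") && !sibling.fullText().trim().startsWith("Fig")) {
                    result.add(sibling);
                }
            }
        }
        // Step 1: visit every node, including paragraphs nested inside shapes.
        for (ToyNode child : node.children) {
            collect(child, result);
        }
    }
}
```

For a paragraph1 that holds an image shape next to a caption shape containing the paragraph “Fig. 1 Sample”, findImages returns only the image shape. A “Fig” paragraph that sits directly in the body is ignored, because its parent is not a shape.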

@tahir.manzoor

Thanks for your valuable comments. Please share some sample code for the above scenario.

Thank you.

@ssvel

Thanks for your inquiry. Please allow us some time to prepare the code example. We will get back to you soon.

@ssvel

Please use the following code example to get the images that have a “Fig” caption inside another shape.

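// Requires: import com.aspose.words.*;
// MyDir is assumed to be a String holding the folder path of the input and output files.
Document doc = new Document(MyDir + "sample.docx");

int i = 1;
// Collect every paragraph in the document, including those nested inside shapes.
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    // A caption paragraph starts with "Fig" and sits directly inside a Shape.
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig") && paragraph.getParentNode() != null && paragraph.getParentNode().getNodeType() == NodeType.SHAPE) {
        Document dstDoc = new Document();
        NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        // Import the paragraph that holds the caption shape, together with its other shapes.
        Node importNode = importer.importNode(paragraph.getParentNode().getParentNode(), true);
        dstDoc.getFirstSection().getBody().appendChild(importNode);

        // Save only if there is at least one shape besides the caption shape.
        if (dstDoc.getChildNodes(NodeType.SHAPE, true).getCount() > 1) {
            // Iterate over a snapshot (toArray) so that removing shapes does not disturb the loop.
            for (Node node : dstDoc.getChildNodes(NodeType.SHAPE, true).toArray()) {
                Shape shp = (Shape) node;
                // Drop the caption shapes so that only the images remain.
                if (shp.toString(SaveFormat.TEXT).trim().contains("Fig"))
                    shp.remove();
            }
            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
        }
    }
}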
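
Note that a plain startsWith("Fig") check also matches ordinary text such as “Figaro” or “Figure skating”. If that is a concern, a stricter caption test could be sketched as below. FigureCaptions is a hypothetical helper, not part of Aspose.Words, and it assumes that captions are numbered, for example “Fig. 1”, “Fig 2” or “Figure 3”.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class FigureCaptions {
    // "Figure" or "Fig" (with an optional dot), optional whitespace, then a digit.
    // Assumption: every caption carries a number right after the label.
    private static final Pattern CAPTION = Pattern.compile("^(?:Figure|Fig\\.?)\\s*\\d");

    // Returns true when the trimmed text begins with a numbered figure label.
    static boolean isFigureCaption(String text) {
        if (text == null) {
            return false;
        }
        return CAPTION.matcher(text.trim()).find();
    }
}
```

In the loop above, the condition paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig") could then be replaced by FigureCaptions.isFigureCaption(paragraph.toString(SaveFormat.TEXT)). Like the original check, the match is case-sensitive.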

@tahir.manzoor

Thanks for sharing the code. It’s working fine.

Thank you.

@ssvel

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.