
Free Support Forum - aspose.com

Need to extract images from a DOCX file using Java

Dear team,

How can I extract images without their figure captions using Aspose.Words for Java?

@e503824 Images in the document are represented by Shape nodes, so you can loop through the Shape nodes and check whether each Shape has an image:

Iterable<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape s : shapes)
{
    if (s.hasImage())
    {
        // Do something with shape.
    }
}

Dear team,

Our documents contain many types of figures, and we use the conditions below to detect figure captions:

if ((paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")
        || paragraph.toString(SaveFormat.TEXT).trim().startsWith("Scheme")
        || paragraph.toString(SaveFormat.TEXT).trim().startsWith("Plate")
        || paragraph.toString(SaveFormat.TEXT).trim().startsWith("Abb")
        || paragraph.toString(SaveFormat.TEXT).trim().startsWith("Abbildung"))
        // for duplicate figure caption it-15
        && (paragraph.getNextSibling() != null
                && !paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
                || (paragraph.getNextSibling() != null
                        && paragraph.getNextSibling().getNodeType() != NodeType.TABLE
                        && paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
                        && (((Paragraph)paragraph.getNextSibling()).getChildNodes(NodeType.SHAPE, true)
                                .getCount() > 0
                                || (paragraph.getNextSibling().getNextSibling()) != null
                                        && paragraph.getNextSibling().getNextSibling()
                                                .getNodeType() != NodeType.TABLE
                                        && ((((Paragraph)paragraph.getNextSibling().getNextSibling())
                                                .getChildNodes(NodeType.SHAPE, true).getCount() == 0)

                                                //this condition added by pavi-14-12-2021 for duplicate captions
                                                || (((Paragraph)paragraph.getNextSibling().getNextSibling())
                                                        .getChildNodes(NodeType.SHAPE, true).getCount() > 0))))
                || paragraph.getParentSection().getBody().getLastParagraph().getText().trim()
                        .matches(matches))
        // for duplicate figure caption
        && ((paragraph.getPreviousSibling() != null
                && paragraph.getPreviousSibling().getNodeType() != NodeType.TABLE)
                || paragraph.getParentSection().getBody().getFirstParagraph().getText().trim()
                        .matches(matches))
        && paragraph.getNodeType() != NodeType.TABLE
        && paragraph.getParentNode().getNodeType() != NodeType.CELL
        && !paragraph.toString(SaveFormat.TEXT).contains(AIE.docName)

        //condition added by pavi -14-12-2021
        // skip the "Figure Captions" and "Figures" headings: the text must start with neither
        && !(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figure Captions")
                || paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figures"))

        || ((paragraph.getNextSibling() == null) && (builder.getCurrentParagraph().isEndOfDocument())))
{
}
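The prefix part of the check above can be pulled into a small helper so the paragraph text is trimmed once and the two heading exclusions are combined correctly (a paragraph must start with neither "Figure Captions" nor "Figures"; joining the two negations with || is always true). This is a minimal sketch in plain Java, assuming the text has already been extracted, for example with paragraph.toString(SaveFormat.TEXT). CaptionMatcher and isCaptionText are hypothetical names, and List.of needs Java 9 or later.

```java
import java.util.List;

/**
 * Hypothetical helper: decides from a paragraph's text alone whether it
 * looks like a figure caption. The sibling/table checks are not covered.
 */
final class CaptionMatcher {

    // Caption prefixes from the original condition ("Abb" already covers "Abbildung").
    private static final List<String> PREFIXES = List.of("Fig", "Scheme", "Plate", "Abb");

    // Headings that start with "Fig" but are not captions themselves.
    private static final List<String> HEADINGS = List.of("Figure Captions", "Figures");

    private CaptionMatcher() {
    }

    static boolean isCaptionText(String rawText) {
        if (rawText == null) {
            return false;
        }
        // Trim once so every prefix check sees the same string.
        String text = rawText.trim();

        // Reject the headings first: the text must match none of them.
        for (String heading : HEADINGS) {
            if (text.startsWith(heading)) {
                return false;
            }
        }

        // Accept the text if it starts with any known caption prefix.
        for (String prefix : PREFIXES) {
            if (text.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

In the original condition, a call such as CaptionMatcher.isCaptionText(paragraph.toString(SaveFormat.TEXT)) could replace the five startsWith checks and the heading check, leaving the sibling and table checks unchanged.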

@e503824 A caption is a simple paragraph, so once you have detected the caption, you can assume that the image is either in the previous paragraph (if the caption is below the image), in the next paragraph (if the caption is above the image), or in the previous or next table row (if the image and caption are in a table). It all depends on the structure of your documents.
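The lookup described above can be sketched in plain Java on a simplified, hypothetical model in which each body paragraph is reduced to its text and a flag saying whether it contains an image Shape (with Aspose.Words that flag could come from paragraph.getChildNodes(NodeType.SHAPE, true).getCount() > 0). The sketch tries the previous paragraph first, then the next one; the table-row case is left out. Para, CaptionImageLocator and findImageFor are illustrative names, and records need Java 16 or later.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch: find the paragraph holding the image that belongs
 * to the caption at a given index in a flat list of body paragraphs.
 */
final class CaptionImageLocator {

    /** Simplified stand-in for a body paragraph (not an Aspose type). */
    record Para(String text, boolean hasImage) {
    }

    private CaptionImageLocator() {
    }

    static Optional<Integer> findImageFor(List<Para> paras, int captionIndex) {
        // Caption below the image: the image is in the previous paragraph.
        int prev = captionIndex - 1;
        if (prev >= 0 && paras.get(prev).hasImage()) {
            return Optional.of(prev);
        }
        // Caption above the image: the image is in the next paragraph.
        int next = captionIndex + 1;
        if (next < paras.size() && paras.get(next).hasImage()) {
            return Optional.of(next);
        }
        // Not found next to the caption (e.g. the image sits in a table row).
        return Optional.empty();
    }
}
```

On a real document the same idea would walk paragraph.getPreviousSibling() and paragraph.getNextSibling(), checking getNodeType() == NodeType.PARAGRAPH before casting, because a sibling in the body can also be a Table.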