How to get the names of the images in a document in Java

e503824 · March 30, 2022, 8:06am

Dear team,

We are counting the total number of images in a document using Java, and we use the conditions below to filter the image names. Please find the source code below:

private static void findAllfigures(Document initDoc, String nameAppend) throws NullPointerException {
    String matches = "Fig.*(?:[ \\r\\n\\t].*)+|Scheme.*|Plate.*|Abbildung.*";
    try
    {
        for (Paragraph para : (Iterable<Paragraph>)initDoc.getChildNodes(NodeType.PARAGRAPH, true))
        {
            if (para.getText().trim().startsWith(FIG) || para.getText().trim().startsWith(SCHEME)
                    || para.getText().trim().startsWith(PLATE))
            {
                if (!(para.toString(SaveFormat.TEXT).trim().startsWith("Figure Captions")))
                {
                    try
                    {
                        if ((((Paragraph)para.getNextSibling()).getChildNodes(NodeType.SHAPE, true)
                                .getCount() > 0) || (((Paragraph)para.getPreviousSibling()).getChildNodes(NodeType.SHAPE, true)
                                .getCount() > 0))
                        {
                            String allFignames = null;
                            {
                                allFignames = formatImgcaption(para.toString(SaveFormat.TEXT).trim(), nameAppend);
                            }
                            allimages.add(allFignames);
                        }
                    }
                    catch (NullPointerException e)
                    {
                        logger.info("Exception ", e.getMessage());
                        e.printStackTrace();
                    }

                }
            }
        }
        initDoc.save(interim);
    }
    catch (Exception e)
    {
        logger.info("Exception ", e.getMessage());
        e.printStackTrace();
    }
}

Current output: a NullPointerException is thrown.

Expected output:

The allimages2 values are: 
allimages :Revised manuscript with no changes marked_Fig0020
allimages :Revised manuscript with no changes marked_Fig0006
allimages :Revised manuscript with no changes marked_Fig0017
allimages :Revised manuscript with no changes marked_Fig0005
allimages :Revised manuscript with no changes marked_Fig0016
allimages :Revised manuscript with no changes marked_Fig0004
allimages :Revised manuscript with no changes marked_Fig0015
allimages :Revised manuscript with no changes marked_Fig0003
allimages :Revised manuscript with no changes marked_Fig0014
allimages :Revised manuscript with no changes marked_Fig0002
allimages :Revised manuscript with no changes marked_Fig0013
allimages :Revised manuscript with no changes marked_Fig0001
allimages :Revised manuscript with no changes marked_Fig0012
allimages :Revised manuscript with no changes marked_Fig0011
allimages :Revised manuscript with no changes marked_Fig0010
allimages :Revised manuscript with no changes marked_Fig0009
allimages :Revised manuscript with no changes marked_Fig0008
allimages :Revised manuscript with no changes marked_Fig0019
allimages :Revised manuscript with no changes marked_Fig0007
allimages :Revised manuscript with no changes marked_Fig0018

Sample input document: Revised manuscript with no changes marked.docx (7.9 MB)

alexey.noskov · March 30, 2022, 9:05am

@e503824 The problem is in this condition:

if ((((Paragraph)para.getNextSibling()).getChildNodes(NodeType.SHAPE, true)
    .getCount() > 0) || (((Paragraph)para.getPreviousSibling()).getChildNodes(NodeType.SHAPE, true)
    .getCount() > 0))

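Images and their captions in your document are in table cells. Since the paragraph with the caption is the only paragraph in its cell, it has neither a NextSibling nor a PreviousSibling. Both getNextSibling() and getPreviousSibling() return null in that case, so calling getChildNodes() on the result throws the NullPointerException. If you need to check whether there is a shape, you can use a condition like this: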

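// Look for shapes anywhere in the table that contains the caption paragraph.
Table parentTable = (Table)para.getAncestor(NodeType.TABLE);
if (parentTable != null && parentTable.getChildNodes(NodeType.SHAPE, true).getCount() > 0)
    allimages.add(formatImgcaption(para.toString(SaveFormat.TEXT).trim(), nameAppend));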
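
If the same method also has to handle documents where the caption sits directly next to the image paragraph, the sibling checks and the table check can be combined, as long as each sibling is tested for null first. Below is a minimal sketch of that logic. To keep it runnable without the library, it uses a small hypothetical `Node` class as a stand-in for the Aspose.Words node tree; `sibling`, `ancestor`, `count` and `hasShapeNearby` are illustrative names and not part of the Aspose.Words API. In real code they would correspond to `getNextSibling()`/`getPreviousSibling()`, `getAncestor(NodeType.TABLE)` and `getChildNodes(NodeType.SHAPE, true).getCount()`.

```java
import java.util.ArrayList;
import java.util.List;

public class CaptionShapeCheck {

    /** Minimal stand-in for a document node; hypothetical, NOT the Aspose.Words Node class. */
    static final class Node {
        final String type;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String type) { this.type = type; }

        /** Appends a child and returns this node so calls can be chained. */
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        /** Returns the sibling at the given offset (+1 next, -1 previous), or null if there is none. */
        Node sibling(int offset) {
            if (parent == null) return null;
            int i = parent.children.indexOf(this) + offset;
            return (i >= 0 && i < parent.children.size()) ? parent.children.get(i) : null;
        }

        /** Returns the nearest ancestor of the given type, or null if there is none. */
        Node ancestor(String wanted) {
            for (Node n = parent; n != null; n = n.parent) {
                if (n.type.equals(wanted)) return n;
            }
            return null;
        }

        /** Counts descendants of the given type at any depth. */
        int count(String wanted) {
            int total = 0;
            for (Node c : children) {
                total += (c.type.equals(wanted) ? 1 : 0) + c.count(wanted);
            }
            return total;
        }
    }

    /** True if a neighbouring node or the enclosing table contains a shape. */
    static boolean hasShapeNearby(Node para) {
        Node next = para.sibling(1);
        Node prev = para.sibling(-1);
        // Either sibling may be null, so test before dereferencing.
        if (next != null && next.count("SHAPE") > 0) return true;
        if (prev != null && prev.count("SHAPE") > 0) return true;
        // Fall back to the enclosing table, which is null outside of tables.
        Node table = para.ancestor("TABLE");
        return table != null && table.count("SHAPE") > 0;
    }

    public static void main(String[] args) {
        // Image and caption in separate cells of one row: the caption has no siblings.
        Node shapePara = new Node("PARAGRAPH").add(new Node("SHAPE"));
        Node caption = new Node("PARAGRAPH");
        new Node("TABLE").add(new Node("ROW")
                .add(new Node("CELL").add(shapePara))
                .add(new Node("CELL").add(caption)));
        System.out.println(hasShapeNearby(caption)); // prints true
    }
}
```

Note that with the real API a sibling of a paragraph is not necessarily a Paragraph (it can be a Table, for example), so the direct cast to Paragraph in the original condition may also be worth guarding with an instanceof check.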