Image extraction issue

e503824 · April 6, 2022, 4:21am

Dear team,

We are facing an image extraction issue with the input document below.

Input document: Figures.docx (901.5 KB)

We are using the following condition to detect figure captions:

// Serialize and trim the paragraph text once instead of repeating the call in every check.
String text = paragraph.toString(SaveFormat.TEXT).trim();
Node next = paragraph.getNextSibling();

// The paragraph.getNodeType() != NodeType.TABLE test was dropped: a Paragraph node is never a Table.
if ((text.startsWith("Fig")
        || text.startsWith("Scheme")
        || text.startsWith("Plate")
        || text.startsWith("Abb")) // also covers "Abbildung"
    // changes by pavi - starts; check sample D:\testing\AIE\Iteration 16_4 points\Document contains Duplicate figure captions\Revised-MANUSCRIPT
    && ((next != null && next.getNodeType() != NodeType.TABLE)
        || paragraph.getParentSection().getBody().getFirstParagraph().getText().trim()
                .matches(matches))
    // changes by pavi - end
    && paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0
    && !text.contains(AIE.docName)
    // duplicate caption by pavi; the null guard avoids a NullPointerException on the last paragraph
    && (next == null || !next.toString(SaveFormat.TEXT).trim().matches(matches))
    // Both heading checks are AND-ed in; the earlier "!a || !b" form made the whole condition true
    // for any paragraph that did not start with "Figures".
    && !text.startsWith("Figure Captions")
    && !text.startsWith("Figures"))
{
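The prefix part of this condition can be checked on its own, without Aspose.Words, which may make it easier to see which paragraphs are accepted as captions. The following is only a minimal sketch: the class name `CaptionFilter` and the method `looksLikeCaption` are hypothetical names introduced for illustration, and the prefix lists are copied from the condition above on the assumption that "Figure Captions" and "Figures" are headings to be skipped.

```java
import java.util.List;

// Hypothetical helper (not part of Aspose.Words): isolates the text-prefix checks.
final class CaptionFilter {
    // Caption prefixes from the condition above; "Abb" also matches "Abbildung".
    private static final List<String> CAPTION_PREFIXES =
            List.of("Fig", "Scheme", "Plate", "Abb");

    // Assumed to be section headings: they start with "Fig" but are not captions.
    private static final List<String> HEADING_PREFIXES =
            List.of("Figure Captions", "Figures");

    private CaptionFilter() {
    }

    /** Returns true when the trimmed text starts with a caption prefix and is not a heading. */
    static boolean looksLikeCaption(String rawText) {
        if (rawText == null) {
            return false;
        }
        String text = rawText.trim();
        boolean hasCaptionPrefix = CAPTION_PREFIXES.stream().anyMatch(text::startsWith);
        boolean isHeading = HEADING_PREFIXES.stream().anyMatch(text::startsWith);
        // Both parts are combined with AND, so a heading can never slip through.
        return hasCaptionPrefix && !isHeading;
    }
}
```

It could then be called as `CaptionFilter.looksLikeCaption(paragraph.toString(SaveFormat.TEXT))`, leaving only the sibling and shape checks in the `if` statement.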

Please help us resolve this.

alexey.noskov · April 6, 2022, 5:26am

A post was merged into an existing topic: Extraction issue using Java
