Dear team,
We are facing an image extraction issue with the input document below.
Input document: Figures.docx (901.5 KB)
We are using the following condition to detect figure captions:
if ((paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")
        || paragraph.toString(SaveFormat.TEXT).startsWith("Scheme")
        || paragraph.toString(SaveFormat.TEXT).startsWith("Plate")
        || paragraph.toString(SaveFormat.TEXT).startsWith("Abb")
        || paragraph.toString(SaveFormat.TEXT).startsWith("Abbildung")
        && paragraph.getNodeType() != NodeType.TABLE)
        // changes by pavi - start; check sample D:\testing\AIE\Iteration 16_4 points\Document contains Duplicate figure captions\Revised-MANUSCRIPT
        && ((paragraph.getNextSibling() != null
                && paragraph.getNextSibling().getNodeType() != NodeType.TABLE)
            || paragraph.getParentSection().getBody().getFirstParagraph().getText().trim()
                .matches(matches))
        // && paragraph.getNextSibling().getNodeType() != NodeType.TABLE
        // changes by pavi - end
        && paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0
        && !paragraph.toString(SaveFormat.TEXT).contains(AIE.docName)
        && !paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches) // duplicate caption by pavi
        && !(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figure Captions")) ||
        !(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figures")))
{
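Reviewing the condition above, two things may be contributing, although we have not confirmed either against Figures.docx. First, `&&` binds tighter than `||` in Java, so the final `|| !(...startsWith("Figures"))` applies to the whole expression rather than only to the "Figure Captions" check. Second, the duplicate-caption line calls `paragraph.getNextSibling()` without a null check. Below is a minimal sketch of the check we believe is intended, written on plain strings so that it runs without Aspose.Words. `CaptionCheck`, `isFigureCaption`, and its parameters are hypothetical names, the `CAPTION` regex is only a placeholder for our `matches` pattern (not shown here), and the first-paragraph fallback from the original condition is left out for brevity.

```java
import java.util.List;
import java.util.regex.Pattern;

final class CaptionCheck {
    // Keywords from the condition above; "Abb" already covers "Abbildung".
    private static final List<String> KEYWORDS = List.of("Fig", "Scheme", "Plate", "Abb");

    // Assumption: stand-in for our `matches` regex, which is not shown in this post.
    static final Pattern CAPTION =
            Pattern.compile("^(Fig|Scheme|Plate|Abb)\\S*\\s*\\d+.*", Pattern.DOTALL);

    /**
     * text:        text of the paragraph being tested
     * nextText:    text of the next sibling, or null when there is none
     * nextIsTable: true when the next sibling is a table
     * hasShape:    true when the paragraph itself contains a shape
     * docName:     document name that must not appear in a caption
     */
    static boolean isFigureCaption(String text, String nextText, boolean nextIsTable,
                                   boolean hasShape, String docName) {
        String t = text.trim(); // trim once so every prefix test sees the same string
        boolean keyword = KEYWORDS.stream().anyMatch(t::startsWith);
        // Excluded when it starts with either list heading: !(A || B) == !A && !B.
        boolean heading = t.startsWith("Figure Captions") || t.startsWith("Figures");
        // Null guard: the last paragraph in a body has no next sibling.
        boolean nextIsDuplicate =
                nextText != null && CAPTION.matcher(nextText.trim()).matches();
        return keyword && !heading && !nextIsTable && !hasShape
                && !t.contains(docName) && !nextIsDuplicate;
    }
}
```

With this grouping, an ordinary body paragraph such as "Introduction" is rejected, whereas the condition above evaluates to true for any paragraph that does not start with "Figures".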
Please do the needful.