Dear team,
We are extracting images from a DOCX file. In this case, we are not able to extract them into a single PDF. Please find below the source code we are using and the input DOCX.
if ((paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")
|| paragraph.toString(SaveFormat.TEXT).trim().startsWith("Scheme")
|| paragraph.toString(SaveFormat.TEXT).trim().startsWith("Plate")
|| paragraph.toString(SaveFormat.TEXT).trim().startsWith("Abb")
|| paragraph.toString(SaveFormat.TEXT).trim().startsWith("Abbildung"))
// for duplicate figure caption it-15
&& (paragraph.getNextSibling() != null
&& !paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
|| (paragraph.getNextSibling() != null
&& paragraph.getNextSibling().getNodeType() != NodeType.TABLE
&& paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
&& (((Paragraph) paragraph.getNextSibling()).getChildNodes(NodeType.SHAPE, true)
.getCount() > 0
|| (paragraph.getNextSibling().getNextSibling()) != null
&& paragraph.getNextSibling().getNextSibling()
.getNodeType() != NodeType.TABLE
&& ((((Paragraph) paragraph.getNextSibling().getNextSibling())
.getChildNodes(NodeType.SHAPE, true).getCount() == 0)
// this condition was added by pavi on 14-12-2021 for duplicate captions
||(((Paragraph) paragraph.getNextSibling().getNextSibling())
.getChildNodes(NodeType.SHAPE, true).getCount() > 0))))
|| paragraph.getParentSection().getBody().getLastParagraph().getText().trim()
.matches(matches))
// for duplicate figure caption
&& ((paragraph.getPreviousSibling() != null
&& paragraph.getPreviousSibling().getNodeType() != NodeType.TABLE)
|| paragraph.getParentSection().getBody().getFirstParagraph().getText().trim()
.matches(matches))
&& paragraph.getNodeType() != NodeType.TABLE
&& paragraph.getParentNode().getNodeType() != NodeType.CELL
&& !paragraph.toString(SaveFormat.TEXT).contains(AIE.docName)
// condition added by pavi on 14-12-2021
&& (!(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figure Captions"))
&& !(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figures")))
|| ((paragraph.getNextSibling() == null) && (builder.getCurrentParagraph().isEndOfDocument())))
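
For easier review, the prefix part of this condition could be isolated roughly as follows. This is only a minimal sketch that works on plain strings: `CaptionPrefix`, `isCaptionStart` and `startsWithAny` are hypothetical names and are not part of Aspose.Words. It assumes the intent is to accept trimmed paragraph text that starts with one of the caption prefixes and to skip paragraphs that start with either of the headings "Figure Captions" or "Figures". The sibling and table checks are not covered here.

```java
// Hypothetical helper (not an Aspose.Words API): it only inspects plain strings.
final class CaptionPrefix {
    // Prefixes taken from the condition above; "Abbildung" is already covered by "Abb".
    private static final String[] PREFIXES = {"Fig", "Scheme", "Plate", "Abb"};

    // Assumption: paragraphs starting with these headings are not captions.
    private static final String[] HEADINGS = {"Figure Captions", "Figures"};

    private CaptionPrefix() {
    }

    // Returns true if text starts with at least one of the candidates.
    static boolean startsWithAny(String text, String[] candidates) {
        for (String candidate : candidates) {
            if (text.startsWith(candidate)) {
                return true;
            }
        }
        return false;
    }

    // Trims once, then applies the prefix test and the heading exclusion.
    static boolean isCaptionStart(String rawText) {
        if (rawText == null) {
            return false;
        }
        String text = rawText.trim();
        return startsWithAny(text, PREFIXES) && !startsWithAny(text, HEADINGS);
    }
}
```

In our code this would be called as `CaptionPrefix.isCaptionStart(paragraph.toString(SaveFormat.TEXT))`, so the paragraph text is exported and trimmed in one place instead of once per prefix.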
input : JECHEM-D-22-00426R2.docx (836.8 KB)
output : JECHEM-D-22-00426R2_Fig0001.pdf (187.0 KB)
JECHEM-D-22-00426R2_Fig0003.pdf (74.4 KB)
JECHEM-D-22-00426R2_Fig0005.pdf (65.6 KB)
JECHEM-D-22-00426R2_Fig0007.pdf (85.6 KB)
JECHEM-D-22-00426R2_Fig0008.pdf (235.4 KB)
JECHEM-D-22-00426R2_Scheme0001.pdf (110.1 KB)
Please look into this and help us resolve it.