We're sorry Aspose doesn't work properply without JavaScript enabled.

Free Support Forum - aspose.com

Image extraction issue 2

Dear team,

We are extracting images from docx using aspose java, but in this case some images having image caption and few images don’t have image captions, how to extract with our caption images using aspose, please find source code and input file

if ((paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")
	|| paragraph.toString(SaveFormat.TEXT).startsWith("Scheme")
	|| paragraph.toString(SaveFormat.TEXT).startsWith("Plate")
	|| paragraph.toString(SaveFormat.TEXT).startsWith("Abb")
	|| paragraph.toString(SaveFormat.TEXT).startsWith("Abbildung"))
	// for duplicate figure caption it-15
	&& (paragraph.getNextSibling() != null
			&& !paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
			|| (paragraph.getNextSibling() != null
					&& paragraph.getNextSibling().getNodeType() != NodeType.TABLE
					&& paragraph.getNextSibling().toString(SaveFormat.TEXT).trim().matches(matches)
				&& (((Paragraph) paragraph.getNextSibling()).getChildNodes(NodeType.SHAPE, true)
						.getCount() > 0
							|| (paragraph.getNextSibling().getNextSibling()) != null
									&& paragraph.getNextSibling().getNextSibling()
											.getNodeType() != NodeType.TABLE
									&& ((((Paragraph) paragraph.getNextSibling().getNextSibling())
											.getChildNodes(NodeType.SHAPE, true).getCount() == 0)
											
											//this codition added by pavi-14-12-2021   for duplicate captions
											||(((Paragraph) paragraph.getNextSibling().getNextSibling())
													.getChildNodes(NodeType.SHAPE, true).getCount() > 0))))
			|| paragraph.getParentSection().getBody().getLastParagraph().getText().trim()
	.matches(matches))
	// for duplicate figure caption
	&& ((paragraph.getPreviousSibling() != null
			&& paragraph.getPreviousSibling().getNodeType() != NodeType.TABLE)
			|| paragraph.getParentSection().getBody().getFirstParagraph().getText().trim()
				.matches(matches))
	&& paragraph.getNodeType() != NodeType.TABLE
	&& paragraph.getParentNode().getNodeType() != NodeType.CELL
	&& !paragraph.toString(SaveFormat.TEXT).contains(AIE.docName)

	//condition added by pavi -14-12-2021
	&& (!(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figure Captions"))||
			!(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figures")))
	
       || ((paragraph.getNextSibling() == null) && (builder.getCurrentParagraph().isEndOfDocument())))

input : Revised Manuscript.docx (4.1 MB)

@e503824 To extract all images from the document you can use the following simple code:

Document doc = new Document("C:\\Temp\\in.docx");
Iterable<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
for(Shape s : shapes)
{
    // Import shape into another document and save.
}