Extracting drawing tool and text box tool images in a Word document

Hi Team,

We extract the images from a document using Paragraph nodes, but some of the images are not extracted: those drawn with the drawing tools and text box tools. I have attached an input sample. Please provide a workaround for this problem.
Figures_drawing tool images.zip (25.9 KB)

Thanks and regards ,
priyanga G

@priyanga

Thanks for your inquiry. Please note that the image in your shared document is a GroupShape. You can extract it as follows, and customize the code as per your requirement.

com.aspose.words.Document doc = new com.aspose.words.Document(
		"Figures_drawing tool images.docx");
int imageIndex = 1;
// Get the collection of group shapes
NodeCollection<GroupShape> Gshapes = doc.getChildNodes(
		NodeType.GROUP_SHAPE, true);
// Loop through all group shapes
for (GroupShape shape : Gshapes) {
	ImageSaveOptions imageOptions = new ImageSaveOptions(
			com.aspose.words.SaveFormat.JPEG);

	// Render the group shape to its own image file
	shape.getShapeRenderer().save(
			"Groupimage" + imageIndex + ".jpeg", imageOptions);
	// Advance the index so that each shape gets a distinct file name
	imageIndex++;
}

Hi @tilal.ahmad,

Thank you very much. The output was nice.

However, I am not able to save this in .DOCX format. Please let me know how to render the shape in DOCX format.

Thanks
&
Regards,
priyangaG

@priyanga

Thanks for your feedback. You can save the image to a stream and insert it into a new document. Please check the following code snippet; hopefully it will help you accomplish the task.

Document doc = new Document("Figures_drawing tool images.docx");
int imageIndex = 1;
// Get the collection of group shapes
NodeCollection<GroupShape> Gshapes = doc.getChildNodes(NodeType.GROUP_SHAPE, true);
// Loop through all group shapes
for (GroupShape shape : Gshapes) {
	// Render the group shape to an in-memory JPEG
	ByteArrayOutputStream imageStream = new ByteArrayOutputStream();
	ImageSaveOptions imageOptions = new ImageSaveOptions(SaveFormat.JPEG);
	shape.getShapeRenderer().save(imageStream, imageOptions);
	// Save the image to a new document.
	Document imagedoc = new Document();
	DocumentBuilder builder = new DocumentBuilder(imagedoc);
	builder.insertImage(imageStream.toByteArray());
	imagedoc.save("Output_" + imageIndex + ".docx");
	// Advance the index so that each shape gets a distinct file name
	imageIndex++;
}

Hi @tilal.ahmad,

Thank you for your timely help. Now I am able to save the DOCX file.

My document has more group shape images, so I am incrementing the image index with imageIndex++. But the same image is duplicated many times, and the rest of the group images are not extracted.

Please let me know how to read the figure caption and how to avoid the duplicate images.

Thanks & regards
priyanga G

@priyanga

Thanks for your feedback. Please share your input document; we will look into it and guide you on the duplication issue.

To read the figure caption, you can use the same code that you are using to get the captions of charts. In your shared sample document, the figure caption appears before the group image, so you can use the PreviousSibling of the group image's parent paragraph for this purpose.

// This snippet is the body of the loop over group shapes (shape, imageIndex).
// Fall back to an index-based name when no caption is found
String caption = "output_" + imageIndex++;
Node previous = shape.getParentParagraph().getPreviousSibling();
if (previous != null) {
	String text = previous.toString(SaveFormat.TEXT).trim();
	if (text.startsWith("Fig")) {
		caption = text;
	}
}
ByteArrayOutputStream imageStream = new ByteArrayOutputStream();
ImageSaveOptions imageOptions = new ImageSaveOptions(SaveFormat.JPEG);
shape.getShapeRenderer().save(imageStream, imageOptions);

// Save image to new document.
Document imagedoc = new Document();
DocumentBuilder builder = new DocumentBuilder(imagedoc);
builder.insertImage(imageStream.toByteArray());

imagedoc.save(caption + ".docx");
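
One caveat when the caption is used as the file name: caption text can contain characters such as ":" or "/" that are not allowed in file names, and two figures may share the same caption text, so a later file would overwrite an earlier one. Below is a minimal sketch of a helper that cleans the caption and keeps names unique. Please note that CaptionFileNames, sanitize and next are hypothetical names for illustration, not part of Aspose.Words, and the 80-character limit and the "figure" fallback are assumptions you can change.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not an Aspose.Words class): builds safe, unique file names from captions.
final class CaptionFileNames {
    // Lower-cased names already handed out by this instance
    private final Set<String> used = new HashSet<>();

    // Replaces characters that are invalid in common file systems and trims trailing dots/spaces.
    static String sanitize(String caption) {
        String cleaned = caption.replaceAll("[\\\\/:*?\"<>|\\p{Cntrl}]", "_").trim();
        cleaned = cleaned.replaceAll("[. ]+$", "");
        if (cleaned.length() > 80) {
            // Assumed limit to keep the full path reasonably short
            cleaned = cleaned.substring(0, 80).trim();
        }
        return cleaned.isEmpty() ? "figure" : cleaned;
    }

    // Returns a .docx name not returned before, appending _2, _3, ... when a caption repeats.
    String next(String caption) {
        String base = sanitize(caption);
        String name = base;
        int n = 2;
        while (!used.add(name.toLowerCase())) {
            name = base + "_" + n++;
        }
        return name + ".docx";
    }
}
```

To use it, you would create one CaptionFileNames instance before the loop and call imagedoc.save(names.next(caption)) in place of imagedoc.save(caption + ".docx").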

Hi @tilal.ahmad

Thank you very much for giving the solution.

The input document is tested.zip (2.2 MB)

The output folder has a single image extracted into a separate document. The processed output is Outputfolder.zip (718.2 KB)

The expected output is folder sample.zip (2.6 MB)

Thanks & Regards.,
Priyanga

@priyanga

Thanks for sharing the source document. You can render the group shapes as images inside the document and then extract those images from the document. Please check the following sample code for reference; hopefully it will help you accomplish the task.

Document doc = new Document("tested.doc");
DocumentBuilder builder = new DocumentBuilder(doc);
// Get collection of Group shapes
NodeCollection<GroupShape> Gshapes = doc.getChildNodes(NodeType.GROUP_SHAPE, true);
// Loop over a snapshot (toArray), because shapes are removed from the document inside the loop
for (Node node : Gshapes.toArray()) {
	GroupShape shape = (GroupShape) node;
	// Skip nested groups; they are rendered as part of their parent group
	if (shape.getParentNode() instanceof GroupShape) {
		continue;
	}
	// Save Group shape as image
	ByteArrayOutputStream imageStream = new ByteArrayOutputStream();
	ImageSaveOptions imageOptions = new ImageSaveOptions(SaveFormat.JPEG);
	shape.getShapeRenderer().save(imageStream, imageOptions);
	// Insert the rendered image at the shape's position, then remove the original shape
	builder.moveTo(shape);
	builder.insertImage(imageStream.toByteArray());
	shape.remove();
}
ByteArrayOutputStream docStream = new ByteArrayOutputStream();
doc.save(docStream, SaveFormat.DOCX);

doc = new Document(new ByteArrayInputStream(docStream.toByteArray()));
builder = new DocumentBuilder(doc);
int i = 1;
NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes) {
	if (shape.hasChart() || shape.hasImage()) {
		Document dstDoc = new Document();
		NodeImporter importer = new NodeImporter(doc, dstDoc,ImportFormatMode.KEEP_SOURCE_FORMATTING);
		Node newNode = importer.importNode(shape, true);
		dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
		dstDoc.save("Output_" + i + ".docx");
		i++;
	}
}

Hi @tilal.ahmad,

Thank you very much.

Now I am able to get the exact output.

Thanks & regards,
priyanga.G

Hi @tilal.ahmad

The requirement is to extract the images by figure caption. How can I read the figure caption, then extract the image and save it into a new Word document?

Thanks
&
regards,

Priyanga G

@priyanga,

Please refer to the following articles to learn how to work with Aspose.Words’ Document Object Model (DOM).

Aspose.Words Document Object Model
Document Tree Navigation

If you open your tested.doc with Document Explorer (see Document-Explorer-View.png (36.9 KB)), you will notice that the figure captions are inside separate Paragraph nodes. So you need to loop through those Paragraphs and, for each caption Paragraph, build logic that traverses the document node hierarchy to find the related Shape/GroupShape. This task would be much simpler if the Shapes and their caption Paragraphs were bookmarked; you could then simply use the code from the following article:

Extract Content from a Bookmark
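
If bookmarks are not an option, the pairing logic itself can be sketched roughly as below. This is only a plain-Java model of the idea, not Aspose.Words code: Block is a hypothetical stand-in for a Paragraph or a Shape/GroupShape taken in document order, and it assumes, as in your sample, that a caption starts with "Fig" and appears before its figure.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the pairing logic; Block is a hypothetical stand-in for a document node.
final class CaptionPairing {
    // Either a paragraph (text = its content) or a shape (text = an identifier for the shape).
    static final class Block {
        final boolean isShape;
        final String text;

        private Block(boolean isShape, String text) {
            this.isShape = isShape;
            this.text = text;
        }

        static Block paragraph(String text) { return new Block(false, text); }
        static Block shape(String id) { return new Block(true, id); }
    }

    // Maps each caption to the first shape that follows it, in document order.
    // A shape with no pending caption, or a caption with no shape after it, is left out.
    static Map<String, String> pair(List<Block> blocks) {
        Map<String, String> result = new LinkedHashMap<>();
        String pendingCaption = null;
        for (Block block : blocks) {
            if (!block.isShape) {
                String text = block.text.trim();
                if (text.startsWith("Fig")) {
                    pendingCaption = text; // remember the most recent caption
                }
            } else if (pendingCaption != null) {
                result.put(pendingCaption, block.text);
                pendingCaption = null; // each caption is used for one shape only
            }
        }
        return result;
    }
}
```

In real code, the blocks would come from walking the document's Paragraph and Shape/GroupShape nodes in order, and each resulting pair would be rendered and saved under its caption.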

Best regards,