Free Support Forum - aspose.com

Issue on Extraction of Images

Hi Team,

I am able to extract images (JPEG, PNG) and save them as PDF. I am using the paragraph node for extraction. However, some images of types other than the above are not extracted, so I cannot save them.

NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes) {
	if (shape.hasImage() && shape.getParentParagraph().getNextSibling() != null
			&& shape.getParentParagraph().getNextSibling().getNodeType() == NodeType.PARAGRAPH) {
		if (shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT).startsWith("Fig")
				|| shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT).startsWith("Sch")) {
			caption = shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT);
			ArrayList nodes = extractContent(shape.getParentParagraph(), shape.getParentParagraph(), true);
			filename = folder_name + "Fig" + i + "_" + name + ".docx";
			generateDocument(doc, nodes).save(filename);
			Paragraph fig = (Paragraph) shape.getParentParagraph().getNextSibling();
			/**
			 * REMOVAL OF NODE(START,END) FROM SOURCE WORD DOC START
			 **/
			shape.getParentParagraph().insertBefore(new BookmarkStart(doc, "Image_" + i), shape);
			fig.appendChild(new BookmarkEnd(doc, "Image_" + i));
			i++;
		}
	}
}

I am using the above code for extraction.

Thank you,
Kind regards,
Priyanga

Hi Priyanga,


Thanks for your inquiry. I am afraid I am unable to test your code due to missing references. Please share your complete working code here; we will investigate it further and guide you accordingly.

However, I have tested the image extraction scenario with the following code snippet and noticed that an image is identified as an unknown image type. I have logged ticket WORDSNET-15524 in our issue tracking system for further investigation and rectification. We will notify you as soon as it is resolved.

com.aspose.words.Document doc = new com.aspose.words.Document("test+(14).docx");
int i = 0;

// Get collection of shapes
NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);

// Loop through all shapes (the cast lets the for-each loop yield Shape objects)
for (Shape shape : (Iterable<Shape>) shapes)
{
	if (shape.hasImage())
	{
		// Build the file name from a running index and the extension of the detected image type
		String imageFileName = "Image.ExportImages_" + i++ + FileFormatUtil.imageTypeToExtension(shape.getImageData().getImageType());
		shape.getImageData().save(imageFileName);
	}
}
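
Until the ticket is resolved, one possible workaround for an image reported as an unknown type is to guess the extension from the first bytes of the image data. The following is only a minimal sketch in plain Java, not part of the Aspose.Words API: the class name ImageTypeSniffer, the method sniffExtension, and the ".bin" fallback are hypothetical names chosen for illustration. It assumes you can obtain the raw image bytes yourself (for example through an accessor such as shape.getImageData().getImageBytes(), if your version provides it).

```java
// Hypothetical helper, not part of Aspose.Words: guesses a file extension
// from the leading "magic" bytes of raw image data.
final class ImageTypeSniffer {

    /** Returns an extension guessed from the leading bytes, or ".bin" if unrecognized. */
    static String sniffExtension(byte[] data) {
        if (startsWith(data, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return ".png";
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return ".jpeg";
        if (startsWith(data, 'G', 'I', 'F', '8')) return ".gif";
        if (startsWith(data, 'B', 'M')) return ".bmp";
        if (startsWith(data, 'I', 'I', 0x2A, 0x00)
                || startsWith(data, 'M', 'M', 0x00, 0x2A)) return ".tiff";
        // Placeable WMF header
        if (startsWith(data, 0xD7, 0xCD, 0xC6, 0x9A)) return ".wmf";
        // Assumption: ".bin" is an arbitrary fallback for unrecognized data
        return ".bin";
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int k = 0; k < prefix.length; k++) {
            if ((data[k] & 0xFF) != prefix[k]) return false;
        }
        return true;
    }
}
```

In the loop above, the result of sniffExtension could be used in place of the extension from FileFormatUtil.imageTypeToExtension whenever the image type comes back as unknown. It only recognizes the handful of signatures listed, so other formats would still fall through to the fallback.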


Best Regards,