Issue on Extraction of Images

priyanga · June 15, 2017, 1:07am

Hi Team,
I am able to extract the images (JPEG, PNG) and save them as PDF. I am using the paragraph node for extraction, but some images of types other than the above are not getting extracted and saved.

NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>)shapes)
{
    if (shape.hasImage() && shape.getParentParagraph().getNextSibling() != null
    && shape.getParentParagraph().getNextSibling().getNodeType() == NodeType.PARAGRAPH)
    {

        if (shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT).startsWith("Fig")
        || shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT).startsWith("Sch"))
        {
            caption = shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT);



            ArrayList nodes = extractContent(shape.getParentParagraph(), shape.getParentParagraph(), true);
            filename = folder_name + "Fig" + i + "_" + name + ".docx";
            generateDocument(doc, nodes).save(filename);

            Paragraph fig = (Paragraph)shape.getParentParagraph().getNextSibling();

            /* REMOVAL OF NODE (START, END) FROM SOURCE WORD DOC: START */

            shape.getParentParagraph().insertBefore(new BookmarkStart(doc, "Image_" + i), shape);
            fig.appendChild(new BookmarkEnd(doc, "Image_" + i));

            i++;

I am using the above code for extraction.
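
The prefix test that the snippet repeats for "Fig" and "Sch" could be pulled into one helper, so that the caption text is read once and checked in one place. This is only a minimal sketch: the class and method names are hypothetical, and trimming leading whitespace before the comparison is an added assumption, not something the code above does.

```java
// Hypothetical helper, not part of Aspose.Words: decides whether the text of
// the paragraph that follows an image looks like a figure or scheme caption.
final class CaptionFilter {

    // Prefixes taken from the snippet above.
    private static final String[] PREFIXES = {"Fig", "Sch"};

    private CaptionFilter() {
    }

    /** Returns true when the text, ignoring leading whitespace, starts with a known prefix. */
    static boolean isFigureCaption(String paragraphText) {
        if (paragraphText == null) {
            return false;
        }
        // Assumption: leading whitespace should not prevent a match.
        String text = paragraphText.trim();
        for (String prefix : PREFIXES) {
            if (text.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

In the loop above, the string returned by toString(SaveFormat.TEXT) would be stored in a local variable once and passed to CaptionFilter.isFigureCaption(...).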

Thank you,
kind regards,
priyanga

tilal.ahmad · June 15, 2017, 10:52am

Hi Priyanga,

Thanks for your inquiry. I am afraid I am unable to test your code because of missing references. Please share your complete working code here; we will investigate it further and guide you accordingly.

However, I have tested the image extraction scenario with the following code snippet and noticed that one image is identified as an unknown image type, so I have logged ticket WORDSNET-15524 in our issue tracking system for further investigation and rectification. We will notify you as soon as it is resolved.

com.aspose.words.Document doc = new com.aspose.words.Document("test+(14).docx");
int i = 0;
// Get collection of shapes
NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
// Loop through all shapes
for (Shape shape : (Iterable<Shape>) shapes)
{
    if (shape.hasImage())
    {
        String imageFileName = ("Image.ExportImages_" + i++ + FileFormatUtil.imageTypeToExtension(shape.getImageData().getImageType()));
        shape.getImageData().save(imageFileName);
    }
}
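
Until the ticket is resolved, one possible workaround for an image reported as an unknown type is to guess the extension from the first bytes of the raw image data. The following is a rough sketch in plain Java, not an Aspose.Words API: the class ImageTypeSniffer, its method names, and the ".bin" fallback are all hypothetical, and only a few common signatures are checked.

```java
// Hypothetical helper, not part of Aspose.Words: guesses a file extension
// from the leading "magic" bytes of raw image data.
final class ImageTypeSniffer {

    private ImageTypeSniffer() {
    }

    /** Returns an extension such as ".png", or ".bin" when the signature is not recognized. */
    static String sniffExtension(byte[] data) {
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return ".png";
        }
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) {
            return ".jpeg";
        }
        if (startsWith(data, 'G', 'I', 'F', '8')) {
            return ".gif";
        }
        if (startsWith(data, 'B', 'M')) {
            return ".bmp";
        }
        // TIFF has a little-endian ("II") and a big-endian ("MM") header.
        if (startsWith(data, 'I', 'I', 0x2A, 0x00) || startsWith(data, 'M', 'M', 0x00, 0x2A)) {
            return ".tiff";
        }
        return ".bin"; // Assumed fallback for unrecognized data.
    }

    // Compares the first bytes of data, read as unsigned values, with the signature.
    private static boolean startsWith(byte[] data, int... signature) {
        if (data == null || data.length < signature.length) {
            return false;
        }
        for (int k = 0; k < signature.length; k++) {
            if ((data[k] & 0xFF) != signature[k]) {
                return false;
            }
        }
        return true;
    }
}
```

In the loop above, the helper could be used only when shape.getImageData().getImageType() equals ImageType.UNKNOWN, passing it the bytes from shape.getImageData().getImageBytes(). Please verify both calls against the Aspose.Words for Java version in use.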

Best Regards,