Extraction of images using legends

ssvel · August 10, 2017, 12:29pm

Hi Team,

My requirement is to extract images using paragraph nodes and figure captions. However, some documents contain images without a figure caption and use legends such as (a), (b) instead. Please let me know how to extract the images using these legends.

I have enclosed the source code isNewAip.zip (58.5 KB)

The expected output is Expected_Output.zip (237.7 KB)

Thanks & Regards
vadivel

tahir.manzoor · August 10, 2017, 5:48pm

@Vadivel_S_S,

Thanks for your inquiry. Please ZIP and attach your input Word document here for testing. We will investigate the issue on our side and provide you with more information.

ssvel · August 11, 2017, 3:50am

Hi @tahir,
Thank you very much.

The input document is test (8).zip (2.7 MB)

Thanks & regards,
vadivel s.s

tahir.manzoor · August 11, 2017, 7:23am

@Vadivel_S_S,

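Thanks for sharing the document. The “legends like (a),(b)” are list items in your document. Please use the following code example to get the desired output. It looks for list items labelled (a), (b) or (c), collects the image-only paragraphs immediately before each one, and saves each group into a separate document. Hope this helps you.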

// Requires: import com.aspose.words.*; import java.util.ArrayList; import java.util.Collections;
// MyDir is assumed to be a String holding the working folder path.
Document doc = new Document(MyDir + "test (8).docx");
// Update the list labels so that getListLabel().getLabelString() returns "(a)", "(b)", ...
doc.updateListLabels();
int i = 1;
ArrayList<Paragraph> nodes = new ArrayList<Paragraph>();
// Find the list-item paragraphs whose label starts with "(a)", "(b)" or "(c)".
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if (!paragraph.getListFormat().isListItem())
        continue;

    String label = paragraph.getListLabel().getLabelString().trim();
    if (!(label.startsWith("(a)") || label.startsWith("(b)") || label.startsWith("(c)")))
        continue;

    // Walk backwards over the preceding paragraphs that contain shapes but no text.
    Node previousPara = paragraph.getPreviousSibling();
    while (previousPara != null
            && previousPara.getNodeType() == NodeType.PARAGRAPH
            && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
            && ((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
    {
        nodes.add((Paragraph) previousPara);
        previousPara = previousPara.getPreviousSibling();
    }

    if (nodes.size() > 0)
    {
        // The paragraphs were collected bottom-up; restore document order.
        Collections.reverse(nodes);

        // Export the consecutive image paragraphs into a new document.
        Document dstDoc = new Document();
        NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        for (Paragraph para : nodes)
        {
            Node newNode = importer.importNode(para, true);
            dstDoc.getFirstSection().getBody().appendChild(newNode);
        }
        // Remove the empty first paragraph that a new Document contains by default.
        if (dstDoc.getFirstSection().getBody().getFirstParagraph().toString(SaveFormat.TEXT).trim().length() == 0)
            dstDoc.getFirstSection().getBody().getFirstParagraph().remove();
        dstDoc.save(MyDir + "out\\output" + i + ".docx");
        i++;
        nodes.clear();
    }
}
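
The code above only recognises the labels (a), (b) and (c). If a document has more sub-figures, the label check could be generalised with a regular expression. The following is a minimal sketch in plain Java, not part of the Aspose.Words API: the class name LegendLabels and the method isLegendLabel are hypothetical, and it assumes every legend is a single lowercase letter in parentheses.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not an Aspose.Words class): decides whether a list
// label string looks like a figure legend such as "(a)", "(b)", ..., "(z)".
class LegendLabels {
    // Assumption: a legend is exactly one lowercase letter wrapped in parentheses.
    private static final Pattern LEGEND = Pattern.compile("\\([a-z]\\)");

    static boolean isLegendLabel(String label) {
        if (label == null) {
            return false;
        }
        // lookingAt() anchors the match at the start of the trimmed label,
        // mirroring the startsWith(...) checks in the code above.
        return LEGEND.matcher(label.trim()).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(isLegendLabel("(a)"));
        System.out.println(isLegendLabel("1."));
    }
}
```

With such a helper, the three startsWith(...) checks could be replaced by a single call that passes in paragraph.getListLabel().getLabelString(). Whether uppercase or multi-letter labels should also count depends on the documents being processed.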

ssvel · August 18, 2017, 8:58am

It's working fine, Tahir. Thank you so much.

Regards,

Vadivel S S

tahir.manzoor · August 18, 2017, 10:09am

@Vadivel_S_S,

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.