Part images

jan.kathir · November 6, 2019, 12:21pm

Dear Team,
Can I get a workaround to extract part images together with their figure caption in Java? Or could you suggest an approach for extracting part images?

Part images: images that have no sub-figure labels (i.e. (a), (b), (c), …) below them, but are placed one below or beside another and share a single figure caption below.

Attached sample :
part.zip (333.0 KB)
Thanks…

tahir.manzoor · November 6, 2019, 2:42pm

@jan.kathir

Please ZIP and attach your expected output Word documents here for our reference. We will then provide more information about your query, along with a code example.

jan.kathir · November 7, 2019, 4:27am

Hi @tahir.manzoor,
I have attached the sample output for the input above. Please find the attachment.

Attached sample output: Sample output.zip (347.0 KB)

tahir.manzoor · November 7, 2019, 10:56am

@jan.kathir

We are working on your query and will share a code example with you soon.

tahir.manzoor · November 7, 2019, 4:13pm

@jan.kathir

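Please use the following code example to extract the shapes from the document. For each paragraph whose text starts with "Fig", it collects the text-free paragraphs directly above it and saves them to a separate document. Hope this helps you.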

import com.aspose.words.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The statements below belong inside a method declared with "throws Exception".
// MyDir is a String holding the working folder path; define it for your environment.
Document doc = new Document(MyDir + "part.docx");
doc.updateListLabels();
int i = 1;
List<Node> nodes = new ArrayList<Node>();

for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    // A paragraph whose text starts with "Fig" is treated as a figure caption.
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Walk backwards and collect the preceding paragraphs that contain no text;
        // these are expected to hold the shapes of the figure.
        Node previousPara = paragraph.getPreviousSibling();
        while (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0)
        {
            nodes.add(previousPara);
            previousPara = previousPara.getPreviousSibling();
        }

        if (!nodes.isEmpty())
        {
            // The paragraphs were collected bottom-up, so restore document order.
            Collections.reverse(nodes);

            // Import the consecutive shape paragraphs into a new document.
            Document dstDoc = new Document();
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            for (Node node : nodes)
            {
                Node newNode = importer.importNode(node, true);
                dstDoc.getFirstSection().getBody().appendChild(newNode);
            }
            // Remove the empty paragraph that a new Document contains by default.
            if (dstDoc.getFirstSection().getBody().getFirstParagraph().toString(SaveFormat.TEXT).trim().length() == 0)
                dstDoc.getFirstSection().getBody().getFirstParagraph().remove();

            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
            nodes.clear();
        }
    }
}
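
To make the grouping rule easier to check on its own, here is a minimal library-free sketch of the same backward scan. It is only an illustration, not part of Aspose.Words: the class name FigureGrouping and the method groupImageParagraphs are hypothetical, and each paragraph is represented simply by its text, with an empty string standing in for a paragraph that holds only an image.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FigureGrouping {
    /**
     * For every caption (text starting with "Fig"), returns the indices of the
     * consecutive blank-text paragraphs directly above it, in document order.
     * Captions with no blank paragraph above them produce no group.
     */
    public static List<List<Integer>> groupImageParagraphs(List<String> paragraphTexts) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < paragraphTexts.size(); i++) {
            if (!paragraphTexts.get(i).trim().startsWith("Fig")) {
                continue;
            }
            // Walk backwards while the preceding paragraphs contain no text.
            List<Integer> group = new ArrayList<>();
            int j = i - 1;
            while (j >= 0 && paragraphTexts.get(j).trim().isEmpty()) {
                group.add(j);
                j--;
            }
            if (!group.isEmpty()) {
                // Indices were collected bottom-up; restore document order.
                Collections.reverse(group);
                groups.add(group);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical document: two stacked images under Fig. 1, one image under Fig. 2.
        List<String> texts = Arrays.asList(
                "Intro text", "", "", "Fig. 1 Two stacked images",
                "More text", "", "Fig. 2 Single image");
        System.out.println(groupImageParagraphs(texts)); // prints [[1, 2], [5]]
    }
}
```

Each inner list corresponds to one output document in the code above. Note that the rule treats any text-free paragraph above a caption as part of the figure, so an ordinary blank line directly above a caption would be grouped as well.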