Extraction of images using Java


#1

Source_2.zip (1.4 MB)
Source_3.zip (2.6 MB)
Source_4.zip (1.7 MB)
ExpectedOuput_1.zip (1.2 MB)
ExpectedOuput_2.zip (1.3 MB)
ExpectedOuput_3.zip (1.4 MB)
ExpectedOuput_4.zip (1.2 MB)
ExpectedOuput_5.zip (1.7 MB)

Hi team,

I am looking for a workaround in Java to extract images based on paragraph nodes when a legend sits between the images and the caption.
The source document is split into 4 parts.
The expected output is split into 5 parts.
We have been using the "Copy/Extract shape using paragraph node" approach in Java to extract multi-part images as a whole, but it fails when there are legends between the image parts.
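To make the problem concrete, here is a minimal plain-Java sketch (no Aspose.Words dependency) of how the paragraphs around a multi-part figure can be told apart by their text alone. The class name, the Kind categories, the assumption that image-only paragraphs have empty text, and the regular expression used for legend markers such as "(a)" or "(e)" are all illustrative assumptions, not part of any library.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: classifies one paragraph by its plain text.
 * Assumes image-only paragraphs have empty text, captions start with "Fig",
 * and legends contain a single-letter marker such as "(a)".
 */
public class ParagraphClassifier
{
    public enum Kind { CAPTION, LEGEND, IMAGE_OR_BLANK, OTHER }

    // Matches one lowercase letter in parentheses, e.g. "(a)" or "(e)".
    private static final Pattern LEGEND_MARKER = Pattern.compile("\\([a-z]\\)");

    public static Kind classify(String rawText)
    {
        String text = rawText.trim();
        if (text.isEmpty())
            return Kind.IMAGE_OR_BLANK;   // e.g. a paragraph holding only an inline picture
        if (text.startsWith("Fig"))
            return Kind.CAPTION;          // checked before legends: captions may also contain "(a)"
        if (LEGEND_MARKER.matcher(text).find())
            return Kind.LEGEND;
        return Kind.OTHER;
    }

    public static void main(String[] args)
    {
        String[] samples = { "", "(a) Front view  (b) Side view", "Fig. 3 Assembly", "Body text." };
        for (String s : samples)
            System.out.println(classify(s) + " <- \"" + s + "\"");
    }
}
```

Checking for captions first is a deliberate choice, since a caption such as "Fig. 2 (a) and (b)" would otherwise be mistaken for a legend.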

Source_1.zip (1.8 MB)

Regards
Priyadharshini


#2

@priyadharshini,

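Thanks for your inquiry. Please use the following code example to extract the desired contents from the document. It wraps each figure in a bookmark that starts at the end of the last regular paragraph before the image parts and legends and ends at the start of the caption, then saves each bookmarked range as a separate document. ExtractContents refers to the helper class from the Aspose.Words documentation on extracting content between nodes; it is not part of the Aspose.Words API itself. Hope this helps you.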

import com.aspose.words.*;
import java.util.ArrayList;

// Assumes MyDir points to the folder with the input file and that the
// ExtractContents helper (extractContent/generateDocument) is on the classpath.
Document doc = new Document(MyDir + "Source_2.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);

// Wrap every figure (image parts plus legends) in a bookmark that
// ends at the start of its caption paragraph ("Fig ...").
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Walk backwards over image-only (empty-text) and legend paragraphs.
        Node previousPara = paragraph.getPreviousSibling();
        while (previousPara != null && previousPara.getNodeType() == NodeType.PARAGRAPH)
        {
            String text = previousPara.toString(SaveFormat.TEXT).trim();
            boolean isImageOrLegend = text.isEmpty()
                    || text.contains("(a)") || text.contains("(b)")
                    || text.contains("(c)") || text.contains("(d)");
            if (!isImageOrLegend)
                break;
            previousPara = previousPara.getPreviousSibling();
        }

        if (previousPara == null)
        {
            // The figure is at the very beginning of the document.
            builder.moveToDocumentStart();
        }
        else
        {
            // Start the bookmark at the end of the last non-figure paragraph.
            // The index comes from the whole-document collection, so this assumes
            // a single-section document and that previousPara is a paragraph.
            builder.moveToParagraph(paragraphs.indexOf((Paragraph) previousPara), -1);
        }
        builder.startBookmark("Bookmark" + bookmark);

        // End the bookmark at the start of the caption paragraph.
        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}

// Copy each bookmarked range into its own document.
for (Bookmark bm : doc.getRange().getBookmarks())
{
    if (bm.getName().startsWith("Bookmark"))
    {
        ArrayList<Node> nodes = ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
    }
}
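The backwards walk in the code above can be checked without Aspose.Words. The sketch below is a plain-Java model under stated assumptions: each list entry stands for the plain text of one sibling paragraph (tables and other non-paragraph nodes are not modelled), and the class name FigureRanges and its methods are hypothetical. For each caption it reports the index of the first image or legend paragraph and the caption index, which is the span the bookmark is meant to cover. It uses List.of, so it needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of the caption back-walk (no Aspose.Words). */
public class FigureRanges
{
    // Same test as the answer: empty text (image-only paragraph) or an (a)-(d) legend marker.
    static boolean isImageOrLegend(String rawText)
    {
        String text = rawText.trim();
        return text.isEmpty()
                || text.contains("(a)") || text.contains("(b)")
                || text.contains("(c)") || text.contains("(d)");
    }

    /** Returns {firstFigureIndex, captionIndex} pairs, one per caption paragraph. */
    static List<int[]> findRanges(List<String> paragraphs)
    {
        List<int[]> ranges = new ArrayList<>();
        for (int caption = 0; caption < paragraphs.size(); caption++)
        {
            if (!paragraphs.get(caption).trim().startsWith("Fig"))
                continue;
            int prev = caption - 1;
            // Step back over image-only and legend paragraphs.
            while (prev >= 0 && isImageOrLegend(paragraphs.get(prev)))
                prev--;
            // prev is the last regular paragraph (or -1 at the document start);
            // the figure block begins right after it.
            ranges.add(new int[] { prev + 1, caption });
        }
        return ranges;
    }

    public static void main(String[] args)
    {
        List<String> paras = List.of(
                "Introduction text.",      // 0
                "",                        // 1 image part
                "(a) Top  (b) Bottom",     // 2 legend between image parts
                "",                        // 3 image part
                "Fig. 1 Two-part figure",  // 4 caption
                "More text.",              // 5
                "",                        // 6 image
                "Fig. 2 Single image");    // 7 caption
        for (int[] r : findRanges(paras))
            System.out.println("figure paragraphs " + r[0] + ".." + (r[1] - 1) + ", caption " + r[1]);
    }
}
```

Because legend paragraphs are skipped just like image-only ones, a legend placed between two image parts stays inside the same figure block instead of splitting it.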

#3

Thank you @tahir.manzoor.
It is working fine.

Regards
Priyadharshini