Extracting Anchored Images

Saranya_Sekar · October 15, 2018, 8:05am

Hi Team,

I need help extracting anchored images in Java from this sample document: Anchored_Images.zip (2.6 MB)

The expected output is Anchored_Images_output.zip (2.6 MB)

I also need an interim document generated, with a bookmark placed at the location from which each image was extracted. I have attached a sample output for the interim document as well. Please help.

tahir.manzoor · October 15, 2018, 3:31pm

@Saranya_Sekar

Thanks for your inquiry. We already shared a similar code example for extracting images from a document in your other thread. Please use the same approach here. You only need to change the condition of the if statement or while loop according to your requirement.

Saranya_Sekar · October 16, 2018, 4:07am

@tahir.manzoor
I used the code mentioned in the previous post, but it generates incorrect output. The input I used is Anchored_Images.zip (2.6 MB)

It produces the output Anchored_Images_output.zip (562.6 KB)

public static void anchoredExtractImages(Document interimdoc) throws Exception{
Document doc =interimdoc;

	DocumentBuilder builder = new DocumentBuilder(doc);
	ArrayList tables = new ArrayList();
	int bookmark = 1;
	int i = 1;
	NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
	for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
	{
	    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
	    {

	        Node PreviousPara = paragraph.getPreviousSibling();

	        // Guard every test with the null check; && binds tighter than ||
	        while (PreviousPara != null &&
	                        (
	                            PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
	                            PreviousPara.toString(SaveFormat.TEXT).trim().contains("(a)") ||
	                            PreviousPara.toString(SaveFormat.TEXT).trim().contains("(b)") ||
	                            PreviousPara.toString(SaveFormat.TEXT).trim().contains("(c)") ||
	                            PreviousPara.toString(SaveFormat.TEXT).trim().contains("(d)")
	                        )
	                )
	        {
	            PreviousPara = PreviousPara.getPreviousSibling();
	            // getPreviousSibling() may return null or a non-paragraph node (e.g. a table)
	            if(PreviousPara != null && PreviousPara.getNodeType() == NodeType.PARAGRAPH &&
	                    ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
	                break;
	        }
	        
	        if(PreviousPara == null)
	        {
	            builder.moveToDocumentStart();
	            builder.insertParagraph();
	            builder.startBookmark("Bookmark" + bookmark);
	            builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
	            builder.endBookmark("Bookmark" + bookmark);
	            bookmark++;
	        }
	        else if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
	        {
	            Node node = ((Paragraph)PreviousPara).getParentNode().insertBefore(new Paragraph(doc), PreviousPara);
	            builder.moveTo(node);
	            builder.startBookmark("Bookmark" + bookmark);
	            builder.moveTo(paragraph);
	            builder.endBookmark("Bookmark" + bookmark);
	            bookmark++;
	        }
	        else if(PreviousPara.getNodeType() == NodeType.TABLE)
	        {
	            if(((Table)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
	                tables.add(((Table)PreviousPara));
	        }

	    }
	}

	for (Bookmark bm : doc.getRange().getBookmarks())
	{
	    if(bm.getName().startsWith("Bookmark"))
	    {
	    	 ArrayList nodes = ExtractContentBetweenParagraphs((Paragraph)bm.getBookmarkStart().getParentNode(), (Paragraph) bm.getBookmarkEnd().getParentNode());
	            Document dstDoc = generateDocument(doc, nodes);

	        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
	        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
	        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
	        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());

	        if(dstDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().startsWith("Fig"))
	            dstDoc.getLastSection().getBody().getLastParagraph().remove();

	        if(dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() == 0)
	            dstDoc.getFirstSection().getBody().getFirstParagraph().remove();

	        dstDoc.save(folderName + "output"+i+".docx");
	        i++;
	    }
	}
	
}

tahir.manzoor · October 16, 2018, 1:54pm

@Saranya_Sekar

Thanks for your inquiry. The same code will not work for all use cases. Please list all the use cases in your documents and write the code accordingly. As mentioned in my previous post, you need to change the condition of the if statement or while loop according to your requirement. In your use cases, the extraction of images is almost the same. Please check the shared code example: it iterates over the paragraphs, bookmarks the images, and extracts the content.

To insert the BookmarkEnd, move the cursor to the figure caption paragraph (the one starting with "Fig"). To insert the BookmarkStart, move the cursor to the previous sibling node of the desired Shape (image).
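
The placement rule above can be illustrated without the library. The following is a minimal sketch, not Aspose.Words code: `Para` and `findFigureStart` are hypothetical names that model a paragraph as its text plus a flag saying whether it holds a shape. It walks backwards from the caption over blank or shape-bearing paragraphs to find where the BookmarkStart would go.

```java
import java.util.Arrays;
import java.util.List;

final class FigureSpanSketch {

    /** Hypothetical stand-in for a document paragraph; not an Aspose.Words type. */
    static final class Para {
        final String text;
        final boolean hasShape;

        Para(String text, boolean hasShape) {
            this.text = text;
            this.hasShape = hasShape;
        }
    }

    /**
     * Walks backwards from the caption and returns the index of the first
     * paragraph that belongs to the figure (blank, or holding a shape).
     * Returns captionIndex when no such paragraph precedes the caption.
     */
    static int findFigureStart(List<Para> paras, int captionIndex) {
        int start = captionIndex;
        while (start > 0) {
            Para prev = paras.get(start - 1);
            boolean blank = prev.text.trim().isEmpty();
            if (!blank && !prev.hasShape) {
                break; // ordinary body text: the figure begins after it
            }
            start--;
        }
        return start;
    }

    public static void main(String[] args) {
        List<Para> paras = Arrays.asList(
                new Para("Some body text.", false),
                new Para("", true),            // paragraph anchoring the image
                new Para("", false),           // empty spacer
                new Para("Fig. 1 Caption", false));
        int captionIndex = 3;
        int start = findFigureStart(paras, captionIndex);
        System.out.println("BookmarkStart before paragraph " + start
                + ", BookmarkEnd in paragraph " + captionIndex);
    }
}
```

In a real document the same walk would be done over sibling nodes, and a node that is not a paragraph (for example a table) would also have to stop the loop.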

If you find that Aspose.Words’ APIs do not work properly, please let us know.

Saranya_Sekar · October 17, 2018, 4:40am

I swapped the startBookmark and endBookmark calls between the nodes, but the expected output is still not generated.

public  static void anchoredExtractImages(Document interimdoc) throws Exception{
	Document doc =interimdoc;

	DocumentBuilder builder = new DocumentBuilder(doc);
	ArrayList tables = new ArrayList();
	int bookmark = 1;
	int i = 1;
	NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
	try{
	for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
	{
	    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
	    {

	        Node PreviousPara = paragraph.getPreviousSibling();

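	        while (PreviousPara != null && PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0)
	        {
	            PreviousPara = PreviousPara.getPreviousSibling();
	            // getPreviousSibling() may return null or a non-paragraph node (e.g. a table)
	            if(PreviousPara != null && PreviousPara.getNodeType() == NodeType.PARAGRAPH &&
	                    ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
	                break;
	        }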
	        
	        if(PreviousPara == null)
	        {
	            builder.moveToDocumentStart();
	            builder.insertParagraph();
	            builder.endBookmark("Bookmark" + bookmark);
	            builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
	            builder.startBookmark("Bookmark" + bookmark);
	            bookmark++;
	        }
	        else if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
	        {
	            Node node = ((Paragraph)PreviousPara).getParentNode().insertBefore(new Paragraph(doc), PreviousPara);
	            builder.moveTo(node);
	            builder.endBookmark("Bookmark" + bookmark);
	            builder.moveTo(paragraph);
	            builder.startBookmark("Bookmark" + bookmark);
	            bookmark++;
	        }
	        else if(PreviousPara.getNodeType() == NodeType.TABLE)
	        {
	            if(((Table)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
	                tables.add(((Table)PreviousPara));
	        }

	    }
	}
	}
	catch(Exception e)
	{
		// Report the failure instead of silently swallowing it
		e.printStackTrace();
	}

	for (Bookmark bm : doc.getRange().getBookmarks())
	{
	    if(bm.getName().startsWith("Bookmark"))
	    {
	    	 ArrayList nodes = ExtractContentBetweenParagraphs((Paragraph)bm.getBookmarkStart().getParentNode(), (Paragraph) bm.getBookmarkEnd().getParentNode());
	            Document dstDoc = generateDocument(doc, nodes);

	        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
	        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
	        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
	        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());

	        if(dstDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().startsWith("Fig"))
	            dstDoc.getLastSection().getBody().getLastParagraph().remove();

	        if(dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() == 0)
	            dstDoc.getFirstSection().getBody().getFirstParagraph().remove();

	        dstDoc.save(folderName + "output"+i+".docx");
	        i++;
	    }
	}
	
}

tahir.manzoor · October 17, 2018, 2:02pm

@Saranya_Sekar

Thanks for your inquiry. This code example does not work for this new use case. We suggest you review all the code examples we have shared for your documents. This will help you understand how content extraction works.

We will write the code example for this case and share it here for your kind reference.

tahir.manzoor · October 18, 2018, 6:13am

@Saranya_Sekar

Please check the following while condition. You need to adapt it to your use cases. The remaining code is almost the same for all of your cases.

while (PreviousPara != null && PreviousPara.getNodeType() == NodeType.PARAGRAPH
        && (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
           ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
)
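
The parentheses around the `||` group in this condition matter. In Java, `&&` binds tighter than `||`, so `a && b || c` is parsed as `(a && b) || c`, and `c` is evaluated even when the null check `a` fails. The sketch below shows the difference on plain strings; the class and method names are made up for illustration and are not part of Aspose.Words.

```java
final class GuardedConditionSketch {

    /** Unguarded form: parsed as (s != null && blank) || contains, so contains runs on null. */
    static boolean unguarded(String s) {
        return s != null && s.trim().isEmpty() || s.contains("(a)");
    }

    /** Guarded form: the null check covers every alternative inside the parentheses. */
    static boolean guarded(String s) {
        return s != null && (s.trim().isEmpty() || s.contains("(a)"));
    }

    public static void main(String[] args) {
        System.out.println("guarded(null) = " + guarded(null));
        try {
            unguarded(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded(null) threw NullPointerException");
        }
    }
}
```

The same reasoning applies to a node reference: once the walk reaches the first node of its parent, the previous sibling is null and every further test on it must sit behind the null check.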

Please use the following code example to get the desired output. We have attached the output documents to this post for your reference: Docs.zip (2.6 MB)

Document doc = new Document(MyDir + "Anchored_Images.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
UseCase1(doc, builder);
Extract_Images(doc, "uc1");

//Use case 1
public static void UseCase1(Document doc, DocumentBuilder builder) throws Exception
{
    int bookmark = 1;
    int i = 1;
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
    {
        if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        {
            Node PreviousPara = paragraph.getPreviousSibling();
            while (PreviousPara != null && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                    && (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                       ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
            )
            {
                PreviousPara = PreviousPara.getPreviousSibling();
            }

            if(PreviousPara == null)
            {
                builder.moveToDocumentStart();
                builder.insertParagraph();
                builder.startBookmark("Bookmark" + bookmark);
                //builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
                builder.moveTo(paragraph);
                builder.endBookmark("Bookmark" + bookmark);
                bookmark++;
            }
            else
            if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
            {
                Node node = ((Paragraph)PreviousPara).getParentNode().insertBefore(new Paragraph(doc), PreviousPara);
                builder.moveTo(node);
                builder.startBookmark("BookmarkUC1" + bookmark);
                builder.moveTo(paragraph);
                builder.endBookmark("BookmarkUC1" + bookmark);
                bookmark++;
            }
        }
    }
}

public  static void Extract_Images(Document doc, String uc) throws Exception
{
    int i = 1;
    for (Bookmark bm : doc.getRange().getBookmarks())
    {
        if(bm.getName().startsWith("Bookmark"))
        {
            ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
            Document dstDoc = ExtractContents.generateDocument(doc, nodes);

            PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
            dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
            dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
            dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());

            dstDoc.updatePageLayout();
            if(dstDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().startsWith("Fig"))
                dstDoc.getLastSection().getBody().getLastParagraph().remove();

            dstDoc.updatePageLayout();
            while(dstDoc.getFirstSection().getBody().getFirstParagraph()!= null && dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() == 0)
                dstDoc.getFirstSection().getBody().getFirstParagraph().remove();

            dstDoc.updatePageLayout();
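            // The loop above may have removed every paragraph, so check for null first
            if(dstDoc.getFirstSection().getBody().getFirstParagraph() != null && dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() > 0)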
            {
                dstDoc.save(MyDir + "out_"+i+".docx");
                i++;
            }

        }
    }
}

Saranya_Sekar · October 22, 2018, 12:11pm

@tahir.manzoor

I am not able to generate the interim document. I am using the code sample below to produce the interim document for anchored images; the document is Anchored_Images_Interim.zip (274.0 KB)

public static void anchored_Extract_Images(Document doc, String uc) throws Exception
{
    int i = 1;
    for (Bookmark bm : doc.getRange().getBookmarks())
    {
        if(bm.getName().startsWith("Bookmark"))
        {
            ArrayList nodes = ExtractContentBetweenParagraphs((Paragraph)bm.getBookmarkStart().getParentNode(), (Paragraph) bm.getBookmarkEnd().getParentNode());
            Document dstDoc = generateDocument(doc, nodes);

            PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
            dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
            dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
            dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());

            dstDoc.updatePageLayout();
            if(dstDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().startsWith("Fig"))
                dstDoc.getLastSection().getBody().getLastParagraph().remove();

            dstDoc.updatePageLayout();
            while(dstDoc.getFirstSection().getBody().getFirstParagraph()!= null && dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() == 0)
                dstDoc.getFirstSection().getBody().getFirstParagraph().remove();

            dstDoc.updatePageLayout();
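            // The loop above may have removed every paragraph, so check for null first
            if(dstDoc.getFirstSection().getBody().getFirstParagraph() != null && dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() > 0)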
            {
                dstDoc.save(folderName + "Anchored_"+i+".docx");
                i++;
            }

        }
    }
    for (Bookmark bm : doc.getRange().getBookmarks()) {
        if (bm.getName().startsWith("BookmarkUC1")) {
            String figText = bm.getBookmarkEnd().getParentNode().toString(SaveFormat.TEXT);
            if(figText.trim().length() > 0)
                bm.setText("<Anchored-Fig>"+figText.trim().substring(0, 7)+"</Anchored-Fig>" + ControlChar.PARAGRAPH_BREAK);
        }
    }
}

public static void UseCaseAnchored(Document doc, DocumentBuilder builder) throws Exception
{
    int bookmark = 1;
    int i = 1;
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
    {
        if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        {
            Node PreviousPara = paragraph.getPreviousSibling();
            while (PreviousPara != null && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                    && (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                       ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
            )
            {
                PreviousPara = PreviousPara.getPreviousSibling();
            }

            if(PreviousPara == null)
            {
                builder.moveToDocumentStart();
                builder.insertParagraph();
                builder.startBookmark("Bookmark" + bookmark);
                //builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
                builder.moveTo(paragraph);
                builder.endBookmark("Bookmark" + bookmark);
                bookmark++;
            }
            else
            if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
            {
                Node node = ((Paragraph)PreviousPara).getParentNode().insertBefore(new Paragraph(doc), PreviousPara);
                builder.moveTo(node);
                builder.startBookmark("BookmarkUC1" + bookmark);
                builder.moveTo(paragraph);
                builder.endBookmark("BookmarkUC1" + bookmark);
                bookmark++;
            }
        }
    }
}

tahir.manzoor · October 22, 2018, 5:06pm

@Saranya_Sekar

Thanks for your inquiry. Please check the code example shared in your other thread for the interim document. You need to use the same approach to generate your interim document. We suggest you read the complete code and then write the code for your other use cases.
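
One detail of the interim-document code posted above is worth noting: the placeholder written into each bookmark is built with `figText.trim().substring(0, 7)`, and `substring` throws a `StringIndexOutOfBoundsException` when the trimmed caption is shorter than seven characters. The following is a minimal sketch of building the same `<Anchored-Fig>` placeholder defensively. `buildPlaceholder` is a hypothetical helper, not an Aspose.Words call, and it leaves out the trailing paragraph break that the posted code appends.

```java
final class PlaceholderSketch {

    /** Same cut-off as the substring(0, 7) call in the posted code. */
    static final int LABEL_LENGTH = 7;

    /**
     * Builds the placeholder from a caption, taking at most LABEL_LENGTH
     * characters. Returns an empty string for a null or blank caption.
     */
    static String buildPlaceholder(String captionText) {
        String trimmed = captionText == null ? "" : captionText.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String label = trimmed.substring(0, Math.min(LABEL_LENGTH, trimmed.length()));
        return "<Anchored-Fig>" + label + "</Anchored-Fig>";
    }

    public static void main(String[] args) {
        System.out.println(buildPlaceholder("Fig. 12 Overview of the setup"));
        System.out.println(buildPlaceholder("Fig 1")); // shorter than seven characters
    }
}
```

The result could then be passed to `bm.setText(...)` in place of the inline expression, with the paragraph break appended as before.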

Saranya_Sekar · October 23, 2018, 6:09am

@tahir.manzoor

I am not able to extract this image completely: Anchored_Fig.zip (19.5 KB)
The output I get is Anchored_Fig (2).zip (16.3 KB)
I also have trouble creating the interim document.
The code I used is:

anchored_Extract_Images(doc, "uc1");
UseCaseAnchored(doc,builder);

anchored_Extract_Images(doc,"uc1");
UseCaseAnchoredInterim(doc,builder);

public  static void anchored_Extract_Images(Document doc, String uc) throws Exception
{
    int i = 1;
    for (Bookmark bm : doc.getRange().getBookmarks())
    {
        if(bm.getName().startsWith("Bookmark"))
        {
        	ArrayList nodes = ExtractContentBetweenParagraphs((Paragraph)bm.getBookmarkStart().getParentNode(), (Paragraph) bm.getBookmarkEnd().getParentNode());
            Document dstDoc = generateDocument(doc, nodes);

            PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
            dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
            dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
            dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());

            dstDoc.updatePageLayout();
            if(dstDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().startsWith("Fig"))
                dstDoc.getLastSection().getBody().getLastParagraph().remove();

            dstDoc.updatePageLayout();
            while(dstDoc.getFirstSection().getBody().getFirstParagraph()!= null && dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() == 0)
                dstDoc.getFirstSection().getBody().getFirstParagraph().remove();

            dstDoc.updatePageLayout();
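            // The loop above may have removed every paragraph, so check for null first
            if(dstDoc.getFirstSection().getBody().getFirstParagraph() != null && dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() > 0)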
            {
                dstDoc.save(folderName + "Anchored_"+i+".docx");
                i++;
            }

        }
    }
    for (Bookmark bm : doc.getRange().getBookmarks()) {
        if (bm.getName().startsWith("Bookmark")) {
            String figText = bm.getBookmarkEnd().getParentNode().toString(SaveFormat.TEXT);
            if(figText.trim().length() > 0)
                bm.setText("<Anchored-Fig>"+figText.trim().substring(0, 7)+"</Anchored-Fig>" + ControlChar.PARAGRAPH_BREAK);
        }
    }
}

public static void UseCaseAnchored(Document doc, DocumentBuilder builder) throws Exception
{
    int bookmark = 1;
    int i = 1;
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
    {
        if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        {
            Node PreviousPara = paragraph.getPreviousSibling();
            while (PreviousPara != null && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                    && (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                       ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
            )
            {
                PreviousPara = PreviousPara.getPreviousSibling();
            }

            if(PreviousPara == null)
            {
                builder.moveToDocumentStart();
                builder.insertParagraph();
                builder.startBookmark("Bookmark" + bookmark);
                //builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
                builder.moveTo(paragraph);
                builder.endBookmark("Bookmark" + bookmark);
                bookmark++;
            }
            else
            if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
            {
                Node node = ((Paragraph)PreviousPara).getParentNode().insertBefore(new Paragraph(doc), PreviousPara);
                builder.moveTo(node);
                builder.startBookmark("BookmarkUC1" + bookmark);
                builder.moveTo(paragraph);
                builder.endBookmark("BookmarkUC1" + bookmark);
                bookmark++;
            }
        }
    }
}


public static void UseCaseAnchoredInterim(Document doc, DocumentBuilder builder) throws Exception
{
    int bookmark = 1;
    int i = 1;
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
    {
        if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        {
            Boolean bln = false;
            Node PreviousPara = paragraph.getPreviousSibling();
            while (PreviousPara != null &&
                    (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                        (
                                PreviousPara.toString(SaveFormat.TEXT).trim().startsWith("(Fig"))
                        )
                    )
            {
                PreviousPara = PreviousPara.getPreviousSibling();
                bln = true;
            }

            if(!bln)
                continue;

            if(PreviousPara == null)
            {
                builder.moveToDocumentStart();
                builder.insertParagraph();
                builder.startBookmark("Bookmark" + bookmark);
                //builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
                builder.moveTo(paragraph);
                builder.endBookmark("Bookmark" + bookmark);
                bookmark++;
            }
            else
            if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
            {
                Node node = ((Paragraph)PreviousPara).getParentNode().insertBefore(new Paragraph(doc), PreviousPara);
                builder.moveTo(node);
                builder.startBookmark("Bookmark" + bookmark);
                builder.moveTo(paragraph);
                builder.endBookmark("Bookmark" + bookmark);
                bookmark++;
            }
        }
    }
}

tahir.manzoor · October 23, 2018, 3:34pm

@Saranya_Sekar

Thanks for your inquiry. Please do not use the ExtractContentBetweenParagraphs method in your code. Use the ExtractContents.extractContent method instead; it is the working code example for extracting content. You can get the code from the following article.
Extract Selected Content Between Nodes

Saranya_Sekar · October 24, 2018, 6:27am

@tahir.manzoor
Figures other than the anchored ones are extracted. The input document with the anchored image is
Article reviewed [13-05-2017]_test.zip (3.1 MB)
and the output I get is
Article reviewed [13-05-2017]_test_Interim.zip (22.2 KB)

The interim document is Anchored_Fig_Interim.zip (7.2 KB). Please help.

The code I am using is:

public static void anchored_extract_Images(Document doc, String uc) throws Exception
{
    int i = 1;
    for (Bookmark bm : doc.getRange().getBookmarks())
    {
        if(bm.getName().startsWith("Bookmark"))
        {
            ArrayList nodes = extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
            Document dstDoc = generateDocument(doc, nodes);

            PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
            dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
            dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
            dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());

            dstDoc.updatePageLayout();
            if(dstDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().startsWith("Fig"))
                dstDoc.getLastSection().getBody().getLastParagraph().remove();

            dstDoc.updatePageLayout();
            while(dstDoc.getFirstSection().getBody().getFirstParagraph()!= null && dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() == 0)
                dstDoc.getFirstSection().getBody().getFirstParagraph().remove();

            dstDoc.updatePageLayout();
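            // The loop above may have removed every paragraph, so check for null first
            if(dstDoc.getFirstSection().getBody().getFirstParagraph() != null && dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() > 0)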
            {
                dstDoc.save(folderName + "anchored_image_"+i+".docx");
                i++;
            }

        }
    }
    for (Bookmark bm : doc.getRange().getBookmarks()) {
        if (bm.getName().startsWith("Bookmark")) {
            String figText = bm.getBookmarkEnd().getParentNode().toString(SaveFormat.TEXT);
            if(figText.trim().length() > 0)
                bm.setText("<Anchored-Fig>"+figText.trim().substring(0, 7)+"</Anchored-Fig>" + ControlChar.PARAGRAPH_BREAK);
        }
    }
}

public static void UseCaseAnchoreInterim(Document doc, DocumentBuilder builder) throws Exception
{
    int bookmark = 1;
    int i = 1;
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
    {
        if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        {
            Boolean bln = false;
            Node PreviousPara = paragraph.getPreviousSibling();
            while (PreviousPara != null &&
                    (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                        (
                                PreviousPara.toString(SaveFormat.TEXT).trim().contains("(a)") ||
                                PreviousPara.toString(SaveFormat.TEXT).trim().contains("(b)") ||
                                PreviousPara.toString(SaveFormat.TEXT).trim().contains("(c)") ||
                                PreviousPara.toString(SaveFormat.TEXT).trim().contains("(d)") ||
                                PreviousPara.toString(SaveFormat.TEXT).trim().startsWith("(Fig"))
                        )
                    )
            {
                PreviousPara = PreviousPara.getPreviousSibling();
                bln = true;
            }

            if(!bln)
                continue;

            if(PreviousPara == null)
            {
                builder.moveToDocumentStart();
                builder.insertParagraph();
                builder.startBookmark("Bookmark" + bookmark);
                //builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
                builder.moveTo(paragraph);
                builder.endBookmark("Bookmark" + bookmark);
                bookmark++;
            }
            else
            if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
            {
                Node node = ((Paragraph)PreviousPara).getParentNode().insertBefore(new Paragraph(doc), PreviousPara);
                builder.moveTo(node);
                builder.startBookmark("Bookmark" + bookmark);
                builder.moveTo(paragraph);
                builder.endBookmark("Bookmark" + bookmark);
                bookmark++;
            }
        }
    }
}


public static void UseCaseAnchored(Document doc, DocumentBuilder builder) throws Exception
{
    int bookmark = 1;
    int i = 1;
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
    {
        if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        {
            Node PreviousPara = paragraph.getPreviousSibling();
            while (PreviousPara != null && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                    && (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                       ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
            )
            {
                PreviousPara = PreviousPara.getPreviousSibling();
            }

            if(PreviousPara == null)
            {
                builder.moveToDocumentStart();
                builder.insertParagraph();
                builder.startBookmark("Bookmark" + bookmark);
                //builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
                builder.moveTo(paragraph);
                builder.endBookmark("Bookmark" + bookmark);
                bookmark++;
            }
            else
            if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
            {
                Node node = ((Paragraph)PreviousPara).getParentNode().insertBefore(new Paragraph(doc), PreviousPara);
                builder.moveTo(node);
                builder.startBookmark("BookmarkUC1" + bookmark);
                builder.moveTo(paragraph);
                builder.endBookmark("BookmarkUC1" + bookmark);
                bookmark++;
            }
        }
    }
}

public static ArrayList extractContent(Node startNode, Node endNode, boolean isInclusive) throws Exception {
    verifyParameterNodes(startNode, endNode);
    ArrayList nodes = new ArrayList();

    Node originalStartNode = startNode;
    Node originalEndNode = endNode;

    while (startNode.getParentNode().getNodeType() != NodeType.BODY)
        startNode = startNode.getParentNode();

    while (endNode.getParentNode().getNodeType() != NodeType.BODY)
        endNode = endNode.getParentNode();

    boolean isExtracting = true;
    boolean isStartingNode = true;
    boolean isEndingNode;
    Node currNode = startNode;

    while (isExtracting) {
        CompositeNode cloneNode = (CompositeNode) currNode.deepClone(true);
        isEndingNode = currNode.equals(endNode);

        if (isStartingNode || isEndingNode) {
            if (isStartingNode) {
                processMarker(cloneNode, nodes, originalStartNode, isInclusive, isStartingNode, isEndingNode);
                isStartingNode = false;
            }

            if (isEndingNode) {
                processMarker(cloneNode, nodes, originalEndNode, isInclusive, isStartingNode, isEndingNode);
                isExtracting = false;
            }
        } else
            nodes.add(cloneNode);

        if (currNode.getNextSibling() == null && isExtracting) {
            Section nextSection = (Section) currNode.getAncestor(NodeType.SECTION).getNextSibling();
            currNode = nextSection.getBody().getFirstChild();
        } else {
            currNode = currNode.getNextSibling();
        }
    }

    return nodes;
}


private static void verifyParameterNodes(Node startNode, Node endNode) throws Exception {
    if (startNode == null)
        throw new IllegalArgumentException("Start node cannot be null");
    if (endNode == null)
        throw new IllegalArgumentException("End node cannot be null");

    if (!startNode.getDocument().equals(endNode.getDocument()))
        throw new IllegalArgumentException("Start node and end node must belong to the same document");

    if (startNode.getAncestor(NodeType.BODY) == null || endNode.getAncestor(NodeType.BODY) == null)
        throw new IllegalArgumentException("Start node and end node must be a child or descendant of a body");

    Section startSection = (Section) startNode.getAncestor(NodeType.SECTION);
    Section endSection = (Section) endNode.getAncestor(NodeType.SECTION);

    int startIndex = startSection.getParentNode().indexOf(startSection);
    int endIndex = endSection.getParentNode().indexOf(endSection);

    if (startIndex == endIndex) {
        if (startSection.getBody().indexOf(startNode) > endSection.getBody().indexOf(endNode))
            throw new IllegalArgumentException("The end node must be after the start node in the body");
    } else if (startIndex > endIndex)
        throw new IllegalArgumentException("The section of end node must be after the section start node");
}

private static boolean isInline(Node node) throws Exception {
    return ((node.getAncestor(NodeType.PARAGRAPH) != null || node.getAncestor(NodeType.TABLE) != null) && !(node.getNodeType() == NodeType.PARAGRAPH || node.getNodeType() == NodeType.TABLE));
}

private static void processMarker(CompositeNode cloneNode, ArrayList nodes, Node node, boolean isInclusive, boolean isStartMarker, boolean isEndMarker) throws Exception {
    if (!isInline(node)) {
        if (!(isStartMarker && isEndMarker)) {
            if (isInclusive)
                nodes.add(cloneNode);
        }
        return;
    }

    if (node.getNodeType() == NodeType.FIELD_START) {
        if ((isStartMarker && !isInclusive) || (!isStartMarker && isInclusive)) {
            while (node.getNextSibling() != null && node.getNodeType() != NodeType.FIELD_END)
                node = node.getNextSibling();

        }
    }

    if (node.getNodeType() == NodeType.COMMENT_RANGE_END) {
        while (node.getNextSibling() != null && node.getNodeType() != NodeType.COMMENT)
            node = node.getNextSibling();

    }

    int indexDiff = node.getParentNode().getChildNodes().getCount() - cloneNode.getChildNodes().getCount();

    if (indexDiff == 0)
        node = cloneNode.getChildNodes().get(node.getParentNode().indexOf(node));
    else
        node = cloneNode.getChildNodes().get(node.getParentNode().indexOf(node) - indexDiff);

    boolean isSkip;
    boolean isProcessing = true;
    boolean isRemoving = isStartMarker;
    Node nextNode = cloneNode.getFirstChild();

    while (isProcessing && nextNode != null) {
        Node currentNode = nextNode;
        isSkip = false;

        if (currentNode.equals(node)) {
            if (isStartMarker) {
                isProcessing = false;
                if (isInclusive)
                    isRemoving = false;
            } else {
                isRemoving = true;
                if (isInclusive)
                    isSkip = true;
            }
        }

        nextNode = nextNode.getNextSibling();
        if (isRemoving && !isSkip)
            currentNode.remove();
    }

    // After processing, the composite node may have become empty; if so, don't include it.
    if (!(isStartMarker && isEndMarker)) {
        if (cloneNode.hasChildNodes())
            nodes.add(cloneNode);
    }
}

tahir.manzoor · October 24, 2018, 4:03pm

@Saranya_Sekar

Thanks for your inquiry. We tested the scenario using the latest version of Aspose.Words for Java (18.10) and could not reproduce the issue you reported. Please use Aspose.Words for Java 18.10. The output documents and the interim document are attached to this post for your reference: Docs.zip (3.0 MB)

Saranya_Sekar · October 25, 2018, 3:47am

@tahir.manzoor
Can you please share the code you used to generate the interim document and the output? In your output, images other than anchored images (Fig 10 and Fig 11) are also extracted.

tahir.manzoor · October 25, 2018, 1:29pm

@Saranya_Sekar

Thanks for your inquiry. We used the following code example with the latest version of Aspose.Words for Java (18.10). The output documents are attached to my previous post.

Document doc = new Document(MyDir + "Article reviewed [13-05-2017]_test.doc");
DocumentBuilder builder = new DocumentBuilder(doc);
UseCase5(doc, builder);
ExtractImages5(doc, "uc5", builder);
doc.save(MyDir + "out.docx");

public  static void ExtractImages5(Document doc, String uc, DocumentBuilder builder) throws Exception
{
    int i = 1;
    String bookmark = "bm_extract";
    for (Bookmark bm : doc.getRange().getBookmarks()) {
        if (bm.getName().startsWith("Bookmark")) {
            bm.getBookmarkEnd().getParentNode().insertBefore(new BookmarkEnd(doc, bm.getName()), bm.getBookmarkEnd().getParentNode().getFirstChild());
        }
    }
    doc.updatePageLayout();
    for (Bookmark bm : doc.getRange().getBookmarks()) {
        if (bm.getName().startsWith("Bookmark")) {
            Node currentNode = bm.getBookmarkStart();
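            // Walk forward to the first shape or group shape; nextPreOrder returns null at the end of the document.
            while (currentNode != null && currentNode.getNodeType() != NodeType.SHAPE && currentNode.getNodeType() != NodeType.GROUP_SHAPE)
                currentNode = currentNode.nextPreOrder(doc);
            if (currentNode == null)
                continue; // no shape follows this bookmark start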

            builder.moveTo(currentNode);
            builder.startBookmark(bookmark + i);
            builder.moveTo(bm.getBookmarkEnd());
            builder.endBookmark(bookmark + i);
            i++;
        }
    }

    for (Bookmark bm : doc.getRange().getBookmarks()) {
        if (bm.getName().startsWith("Bookmark")) {
            bm.remove();
        }
    }
    doc.updatePageLayout();
    for (Bookmark bm : doc.getRange().getBookmarks())
    {
        if(bm.getName().startsWith("bm_extract"))
        {
            ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
            Document dstDoc = ExtractContents.generateDocument(doc, nodes);

            PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
            dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
            dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
            dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());

            dstDoc.updatePageLayout();
            if(dstDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().startsWith("Fig"))
                dstDoc.getLastSection().getBody().getLastParagraph().remove();

            dstDoc.updatePageLayout();
            while(dstDoc.getFirstSection().getBody().getFirstParagraph()!= null && dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() == 0)
                dstDoc.getFirstSection().getBody().getFirstParagraph().remove();

            dstDoc.updatePageLayout();
            if(dstDoc.getFirstSection().getBody().getFirstParagraph().getChildNodes(NodeType.SHAPE, true).getCount() > 0)
            {
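                // Use at most the first seven characters of the trimmed caption as the file name prefix.
                String filename = bm.getBookmarkEnd().getParentNode().toString(SaveFormat.TEXT).trim();
                if(filename.length() > 0)
                    dstDoc.save(MyDir + filename.substring(0, Math.min(7, filename.length())) + "_out.docx");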
                i++;
            }

        }
    }

    for (Bookmark bm : doc.getRange().getBookmarks()) {
        if (bm.getName().startsWith("bm_extract")) {
            // Guard against captions shorter than seven characters.
            String figText = bm.getBookmarkEnd().getParentNode().toString(SaveFormat.TEXT).trim();
            if(figText.length() > 0)
                bm.setText("<Fig>"+figText.substring(0, Math.min(7, figText.length()))+"</Fig>" + ControlChar.PARAGRAPH_BREAK);
        }
    }
}

public static void UseCase5(Document doc, DocumentBuilder builder) throws Exception
{
    int bookmark = 1;
    int i = 1;
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
    {
        if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        {
            System.out.println(paragraph.getText());
            Node PreviousPara = paragraph.getPreviousSibling();
            while (PreviousPara != null && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                    && (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                       ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
            )
            {
                PreviousPara = PreviousPara.getPreviousSibling();
            }

            if(PreviousPara == null)
            {
                builder.moveToDocumentStart();
                builder.insertParagraph();
                builder.startBookmark("Bookmark" + bookmark);
                //builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
                builder.moveTo(paragraph);
                builder.endBookmark("Bookmark" + bookmark);
                bookmark++;
            }
            else
            if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
            {
                Node node = ((Paragraph)PreviousPara).getParentNode().insertBefore(new Paragraph(doc), PreviousPara);
                builder.moveTo(node);
                builder.startBookmark("BookmarkUC1" + bookmark);
                builder.moveTo(paragraph);
                builder.endBookmark("BookmarkUC1" + bookmark);
                bookmark++;
            }
        }
    }
}
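
The code above names each output file after the first seven characters of the figure caption. Captions shorter than that, or captions containing characters that are awkward in file names, need some care. A minimal sketch of a helper for this follows. `CaptionNames` and `filePrefix` are hypothetical names and not part of Aspose.Words, and replacing everything except letters, digits, '-' and '_' with an underscore is an assumption rather than something the thread requires.

```java
// Hypothetical helper, not part of Aspose.Words.
final class CaptionNames {
    private CaptionNames() {}

    /**
     * Returns at most maxLen characters of the trimmed caption, with every
     * character other than letters, digits, '-' and '_' replaced by '_'.
     * Returns an empty string for a null or blank caption.
     */
    static String filePrefix(String caption, int maxLen) {
        if (caption == null) {
            return "";
        }
        String trimmed = caption.trim();
        // Clamp the cut-off so short captions never cause an out-of-bounds error.
        int end = Math.max(0, Math.min(maxLen, trimmed.length()));
        String head = trimmed.substring(0, end);
        return head.replaceAll("[^A-Za-z0-9_-]", "_");
    }
}
```

In ExtractImages5 this could be used as `dstDoc.save(MyDir + CaptionNames.filePrefix(filename, 7) + "_out.docx")`, skipping the save when the returned prefix is empty.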

Saranya_Sekar · October 26, 2018, 12:00pm

@tahir.manzoor

Images other than anchored images are also extracted, and in the interim document they are bookmarked as label images. How can I overcome this issue?
Input document: ManuscriptRevisedClean123.zip (2.6 MB)
Generated output: ManuscriptRevisedClean123-shape.zip (2.8 MB)
Interim document: ManuscriptRevisedClean123_Interim.zip (22.8 KB)

Kindly help me extract only the anchored images and bookmark them as anchored images.

The code I use is the one you shared for label image extraction and anchored image extraction.

tahir.manzoor · October 26, 2018, 4:54pm

@Saranya_Sekar

Thanks for your inquiry. Could you please share some details about the anchored images? Are you using the latest version of Aspose.Words for Java (18.10)?

Please share screenshots of the problematic sections of the output documents. We will investigate the issue and provide more information.

Saranya_Sekar · October 27, 2018, 6:50am

@tahir.manzoor

I consider images that show the anchor symbol to be anchored images, like Anchored_Image.png (173.2 KB). Images that don't show the anchor symbol look like Not-Anchored-Image.png (222.5 KB).
The interim document currently has the bookmark as shown in Anchored_Fig_Interim.png (199.0 KB),
and the expected output is Expected_Anchored_Interim.png (186.5 KB).
Kindly help, please.

tahir.manzoor · October 27, 2018, 1:02pm

@Saranya_Sekar

Thanks for sharing the details. As I understand from your screenshots, you want to check whether an image is anchored and add the Fig tag accordingly.

You can use the same code to achieve this. You only need to check the wrap type of the image (Shape node) using the Shape.WrapType property (getWrapType() in Java) when setting the bookmark text, e.g. <Fig>some text</Fig>. If it is “Inline”, the shape is not anchored.
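
The check described above could be sketched roughly as follows. To keep the example self-contained, it takes the inline flag as a plain boolean; with Aspose.Words for Java that value would come from something like `shape.getWrapType() == WrapType.INLINE`. `FigTagger` and `figTag` are hypothetical names, and returning null for images that are not anchored is an assumption about the desired behaviour.

```java
// Hypothetical helper, not part of Aspose.Words.
final class FigTagger {
    private FigTagger() {}

    /**
     * Returns the bookmark text for an anchored (floating) image, or null
     * when the image is inline or the caption is blank.
     *
     * inlineShape: true when the shape's wrap type is "Inline".
     */
    static String figTag(boolean inlineShape, String caption) {
        if (inlineShape || caption == null) {
            return null;
        }
        String label = caption.trim();
        if (label.isEmpty()) {
            return null;
        }
        // The code in this thread uses the first seven caption characters as the label.
        label = label.substring(0, Math.min(7, label.length()));
        return "<Fig>" + label + "</Fig>";
    }
}
```

In the last loop of ExtractImages5, the result would be passed to `bm.setText(...)` (followed by `ControlChar.PARAGRAPH_BREAK`) only when it is not null, so inline images keep their original content.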

Saranya_Sekar · October 29, 2018, 4:59am

@tahir.manzoor

Thanks for the response.