Handle label images

akshayapria · November 10, 2017, 1:07pm

Hi Team,
The requirement is to extract the images and save them into a new document. The extraction process uses the paragraph node and the figure caption as the keyword. In my code I have separated the image handling as follows:
Section A - handling figures with the caption as the previous sibling
Section B - handling images with the caption as the next sibling
Section C - handling images inside a table
Section D - handling images in landscape mode

In the input document, some images have a label below them instead of a figure caption. My code handles the figure captions fine, but the labeled images are not handled.

So please help me extract the labeled images along with their labels (Figure 1 and Figure 9).

The input Test.zip (2.7 MB)

The source code source.zip (8.3 KB)

The expected output expected output.zip (2.2 MB)

The actual output ActualOuput.zip (2.2 MB)

Thank you very much,
pria

tilal.ahmad · November 10, 2017, 3:33pm

@akshayapria

Thanks for your inquiry. Please note that the code for the sections you described does not cover the Figure 1 and Figure 9 scenario. Please check the document structure (DOM) in Document Explorer; it will help you work with Word documents and refine your code. The code stops collecting nodes when it finds a Paragraph node without a Shape child node. In this scenario you need to bookmark the contents and extract these nodes, as suggested in your other post.
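
To illustrate why a walk of this kind misses labeled images, here is a minimal sketch. It uses a hypothetical `Para` class as a simplified stand-in for a paragraph (it is not an Aspose.Words type) and assumes the collection starts at a caption and moves upwards until it meets a paragraph without an image.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a document paragraph; not an Aspose.Words type.
final class Para {
    final String text;
    final boolean hasImage;

    Para(String text, boolean hasImage) {
        this.text = text;
        this.hasImage = hasImage;
    }
}

final class BackwardWalk {
    // Walks upwards from the caption at captionIndex and collects the
    // consecutive image paragraphs above it, returned in document order.
    static List<Para> collectImagesAbove(List<Para> body, int captionIndex) {
        List<Para> collected = new ArrayList<>();
        int i = captionIndex - 1;
        // The loop ends at the first paragraph that carries no image.
        while (i >= 0 && body.get(i).hasImage) {
            collected.add(body.get(i));
            i--;
        }
        Collections.reverse(collected);
        return collected;
    }
}
```

For a body of image, "(a)" label, image, "Figure 1" caption, the walk from the caption collects only the image directly above it: the label paragraph has no image, so the loop ends before the first image is reached.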

akshayapria · November 11, 2017, 4:24am

Hi @tilal.ahmad,

Thanks for your feedback.

I tried that approach, but it throws an index error, which I have also discussed in that thread.
How can I check for a Shape child node of a paragraph node? Please help me resolve it.

Regards,
pria.

tilal.ahmad · November 11, 2017, 8:44am

@akshayapria

You are already checking the Shape child node count of a paragraph in the code you shared above.

...
Node previousPara = paragraph.getPreviousSibling();
while (previousPara != null && previousPara.getNodeType() == NodeType.PARAGRAPH
        //&& previousPara.toString(SaveFormat.TEXT).trim().length() == 0
        && ((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0) {
    if (previousPara != null)
        nodes.add(previousPara);
    previousPara = previousPara.getPreviousSibling();
}
...

Please share your sample code here. We will look into it and will guide you accordingly.

akshayapria · November 11, 2017, 9:00am

Hi @tilal.ahmad,

Thank you very much.

The source code is source.zip (8.3 KB).
I have already shared the input document, expected output, and actual output.
Please help me resolve this.

Regards,
pria

tilal.ahmad · November 11, 2017, 9:17am

@akshayapria

I think there is some confusion; you have shared your old code again. Please share the code you are using to apply the bookmark approach shared above.

akshayapria · November 11, 2017, 10:05am

Hi @tilai,

Please make the changes in the new extraction code.

Please help me solve the issue; the showcase is nearing.

The sample code is Test.zip (41.0 KB)

Thanks in advance,
pria

tilal.ahmad · November 13, 2017, 2:43am

@akshayapria

Thanks for your feedback. Please check the sample code snippet in your other post, which bookmarks the contents and extracts them. You can customize it for your document; it will help you accomplish the task.

akshayapria · December 6, 2017, 11:51am

Hi @tilal.ahmad

Thanks for your feedback.

I have enclosed the code for labeled images below.

It does not work in some cases. Please help me change the conditions. The source code is:

interimdoc.updateListLabels();
DSMT4(interim);
NodeCollection shapes1 = interimdoc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes1)
{
	if (shape.hasChart() || shape.hasImage())
	{
		Paragraph paragraph = shape.getParentParagraph();

		// Modify this condition according to your requirement
		if (paragraph.toString(com.aspose.words.SaveFormat.TEXT).contains("a)")
				|| paragraph.toString(com.aspose.words.SaveFormat.TEXT).contains("b)")
				|| paragraph.toString(com.aspose.words.SaveFormat.TEXT).contains("c)"))
		{
			com.aspose.words.Document dstDoc = new com.aspose.words.Document();

			NodeImporter importer = new NodeImporter(interimdoc, dstDoc,
					ImportFormatMode.KEEP_SOURCE_FORMATTING);
			Node newNode = importer.importNode(paragraph, true);
			dstDoc.getFirstSection().getBody().appendChild(newNode);
			String Imgcaption = (String)shape.getParentParagraph().getNextSibling()
					.toString(com.aspose.words.SaveFormat.TEXT);
			//

			filename = folder_name + "fig" + i + ".docx";
			Paragraph p = paragraph;
			// Remove the extracted shapes from the source paragraph.
			p.getChildNodes(NodeType.SHAPE, true).clear();
			p.appendChild(new BookmarkStart(interimdoc, "MyBookmark"));
			Run run1 = new Run(interimdoc, "<fig>" + "num_figure_" + i + "</fig>");
			run1.getFont().setColor(Color.RED);
			p.getRuns().add(run1);
			p.appendChild(new BookmarkEnd(interimdoc, "MyBookmark"));
			dstDoc.save(filename);
			i++;
		}

		Node node = shape.getParentParagraph().getNextSibling();
		// Modify this condition according to your requirement
		if (node != null && node.getNodeType() == NodeType.PARAGRAPH
				&& (((Paragraph)node).isListItem()
						|| node.toString(com.aspose.words.SaveFormat.TEXT).contains("(a)")
						|| node.toString(com.aspose.words.SaveFormat.TEXT).contains("(b)")
						|| node.toString(com.aspose.words.SaveFormat.TEXT).contains("(c)")))
		{

			com.aspose.words.Document dstDoc = new com.aspose.words.Document();

			NodeImporter importer = new NodeImporter(interimdoc, dstDoc,
					ImportFormatMode.KEEP_SOURCE_FORMATTING);
			Node newNode = importer.importNode(shape, true);
			dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
			// dstDoc.
			/** OUTPUT FILENAME START **/

			if (dstDoc.getFirstSection().getBody().getFirstParagraph().toString(SaveFormat.TEXT).trim()
					.startsWith("Figure"))
			{
				dstDoc.getFirstSection().getBody().getFirstParagraph().remove();
			}
			filename = folder_name + "Fig_" + i + ".docx";
			converttopdf(filename);

			i++;
			//converttopdf(filename);
			//deletefiles(folder_name);
		}

	}
}

The input Test.zip (2.7 MB)

The expected output expected Output.zip (2.6 MB)

Thanks & regards,
pria.

tahir.manzoor · December 7, 2017, 10:07am

@akshayapria,

Thanks for your inquiry. We are working over this query and will get back to you soon.

tahir.manzoor · December 9, 2017, 2:32pm

@akshayapria,

Thanks for your patience. Please use the following code example to get the desired output. We have attached the output documents with this post for your kind reference. output documents.zip (2.2 MB)

Document doc = new Document(MyDir + "Test.doc");
DocumentBuilder builder = new DocumentBuilder(doc);
int i = 1;

NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    if((paragraph.toString(SaveFormat.TEXT).trim().contains("(a)") ||
        paragraph.toString(SaveFormat.TEXT).trim().contains("(b)") ||
        paragraph.toString(SaveFormat.TEXT).trim().contains("(c)"))
            && paragraph.getPreviousSibling() != null
            &&  paragraph.getPreviousSibling().getNodeType() == NodeType.PARAGRAPH
            &&  ((Paragraph)paragraph.getPreviousSibling()).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
    {
        Document dstDoc = new Document();
        NodeCollection shapes = ((Paragraph)paragraph.getPreviousSibling()).getChildNodes(NodeType.SHAPE, true);
        for (Shape shape : (Iterable<Shape>) shapes)
        {
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode(shape, true);
            dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
            dstDoc.save(MyDir + "output"+i+".docx");
            i++;
        }
    }
}

ArrayList nodes = new ArrayList();
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs){
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Figure")) {
        Node previousPara = paragraph.getPreviousSibling();
        // Skip one empty paragraph between the caption and the image, if present.
        if (previousPara != null && previousPara.getNodeType() == NodeType.PARAGRAPH
                && !((Paragraph) previousPara).hasChildNodes())
            previousPara = previousPara.getPreviousSibling();

        while (previousPara != null && previousPara.getNodeType() == NodeType.PARAGRAPH
                && ((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            nodes.add(previousPara);
            previousPara = previousPara.getPreviousSibling();

            // Also collect empty paragraphs and "a)" label paragraphs that sit between images.
            // The null and type checks avoid exceptions at the start of the body or next to a table.
            while (previousPara != null && previousPara.getNodeType() == NodeType.PARAGRAPH
                    && (!((Paragraph) previousPara).hasChildNodes()
                        || previousPara.toString(SaveFormat.TEXT).trim().startsWith("a)")))
            {
                nodes.add(previousPara);
                previousPara = previousPara.getPreviousSibling();
            }

        }

        if (nodes.size() > 0) {
            // Reverse the node collection.
            Collections.reverse(nodes);

            // Extract the consecutive shapes and export them into new document
            Document dstDoc = new Document();
            for (Paragraph para : (Iterable<Paragraph>) nodes) {
                NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                Node newNode = importer.importNode(para, true);
                dstDoc.getFirstSection().getBody().appendChild(newNode);
            }
            dstDoc.save(MyDir + "output"+i+".docx");
            i++;
        }
        nodes.clear();
    }
}
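
The label checks above are written as repeated `contains("(a)")`, `contains("(b)")`, `contains("(c)")` calls. As a possible alternative, the sketch below gathers the check into one helper built on a regular expression. The `LabelMatcher` class and its pattern are assumptions for illustration, not part of Aspose.Words, and the helper is stricter than `contains`: it only accepts text that starts with the label.

```java
import java.util.regex.Pattern;

final class LabelMatcher {
    // Matches text that starts with a single lowercase letter label,
    // e.g. "(a)", "a)" or "(b) Side view". This pattern is an assumption;
    // adjust it to the labels used in your documents.
    private static final Pattern LABEL =
            Pattern.compile("^\\(?[a-z]\\)(\\s.*)?$", Pattern.DOTALL);

    // Returns true when the trimmed paragraph text begins with a label.
    static boolean isLabel(String paragraphText) {
        if (paragraphText == null) {
            return false;
        }
        return LABEL.matcher(paragraphText.trim()).matches();
    }
}
```

The helper could be called with the result of `paragraph.toString(SaveFormat.TEXT)` in place of the chained conditions, which keeps the rule for what counts as a label in one place.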

akshayapria · December 11, 2017, 1:00pm

Hi team,

Thank you very much.

It's really useful for me.

Now I need to bookmark the extracted images and also apply a style to them.

I have attached the source code for my extraction process. Please help me change my code to add the bookmark and apply a style name to the bookmarked content.

source code Kite.zip (16.9 KB)

The sample document Sample_Doc_input.zip (1.7 MB)

The expected output Sample_Doc.zip (33.7 KB)

Thanks & Regards,
Priyanga G

tahir.manzoor · December 11, 2017, 5:35pm

@akshayapria,

Thanks for your inquiry. In this case, you need to move the cursor to the desired location and insert the text e.g. <fig>num_figure_5</fig>. Please refer to the following articles.

Working with Bookmarks
Specifying Formatting
Moving the Cursor
Inserting a String of Text
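
When the marker is inserted once per figure, it may help to derive both the marker text and the bookmark name from the figure index, since bookmark names in a Word document are generally expected to be unique. The sketch below is plain Java with no Aspose.Words calls; `FigurePlaceholder` and the `fig_N` naming scheme are hypothetical, while the marker format follows the one used in this thread.

```java
final class FigurePlaceholder {
    // Builds the marker text used in this thread, e.g. "<fig>num_figure_5</fig>".
    static String markerText(int figureIndex) {
        return "<fig>num_figure_" + figureIndex + "</fig>";
    }

    // Hypothetical naming scheme: one bookmark name per figure, e.g. "fig_5".
    static String bookmarkName(int figureIndex) {
        return "fig_" + figureIndex;
    }
}
```

The two strings can then be passed to whichever bookmark and text insertion calls the extraction code already uses.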

akshayapria · December 12, 2017, 1:03pm

Hi @tahir.manzoor,

Thanks for your feedback.

I have bookmarked the place.

Please help me add a style at that bookmarked place.

((Paragraph) paragraph).getChildNodes(NodeType.SHAPE, true).clear();
Paragraph p = ((Paragraph) paragraph);
p.getChildNodes(NodeType.SHAPE, true).clear();
p.appendChild(new BookmarkStart(interimdoc, "MyBookmark"));
Run run = new Run(interimdoc, "<Fig>Numbered_Figure</Fig>");
run.getFont().setColor(Color.RED);
p.getRuns().add(run);
p.appendChild(new BookmarkEnd(interimdoc, "MyBookmark"));

Thanks & regards,
pria.

tahir.manzoor · December 12, 2017, 3:55pm

@akshayapria,

You are setting the bookmark’s text color correctly. Could you please share some more detail about your requirement along with expected output document? We will then provide you more information on this.

akshayapria · December 13, 2017, 5:29am

Hi @tahir.manzoor,

Thanks for your feedback.

My requirement is to extract the images and save them in a separate Word document.

After extraction, the extracted images are removed from the document, and a bookmark and a style are placed at that location (before the figure caption).

Please help me apply a style at that place.

Thanks & Regards,
pria

tahir.manzoor · December 13, 2017, 6:42am

@akshayapria,

Thanks for your inquiry. You just need to move the cursor to the fig caption (Paragraph node) and insert the desired content. Please check the documentation links shared in my old post.
https://forum.aspose.com/t/handle-label-images/166429/14