How to determine whether an image's caption is at the top or the bottom

MikeLak · March 19, 2018, 4:27am

Please help me determine whether the caption of an image is the one above it or the one below it. An image can have two candidate captions: one above it as a paragraph and another below it as a separate paragraph. How can the code determine which caption to select? For example, consider Fig 3 and Fig 4 in the attached sample … We need to determine a result such as:
Image 1: caption at top; Image 2: caption at bottom. In this example the captions are at the bottom of the images, but there are cases where captions at the top and at the bottom are mixed in one document. Thanks in advance.

MikeLak · March 19, 2018, 4:28am

Please find my code attached.

public static void main(String[] args) throws Exception {
	try {
		com.aspose.words.License license = new com.aspose.words.License();
		
	} catch (Exception e) {
		System.out
				.println("License File Not Accessed\nError in Word Document");

		System.exit(0);
	}
	/** ASPOSE LICENSE END **/

	System.out.println("**********");
	String dataDir = args[0];
	System.out.println("**********");
	Document doc = new Document(dataDir);
	DocumentBuilder builder = new DocumentBuilder(doc);
	int i = 1;
	ArrayList nodes = new ArrayList();


	//Remove empty paragraphs
	for (Paragraph  paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
	    if (paragraph.toString(SaveFormat.TEXT).trim().length() == 0
	            && paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0
	            && paragraph.getText().contains(ControlChar.PAGE_BREAK) == false) {
	        paragraph.remove();
	       
	    }
	}

	//Get the paragraphs that start with "Fig".
	for (Paragraph  paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
	{
	    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
	    {
	        // Start from the paragraph just before the caption; the loop below walks backwards.
	        Node previousPara = paragraph.getPreviousSibling();
	        while (previousPara != null
	                && previousPara.getNodeType() == NodeType.PARAGRAPH
	                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
	                && ((Paragraph)previousPara).getChildNodes(NodeType.SHAPE, true).getCount() >= 0)
	        {
	            if(previousPara != null)
	                nodes.add(previousPara);
	            previousPara = previousPara.getPreviousSibling();
	        }

	        if(nodes.size() > 0)
	        {
	            //Reverse the node collection.
	            Collections.reverse(nodes);

	            //Extract the consecutive shapes and export them into new document
	            Document dstDoc = new Document();
	            for (Paragraph para : (Iterable<Paragraph>)nodes)
	            {
	                NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
	                Node newNode = importer.importNode(para, true);
	                dstDoc.getFirstSection().getBody().appendChild(newNode);
	            }
	            //Remove the first empty paragraph
	            if(dstDoc.getFirstSection().getBody().getFirstParagraph().toString(SaveFormat.TEXT).trim().length() == 0)
	                dstDoc.getFirstSection().getBody().getFirstParagraph().remove();
	            System.out.println(paragraph.toString(SaveFormat.TEXT));

	            // dstDoc.save(MyDir + "output" + i + ".docx");
	            i++;
	            nodes.clear();
	        }
	    }
	}
}

tahir.manzoor · March 19, 2018, 10:04am

@MikeLak,

Thanks for your inquiry. Please ZIP and attach your input Word document here for our reference. We will then provide more information about your query.

MikeLak · March 19, 2018, 10:05am

Doc1.zip (2.8 MB)

tahir.manzoor · March 19, 2018, 2:58pm

@MikeLak,

Thanks for sharing the document. Please use the following code example to check if the Fig caption is under the image.

Document doc = new Document(MyDir + "Doc1.docx");
for (Paragraph  paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        Node previousPara = paragraph.getPreviousSibling();
        if (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph)previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            System.out.println("Fig caption is at the bottom of image.");
            break;
        }
    }
}

MikeLak · March 20, 2018, 4:38am

Hi Tahir

This code prints the wrong result for whether the image caption is at the top or at the bottom. Please help.

public class ImageCaptionTopOrBottom {

public static void main(String[] args) throws Exception {
	// TODO Auto-generated method stub
	try {
		com.aspose.words.License license = new com.aspose.words.License();
		
	} catch (Exception e) {
		System.out
				.println("License File Not Accessed\nError in Word Document");

		System.exit(0);
	}
	/** ASPOSE LICENSE END **/

	System.out.println("**********");
	String dataDir = args[0];
	System.out.println("**********");
	Document doc = new Document(dataDir);
	DocumentBuilder builder = new DocumentBuilder(doc);
	ArrayList nodes = new ArrayList();

	int shapeCount = 0;
	//Document doc = new Document(MyDir + "Doc1.docx");
	for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
	{
	    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
	    {
	        Node previousPara = paragraph.getPreviousSibling();
	        shapeCount++;
	        if (previousPara != null
	                && previousPara.getNodeType() == NodeType.PARAGRAPH
	                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
	                && ((Paragraph)previousPara).getChildNodes(NodeType.SHAPE, true).getCount() >= 0)
	        {
	            System.out.println("Fig caption is at the bottom of image."+shapeCount);
	        }
	        else{
	            System.out.println("Fig caption is at the top of image."+shapeCount);
	        }
	    }
	}
}

}

tahir.manzoor · March 20, 2018, 1:45pm

@MikeLak,

Thanks for your inquiry. The else branch in your code is not correct: it reports the caption as being at the top whenever the previous sibling is not an image, without checking the next sibling. You need to check whether the next sibling of the Fig caption contains an image. Use the Node.NextSibling property to get the node immediately following a node, and the Node.PreviousSibling property to get the node immediately preceding it.

In the else branch, use the Node.NextSibling property to check whether the node after the Fig caption contains an image. Please check the following solution.

1. Load the document into the Aspose.Words DOM.
2. Iterate through the paragraphs of the document and check whether each paragraph's text starts with "Fig".
3. If it does, use the Paragraph.PreviousSibling property to get the previous sibling node. If that node is a Paragraph that contains a Shape node with an image, the Fig caption is at the bottom of the image.
4. Similarly, use the Paragraph.NextSibling property to get the next sibling and check it for an image. If the next sibling contains a Shape node with an image, the Fig caption is at the top of the image.

Note that in steps 3 and 4, the current node is the paragraph that contains the Fig caption.
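
The steps above can be sketched without the Aspose.Words API by using a simplified stand-in model. In this minimal sketch, Block, Position, isImageBlock, classify, and classifyAll are hypothetical names that are not part of Aspose.Words; a Block only records a paragraph's text and whether it holds an image, which is an assumption standing in for the Paragraph and Shape checks of steps 3 and 4.

```java
import java.util.ArrayList;
import java.util.List;

public class CaptionPositionSketch {

    /** Hypothetical stand-in for a paragraph: its text and whether it holds an image shape. */
    static final class Block {
        final String text;
        final boolean hasImage;

        Block(String text, boolean hasImage) {
            this.text = text;
            this.hasImage = hasImage;
        }
    }

    enum Position { TOP, BOTTOM, UNKNOWN }

    /** True when the block is an image-only paragraph: no visible text and an image. */
    static boolean isImageBlock(Block block) {
        return block != null && block.hasImage && block.text.trim().isEmpty();
    }

    /** Steps 3 and 4: inspect the previous sibling first, then the next sibling. */
    static Position classify(List<Block> blocks, int index) {
        Block previous = index > 0 ? blocks.get(index - 1) : null;
        Block next = index + 1 < blocks.size() ? blocks.get(index + 1) : null;
        if (isImageBlock(previous)) {
            return Position.BOTTOM; // the image precedes the caption
        }
        if (isImageBlock(next)) {
            return Position.TOP; // the image follows the caption
        }
        return Position.UNKNOWN; // no adjacent image paragraph
    }

    /** Step 2: visit every paragraph and classify those whose text starts with "Fig". */
    static List<Position> classifyAll(List<Block> blocks) {
        List<Position> result = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            if (blocks.get(i).text.trim().startsWith("Fig")) {
                result.add(classify(blocks, i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block("Fig 1. Caption above its image", false),
                new Block("", true),
                new Block("Some body text.", false),
                new Block("", true),
                new Block("Fig 2. Caption below its image", false));
        System.out.println(classifyAll(blocks)); // prints [TOP, BOTTOM]
    }
}
```

When both neighbours are image paragraphs, this sketch reports BOTTOM because the previous sibling is checked first; a real document may need a more specific rule for that case.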

MikeLak · March 21, 2018, 1:29pm

This wrongly prints both the top and the bottom caption messages…

public class ImageCaptionTopOrBottom {

public static void main(String[] args) throws Exception {
	// TODO Auto-generated method stub
	try {
		com.aspose.words.License license = new com.aspose.words.License();
	
	} catch (Exception e) {
		System.out
				.println("License File Not Accessed\nError in Word Document");

		System.exit(0);
	}
	/** ASPOSE LICENSE END **/

	System.out.println("**********");
	String dataDir = args[0];
	System.out.println("**********");
	Document doc = new Document(dataDir);
	DocumentBuilder builder = new DocumentBuilder(doc);
	int shapeCount = 0;
	String caption = "";
	for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
	{
	    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
	    {
	        Node previousPara = paragraph.getPreviousSibling();
	        Node nextPara = paragraph.getPreviousSibling();

	        shapeCount++;
	        while (previousPara != null
	                && previousPara.getNodeType() == NodeType.PARAGRAPH
	                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
	                && ((Paragraph)previousPara).getChildNodes(NodeType.SHAPE, true).getCount() >= 0)
	        {
	            System.out.println("Fig caption is at the bottom of image."+shapeCount);
	        }
	        while (nextPara != null
	                && nextPara.getNodeType() == NodeType.PARAGRAPH
	                && nextPara.toString(SaveFormat.TEXT).trim().length() == 0
	                && ((Paragraph)nextPara).getChildNodes(NodeType.SHAPE, true).getCount() >= 0){
	            System.out.println("Fig caption is at the top of image."+shapeCount);
	        }
	    }
	}
}

}

tahir.manzoor · March 21, 2018, 4:10pm

@MikeLak,

Thanks for your inquiry.

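You are calling paragraph.getPreviousSibling() to get the next paragraph, which is incorrect; use paragraph.getNextSibling() instead. Also, please do not use a while loop in this case: nothing inside the loop body changes previousPara or nextPara, so once the condition is true the loop never terminates.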

Please check the following code example for both cases. Hope this helps you.

Document doc = new Document(MyDir + "Doc1.docx");
for (Paragraph  paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Skip the "Figures" heading, which also starts with "Fig".
        if(paragraph.toString(SaveFormat.TEXT).trim().equals("Figures"))
            continue;

        Node previousPara = paragraph.getPreviousSibling();
        Node nextPara = paragraph.getNextSibling();

        // Previous sibling is a text-free paragraph that holds at least one shape:
        // the image sits above the caption.
        if (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph)previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            System.out.println("Fig caption is at the bottom of image.");
            // Move on to the next caption instead of leaving the loop.
            continue;
        }

        // Next sibling is a text-free paragraph that holds at least one shape:
        // the image sits below the caption.
        if (nextPara != null
                && nextPara.getNodeType() == NodeType.PARAGRAPH
                && nextPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph)nextPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            System.out.println("Fig caption is at the top of image.");
        }
    }
}