Bookmark the paragraph node

akshayapria · November 7, 2017, 12:10pm

Hi Team,

I am extracting images from a document by working with its paragraphs.

How can I bookmark the paragraph nodes of the extracted images and then delete them from the document?

Sample code:
ArrayList nodes = new ArrayList();
Document interimdoc = new Document(interim);
// Remove empty paragraphs
for (Paragraph paragraph : (Iterable<Paragraph>) interimdoc.getChildNodes(NodeType.PARAGRAPH, true)) {
if (paragraph.toString(SaveFormat.TEXT).trim().length() == 0
&& paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0
&& paragraph.getText().contains(ControlChar.PAGE_BREAK) == false) {
paragraph.remove();
}
}

		// Get the paragraphs that start with "Fig".
		for (Paragraph paragraph : (Iterable<Paragraph>) interimdoc.getChildNodes(NodeType.PARAGRAPH, true)) {
			if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")
					|| paragraph.toString(SaveFormat.TEXT).trim().startsWith("Sch")) {
				Node previousPara = paragraph.getPreviousSibling();
				while (previousPara != null && previousPara.getNodeType() == NodeType.PARAGRAPH
						&& previousPara.toString(SaveFormat.TEXT).trim().length() == 0
						&& ((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0) {
					if (previousPara != null)
						nodes.add(previousPara);
					previousPara = previousPara.getPreviousSibling();
				}

				if (nodes.size() > 0) {
					// Reverse the node collection.
					Collections.reverse(nodes);

					// Extract the consecutive shapes and export them into
					// new document
					Document dstDoc = new Document();
					for (Paragraph para : (Iterable<Paragraph>) nodes) {
						NodeImporter importer = new NodeImporter(interimdoc, dstDoc,
								ImportFormatMode.KEEP_SOURCE_FORMATTING);
						Node newNode = importer.importNode(para, true);
						dstDoc.getFirstSection().getBody().appendChild(newNode);

						para.remove();
						interimdoc.save(interim);
					}
					// Remove the first empty paragraph
					if (dstDoc.getFirstSection().getBody().getFirstParagraph().toString(SaveFormat.TEXT).trim()
							.length() == 0)
						dstDoc.getFirstSection().getBody().getFirstParagraph().remove();
					/** OUTPUT FILENAME START **/
					String Imgcaption = paragraph.toString(SaveFormat.TEXT);
					int k = 0;
					while (k < Imgcaption.length() && !Character.isDigit(Imgcaption.charAt(k)))
						k++;
					int j = k;
					while (j < Imgcaption.length() && Character.isDigit(Imgcaption.charAt(j)))
						j++;
					int l = Integer.parseInt(Imgcaption.substring(k, j));
					strI = Integer.toString(l);
					Pattern pattern = Pattern.compile(strI);
					Matcher matcher = pattern.matcher(Imgcaption);
					while (matcher.find()) {
						name = Imgcaption.substring(0, matcher.end());
						name = name.replace(".", "_");
					}
					if (name.startsWith("Fig")) {
						name = "Fig" + "_" + l;
					}
					/** OUTPUT FILENAME END **/
					filename = folder_name + "_" + "Fig_a" + i + "_" + name + ".docx";

					dstDoc.save(filename);

					// RemoveEmptyPages(filename);

					i++;

					nodes.clear();
				}
			}
		}
		/** SECTION A END **/

Thanks & Regards,
pria

tahir.manzoor · November 7, 2017, 3:50pm

@akshayapria,

Thanks for your inquiry. Please refer to the following articles:
Inserting a Bookmark
Moving the Cursor

In your case, we suggest the following solution:

Move the cursor to the caption paragraph (the one starting with "Fig") and insert a BookmarkStart node using the DocumentBuilder.StartBookmark method.
Then move the cursor to the end of the paragraph that contains the shape node using the DocumentBuilder.MoveToParagraph method, and insert a BookmarkEnd node using the DocumentBuilder.EndBookmark method.

You can remove the content of the bookmark by setting the Bookmark.Text property to an empty string. Hope this helps you.

priyanga · November 8, 2017, 4:57am

Hi @tahir.manzoor,

Thanks for your feedback.

for (Paragraph paragraph : (Iterable<Paragraph>) interimdoc.getChildNodes(NodeType.PARAGRAPH, true)) {
nodes = new ArrayList();
if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))

			{
				 //nodes.add(paragraph);
				Node previousPara = paragraph.getPreviousSibling();
				while (previousPara != null && previousPara.getNodeType() == NodeType.PARAGRAPH
				// && previousPara.toString(SaveFormat.TEXT).trim().length()
				// == 0
						&& ((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0) {
					if (previousPara != null)
						nodes.add(previousPara);
					previousPara = previousPara.getPreviousSibling();
				}
				// Remove text only paragraph
				if (nodes.size() == 1
						&& ((Paragraph) nodes.get(0)).getChildNodes(NodeType.SHAPE, true).getCount() == 0)
					nodes.clear();
				if (nodes.size() > 0) {

					// Reverse the node collection.

					Collections.reverse(nodes);

					// Extract the consecutive shapes and export them into
					// new document
					Document dstDoc = new Document();
					dstDoc.removeAllChildren();
					dstDoc.ensureMinimum();

					for (Paragraph para : (Iterable<Paragraph>) nodes)

					{

						NodeImporter importer = new NodeImporter(interimdoc, dstDoc,
								ImportFormatMode.KEEP_SOURCE_FORMATTING);
						if (dstDoc.getFirstSection().getBody().getFirstParagraph().toString(SaveFormat.TEXT).trim()
								.length() == 0)
							dstDoc.getFirstSection().getBody().getFirstParagraph().remove();
						/** OUTPUT FILENAME START **/
						String Imgcaption = paragraph.toString(SaveFormat.TEXT);
						int k = 0;
						while (k < Imgcaption.length() && !Character.isDigit(Imgcaption.charAt(k)))
							k++;
						int j = k;
						while (j < Imgcaption.length() && Character.isDigit(Imgcaption.charAt(j)))
							j++;
						int l = Integer.parseInt(Imgcaption.substring(k, j));
						strI = Integer.toString(l);
						Pattern pattern = Pattern.compile(strI);
						Matcher matcher = pattern.matcher(Imgcaption);
						while (matcher.find()) {
							name = Imgcaption.substring(0, matcher.end());
							name = name.replace(".", "_");
						}
						if (name.startsWith("Fig")) {
							name = "Fig" + "_" + l;
						}
						/** OUTPUT FILENAME END **/
						Node newNode = importer.importNode(para, true);
						dstDoc.getFirstSection().getBody().appendChild(newNode);
						// deletecaption( filename);
						filename = folder_name + "_" + "Fig_land" + i + "_" + name + ".docx";
						dstDoc.save(filename);
					}
					
					    if(paragraph.toString(SaveFormat.TEXT).trim().contains("Figure"))
					    {
					        DocumentBuilder builder;
							builder.moveTo(paragraph.getRuns().get(0));
					        builder.startBookmark("bookmark"+i);
					        builder.endBookmark("bookmark"+i);
					        bookmark.setText("");
					i++;
					nodes.clear();

				}

			}
		}

I have included my bookmarking code above.
Could you please show me how to set the bookmark text to an empty string so that the bookmarked content is removed? Please help me resolve this issue.

Thanks and regards,
priyanga G

tahir.manzoor · November 8, 2017, 10:37am

@priyanga,

Thanks for your inquiry. The following code example shows how to bookmark the content (the Fig caption and the Shape nodes). After inserting the bookmarks, the code removes the content of the first bookmark. We have attached an image of the DOM showing the bookmarks. Hope this helps you.
DOM.png (18.5 KB)

If you still face problems, please share your input and expected output Word documents here for our reference. Please create the expected output document manually using Microsoft Word, so that we can see exactly how you want the final output to look. We will then provide more information along with code.

Document doc = new Document(MyDir + "Test2.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
ArrayList nodes = new ArrayList();
int bookmark = 1;
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs) {
    nodes = new ArrayList();
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")) {
        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.startBookmark("Bookmark" + bookmark);

        // Walk forward over the paragraphs that contain shapes,
        // remembering the last paragraph that belongs to this group.
        Node lastNode = paragraph;
        Node nextSibling = paragraph.getNextSibling();
        while (nextSibling != null && nextSibling.getNodeType() == NodeType.PARAGRAPH
                && ((Paragraph) nextSibling).getChildNodes(NodeType.SHAPE, true).getCount() > 0) {
            nodes.add(nextSibling);
            lastNode = nextSibling;
            nextSibling = nextSibling.getNextSibling();
        }

        // nextSibling is now the caption of the next shape, or null at the end of the body.
        // Move the cursor to the end of the last paragraph in the group.
        builder.moveToParagraph(paragraphs.indexOf(lastNode), -1);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}

//Remove the content of first bookmark.
doc.getRange().getBookmarks().get("Bookmark1").setText("");

doc.save(MyDir + "output.docx");

akshayapria · November 8, 2017, 12:52pm

Hi @tahir.manzoor

Thanks for your solution. It is working fine.

However, I am facing the following error on this line: builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);

Parameter name: paraIdx(ILLEGAL ARGUMENT ERROR)
at com.aspose.words.DocumentBuilder.zzZ(Unknown Source)
at com.aspose.words.DocumentBuilder.moveToParagraph(Unknown Source)

Please help me resolve it.
Thank you very much.
regards,
Priyanga G

tahir.manzoor · November 8, 2017, 1:27pm

@akshayapria,

Thanks for your inquiry. Please make sure that you are using the correct Paragraph node collection. This exception is thrown when the paragraph index is incorrect. Please check the index value that you are passing to the DocumentBuilder.moveToParagraph method.
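
One possible reason for an incorrect index, offered here as an assumption rather than a confirmed diagnosis: as far as the Aspose.Words documentation describes it, DocumentBuilder.moveToParagraph navigates inside the current story of the current section, while paragraphs.indexOf(paragraph) in the code above counts paragraphs across the whole document. The sketch below is a plain-Java model with no Aspose classes. The class ParagraphIndexDemo and the method locate are hypothetical names used only for illustration. It shows how a document-wide index can exceed the paragraph count of a single section and how it would map to a section-local index. Real documents also contain paragraphs in tables, headers and footers, which this model ignores.

```java
import java.util.List;

public class ParagraphIndexDemo {

    /**
     * Maps a document-wide paragraph index to {sectionIndex, localIndex}.
     * paragraphsPerSection holds the number of paragraphs in each section
     * (a simplified, hypothetical model of a document).
     */
    static int[] locate(List<Integer> paragraphsPerSection, int globalIndex) {
        if (globalIndex < 0) {
            throw new IllegalArgumentException("globalIndex must be >= 0");
        }
        int remaining = globalIndex;
        for (int section = 0; section < paragraphsPerSection.size(); section++) {
            int count = paragraphsPerSection.get(section);
            if (remaining < count) {
                // The paragraph lives in this section at position 'remaining'.
                return new int[] { section, remaining };
            }
            // Skip this section and keep counting from the next one.
            remaining -= count;
        }
        throw new IllegalArgumentException("globalIndex out of range: " + globalIndex);
    }

    public static void main(String[] args) {
        // Two sections: the first has 3 paragraphs, the second has 2.
        List<Integer> counts = List.of(3, 2);

        // Document-wide index 4 is valid for the document as a whole,
        // but no single section has a paragraph with local index 4.
        int[] location = locate(counts, 4);
        System.out.println("section " + location[0] + ", paragraph " + location[1]);
    }
}
```

If this is the cause in your document, one option to consider is positioning the cursor with DocumentBuilder.moveTo(Node), which takes the node itself instead of an index.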