Navigating node structure

ompdev · January 13, 2017, 9:54am

Hi

Couple of questions:

1. I am trying to store the text of all paragraphs within a document. I can obtain each Paragraph node and store its text value, but if there is a comment near the paragraph, the comment's text is stored as well. Is there a method for removing the comment's text from the paragraph, or for obtaining just the paragraph's own text?

2. Related to question one: is there a way to access the character sequence that appears before the comment's text, and to manipulate a paragraph based on these tags?

3. Is there a way to distinguish between pictures and charts? Here is my current code:

switch (currentNode.getNodeType())
{
    case NodeType.COMMENT:
        fileBuilder.addDataBuilder().addComment(processComment(currentNode, pageNo));
        break;
    case NodeType.SHAPE:
        Shape shape = (Shape) currentNode;
        if (shape.hasChart())
        {
            fileBuilder.addImagesBuilder().addChart(processChart(shape.getImageData().toByteArray(), pageNo));
        }
        if (shape.hasImage())
        {
            fileBuilder.addImagesBuilder().addPicture(processPicture(shape.getImageData().toByteArray(), pageNo));
        }
        break;
}

4. Is there a way to identify text boxes?

I have attached the sample file.

Thanks

tahir.manzoor · January 16, 2017, 4:44am

Hi Greg,

Thanks for your inquiry.

ompdev:
1. ...it stores the comment's text as well. Is there a method for removing the comment's text from the paragraph, or for obtaining just the paragraph's own text?

In this case, we suggest that you clone the document and remove the comments from the clone, as shown below. Please refer to the following article:

Working with Comments

Document doc = new Document(MyDir + "AttributeSample.doc");
Document cloneDoc = doc.deepClone();
cloneDoc.getChildNodes(NodeType.COMMENT, true).clear();

ompdev:
2. Related to question one, the character sequence that is before the comments text, is there a way to access these characters and manipulate a paragraph based on these tags?

Could you please elaborate on this query? We will then provide more information, along with code.

ompdev:
3. Is there a way to distinguish between pictures and charts. Here is my current code:

Your document contains an OLE object rather than a chart. Please use the Shape.OleFormat property to access the OLE data of a shape.

Document doc = new Document(MyDir + "AttributeSample.doc");

// Print the ProgId of every shape that carries OLE data.
for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true))
{
    if (shape.getOleFormat() != null)
        System.out.println(shape.getOleFormat().getProgId());
}

ompdev:
4. Is there a way to identify Text Boxes

Please use the Shape.AlternativeText property to identify the (text box) Shape node. You can also identify a text box (Shape node) by inserting a bookmark in it and retrieving it as shown below.

Bookmark bm = doc.getRange().getBookmarks().get("bookmarkname");
Shape textbox = (Shape) bm.getBookmarkStart().getAncestor(NodeType.SHAPE);
if (textbox != null)
{
    // Your code…
}

ompdev · January 16, 2017, 9:34am

Hi

Thanks for the reply. Your suggestions have worked fine.

Regarding the text boxes, this is my current code:

nodes = doc.getChildNodes(NodeType.SHAPE, true);

for (Node currentNode : (Iterable<Node>) nodes)
{
    pageNo = docLayout.getStartPageIndex(currentNode);
    shape = (Shape) currentNode;

    if (shape.getOleFormat() != null)
    {
        fileBuilder.addImagesBuilder().addChart(processChart(shape.getImageData().toByteArray(), pageNo));
    }
    else if (shape.hasImage())
    {
        fileBuilder.addImagesBuilder().addPicture(processPicture(shape.getImageData().toByteArray(), pageNo));
    }
    else if (shape.getAlternativeText().equals(""))
    {
        fileBuilder.addDataBuilder().addTextBox(processTextBox(currentNode, pageNo));
    }
}

With the Shape.AlternativeText property, is there a way to ensure that only text boxes will meet the condition?

Regarding the second question, I've attached a snippet of the characters I was referring to. I was wondering whether these characters could be used to detect a comment in the paragraph, but the suggestion provided for question 1 works fine.

Thanks

tahir.manzoor · January 17, 2017, 4:11am

Hi Greg,

Thanks for your inquiry.

ompdev:

With the Shape.AlternativeText property, is there a way to ensure that only text boxes will meet the condition?

The Shape class represents an object in the drawing layer, such as an AutoShape, text box, freeform, OLE object, ActiveX control, or picture. The ShapeBase.AlternativeText property is not specific to text boxes, so an empty alternative text does not prove that a shape is a text box. You can check whether the shape node contains text by using the Shape.GetText method.
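The order of checks discussed in this thread (OLE data first, then image, then text) can be sketched as a small decision function. To keep the sketch runnable without the Aspose.Words library, it uses a hypothetical ShapeFacts holder rather than the real Shape class; ShapeFacts, ShapeKind, and ShapeClassifier are illustrative names, and the comments note which Shape calls from this thread would presumably fill each field.

```java
// Hypothetical stand-in for the facts read from an Aspose.Words Shape.
// It is NOT part of the Aspose.Words API.
final class ShapeFacts {
    final boolean hasOleData; // assumed source: shape.getOleFormat() != null
    final boolean hasImage;   // assumed source: shape.hasImage()
    final String text;        // assumed source: the text returned by Shape.GetText

    ShapeFacts(boolean hasOleData, boolean hasImage, String text) {
        this.hasOleData = hasOleData;
        this.hasImage = hasImage;
        this.text = text == null ? "" : text;
    }
}

// Illustrative categories; the names are assumptions, not library constants.
enum ShapeKind { OLE_OBJECT, PICTURE, TEXT_SHAPE, OTHER }

final class ShapeClassifier {
    // Applies the checks in priority order, so each shape gets exactly one kind.
    static ShapeKind classify(ShapeFacts facts) {
        if (facts.hasOleData) {
            return ShapeKind.OLE_OBJECT;
        }
        if (facts.hasImage) {
            return ShapeKind.PICTURE;
        }
        // trim() drops whitespace and control characters such as a trailing '\r'.
        if (!facts.text.trim().isEmpty()) {
            return ShapeKind.TEXT_SHAPE;
        }
        return ShapeKind.OTHER;
    }

    public static void main(String[] args) {
        ShapeFacts noteBox = new ShapeFacts(false, false, "Note in a box\r");
        System.out.println(classify(noteBox)); // prints TEXT_SHAPE
    }
}
```

Testing for non-empty text rather than empty alternative text keeps pictures and OLE objects that lack alternative text out of the text box branch. An empty text box would fall into OTHER under this sketch, so it may need its own rule.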

ompdev:

Regarding the second question, I've attached a snippet of the characters I was referring to. I was wondering whether these characters could be used to detect a comment in the paragraph, but the suggestion provided for question 1 works fine.

A Comment is an inline-level node and can only be a child of a Paragraph. Yes, you can find text inside a document and check whether the matched paragraph has comments. Please refer to the following article:

Find and Replace
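Because a comment is a child of its paragraph, a paragraph's full text includes the comment's text, and detecting a comment amounts to inspecting the paragraph's children. The following is a minimal model of that idea using plain Java classes. InlineNode and ParagraphModel are hypothetical stand-ins, not Aspose.Words types; with the real library the same check would presumably be made by calling getChildNodes(NodeType.COMMENT, ...) on the paragraph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a paragraph's inline children (not Aspose.Words classes).
final class InlineNode {
    enum Kind { RUN, COMMENT }

    final Kind kind;
    final String text;

    InlineNode(Kind kind, String text) {
        this.kind = kind;
        this.text = text;
    }
}

final class ParagraphModel {
    private final List<InlineNode> children = new ArrayList<>();

    ParagraphModel add(InlineNode.Kind kind, String text) {
        children.add(new InlineNode(kind, text));
        return this;
    }

    // True when at least one direct child is a comment.
    boolean hasComment() {
        for (InlineNode child : children) {
            if (child.kind == InlineNode.Kind.COMMENT) {
                return true;
            }
        }
        return false;
    }

    // Concatenates every child, which is how comment text leaks into the result.
    String fullText() {
        StringBuilder sb = new StringBuilder();
        for (InlineNode child : children) {
            sb.append(child.text);
        }
        return sb.toString();
    }

    // Concatenates only the runs, skipping comment children.
    String textWithoutComments() {
        StringBuilder sb = new StringBuilder();
        for (InlineNode child : children) {
            if (child.kind != InlineNode.Kind.COMMENT) {
                sb.append(child.text);
            }
        }
        return sb.toString();
    }
}

final class CommentDemo {
    public static void main(String[] args) {
        ParagraphModel p = new ParagraphModel()
                .add(InlineNode.Kind.RUN, "Quarterly results ")
                .add(InlineNode.Kind.COMMENT, "check figures")
                .add(InlineNode.Kind.RUN, "were strong.");
        System.out.println(p.hasComment());
        System.out.println(p.textWithoutComments()); // prints Quarterly results were strong.
    }
}
```

Unlike clearing the comments from a cloned document, this approach leaves the node tree untouched and only filters while reading.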

Please share the text from "AttributeSample.doc" that you want to use for detecting comments. We will then provide a code example that matches your requirements.