Find out if text is in a text box?

I am doing a text find operation using IReplacingCallback, and I want to find out if the text I’ve found is within a text box. I’m trying this (java):

public int replacing(ReplacingArgs e) throws Exception {
DocumentBuilder builder = new DocumentBuilder((Document)e.getMatchNode().getDocument()); 
builder.moveTo(e.getMatchNode());
// … can I find out here if the matching node happens to be within a text box? 
// can I find out the dimensions of the text box?

I am also interested in finding out whether the node is within a table cell. I think I’m doing this right, finding out if there’s an ancestor of type CELL:

Node anc = builder.getCurrentParagraph().getAncestor(NodeType.CELL);

I am having trouble getting the rendered height of the cell, though… if the row height is not explicitly set in the table properties, I can’t figure out how to get it. I think I’m getting the proper cell width just by getting the ancestor cell and doing this:

cell.getCellFormat().getWidth()

Any tips would be much appreciated!

Hi Brian,

Thanks for your inquiry.

*brian-3:

I am doing a text find operation using IReplacingCallback, and I want to find out if the text I’ve found is within a text box. I’m trying this (java):

public int replacing(ReplacingArgs e) throws Exception {
DocumentBuilder builder = new DocumentBuilder((Document)e.getMatchNode().getDocument());
builder.moveTo(e.getMatchNode());
//… can I find out here if the matching node happens to be within a text box?
// can I find out the dimensions of the text box?*

You can use the Node.getAncestor method with the parameter NodeType.SHAPE to check whether the matched node is inside a text box; it returns null when there is no such ancestor. If you still face problems, please share your input document here for testing. I will investigate the issue and provide more information about your query.

*brian-3:

I am also interested in finding out whether the node is within a table cell. I think I’m doing this right, finding out if there’s an ancestor of type CELL:

Node anc = builder.getCurrentParagraph().getAncestor(NodeType.CELL);*

Yes, you are using the correct method to get the ancestor of specified NodeType.

*brian-3:

I am having trouble getting the rendered height of the cell, though… if the row height is not explicitly set in the table properties, I can’t figure out how to get it. I think I’m getting the proper cell width just by getting the ancestor cell and doing this:

cell.getCellFormat().getWidth()*

The height of a table row is controlled by its height and height rule properties. These can be set differently for each row in the table, which allows fine control over the height of each row. Please see the following documentation for reference:
https://docs.aspose.com/words/java/working-with-tables/

Hope this answers your queries. Please let us know if you have any more queries.

Thanks very much. I have a couple of questions:

  1. I still can’t get the row height. See below for what I’ve tried.
  2. There seems to be a discrepancy between the node types documented and the actual constants. Specifically, the type for SHAPE is not expected. See below for what I mean.

Here is my java code for the replacing method. I am looking for the text @@POL_NUM@@ and trying to see if it is within a text box or a table cell.

@Override
public int replacing(ReplacingArgs e) throws Exception {
    System.out.println("---------------------------");
    Node matchNode = e.getMatchNode();
    DocumentBuilder builder = new DocumentBuilder((Document)matchNode.getDocument());
    builder.moveTo(matchNode);
    System.out.println("Match node is type " + matchNode.getNodeType());

    // is the tag in a cell?
    Node ancestor = matchNode.getAncestor(NodeType.CELL);
    if (ancestor == null) {
        System.out.println("The tag is NOT in a cell…");
        Node n = matchNode;
        while (n != null) {
            System.out.println(" - Node type hierarchy: " + n.getNodeType());
            CompositeNode cn = n.getParentNode();
            if (cn != null && cn.getCount() > 1) {
                System.out.println(" composite of this many: " + cn.getCount());
                NodeCollection nc = cn.getChildNodes();
                for (int i = 0; i < nc.getCount(); i++) {
                    System.out.println(" type: " + nc.get(i).getNodeType());
                }
            }
            n = cn;
        }
    } else {
        System.out.println("The node is inside of a cell!");
        Cell c = (Cell) ancestor;
        System.out.println("Cell is " + c.getCellFormat().getWidth() + " wide");
        Row r = (Row) c.getAncestor(NodeType.ROW);
        System.out.println(" …and " + r.getRowFormat().getHeight() +
            " high with height rule: " + r.getRowFormat().getHeightRule());
    }

    // is the tag in a text box?
    ancestor = matchNode.getAncestor(NodeType.SHAPE);
    if (ancestor == null) {
        System.out.println("The node is NOT in a SHAPE (i.e. not in a text box)");
    } else {
        System.out.println("The node is inside of a shape (type " +
            ancestor.getNodeType() + ")!");
        Shape s = (Shape) ancestor;
        System.out.println("The shape is " + s.getWidth() + " wide by " +
            s.getHeight() + " high");
    }
    return ReplaceAction.SKIP;
}

Here is the output when run against the attached test document:

---------------------------

Match node is type 21
The tag is NOT in a cell…
- Node type hierarchy: 21
- Node type hierarchy: 8
- Node type hierarchy: 18
composite of this many: 3
type: 9
type: 10
type: 18
- Node type hierarchy: 8
composite of this many: 33
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 5
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 8
type: 5
type: 8
- Node type hierarchy: 3
composite of this many: 2
type: 4
type: 3
- Node type hierarchy: 2
- Node type hierarchy: 1
The node is inside of a shape (type 18)!
The shape is 186.95 wide by 110.55 high
---------------------------
Match node is type 21
The node is inside of a cell!
Cell is 183.6 wide
…and 0.0 high with height rule: 2
The node is NOT in a SHAPE (i.e. not in a text box)
---------------------------
Match node is type 21
The node is inside of a cell!
Cell is 111.7 wide
…and 0.0 high with height rule: 2
The node is NOT in a SHAPE (i.e. not in a text box)
---------------------------
Match node is type 21
The node is inside of a cell!
Cell is 220.5 wide
…and 57.1 high with height rule: 0
The node is NOT in a SHAPE (i.e. not in a text box)

Note that I cannot get the height of a row when the height rule is Auto. Everything else seems to work fine (but I’d love to hear suggestions on doing anything better). Also, the NodeType for a SHAPE gets reported as 18. This is not what is documented at https://reference.aspose.com/words/java/com.aspose.words/NodeType
Why the discrepancy?

Thanks very much for any assistance.

Just as an update, I’ve tried using the LayoutEnumerator to get the row height, but that doesn’t work either. I do this:

Cell c = (Cell) ancestor;
Row r = (Row) c.getAncestor(NodeType.ROW);
layoutEnumerator.setCurrent(layoutCollector.getEntity(r));
Rectangle2D.Float rect = layoutEnumerator.getRectangle();
System.out.println("Using enumerator, row (not cell) is at (" + rect.x +  ", " + rect.y + ") size " + rect.height + "h by " + rect.width + "w");

What I see is that regardless of the actual row height, the rectangle indicates a position at the right side of the table, with a height of 13.8 and a width of 6.0. I don’t know if this is working as intended or not. In addition, if I try to do “layoutEnumerator.setCurrent(layoutCollector.getEntity(c))”, in other words trying to find out where the actual table cell is, I get an illegal argument exception. This doesn’t seem to be documented anywhere, i.e. what types of entities will work with setCurrent.

Still stuck trying to figure out the row height.

Hi Brian,

Thanks for your inquiry.

*brian-3:

  2. There seems to be a discrepancy between the node types documented and the actual constants. Specifically, the type for SHAPE is not expected.*

Perhaps you are using an older version of Aspose.Words; with Aspose.Words v14.10.0, I am unable to reproduce this problem on my side. Please upgrade to the latest version, v14.10.0, and let us know how it goes on your side. Please check the attached Out.txt for the output of your code. I hope this helps.

*brian-3:

  1. I still can’t get the row height. See below for what I’ve tried.*

The LayoutCollector.getEntity method returns an opaque position of the LayoutEnumerator that corresponds to the specified node. You can pass the returned value to LayoutEnumerator.setCurrent, provided the document being enumerated and the document of the node are the same.

This method works only for Paragraph nodes and for indivisible inline nodes, e.g. BookmarkStart or Shape. It doesn’t work for Run, Cell, Row or Table nodes, or for nodes within a header/footer.

Please use the following code example to achieve your requirements. It does the following:

  1. Gets the bounding rectangle of the first paragraph in the first cell of a row, relative to the top left corner of the page (in points).
  2. Gets the bounding rectangle of the first paragraph in the first cell of the next row, relative to the top left corner of the page (in points).
  3. Takes the difference between the Y positions of the two paragraphs.

In this way, you can get the row height of a table. Hope this helps you.

Document doc = new Document(MyDir + "in.docx");
LayoutCollector layoutCollector = new LayoutCollector(doc);
LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);
doc.updatePageLayout();
NodeCollection collection = doc.getChildNodes(NodeType.ROW, true);
for (Row row : (Iterable<Row>) collection)
{
    // The last row of a table has no next sibling, so it is skipped.
    if (row.getNextSibling() != null)
    {
        Paragraph paragraph1 = row.getFirstCell().getFirstParagraph();
        Paragraph paragraph2 = ((Row)row.getNextSibling()).getFirstCell().getFirstParagraph();
        // Y position of the first paragraph in this row
        Object renderObject = layoutCollector.getEntity(paragraph1);
        layoutEnumerator.setCurrent(renderObject);
        double position1 = layoutEnumerator.getRectangle().getY();
        // Y position of the first paragraph in the next row
        renderObject = layoutCollector.getEntity(paragraph2);
        layoutEnumerator.setCurrent(renderObject);
        double position2 = layoutEnumerator.getRectangle().getY();
        System.out.println(position2 - position1);
    }
}
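The arithmetic in step 3 can be tried in isolation with plain Java and no Aspose.Words dependency. The sketch below is only an illustration: RowHeightSketch and heightsFromTops are hypothetical names, the Y values are made up, and it assumes all rows sit on the same page and that the first paragraphs have the same offset from the top of their rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aspose.Words.
class RowHeightSketch {

    /**
     * rowTops holds the Y position (in points) of the first paragraph of each
     * row, ordered top to bottom. The result has one entry per row except the
     * last, because the last row has no following row to measure against.
     */
    static List<Double> heightsFromTops(double[] rowTops) {
        List<Double> heights = new ArrayList<>();
        for (int i = 0; i + 1 < rowTops.length; i++) {
            heights.add(rowTops[i + 1] - rowTops[i]);
        }
        return heights;
    }

    public static void main(String[] args) {
        // Made-up positions for a three-row table.
        double[] tops = {100.0, 125.5, 170.0};
        System.out.println(heightsFromTops(tops));
    }
}
```

Three rows yield only two heights here, so the height of the last row would need some other reference point, such as the position of whatever follows the table.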

Thanks very much.

  1. I’m not sure of the discrepancy between our output… I still see SHAPE as being NodeType=18 using the 14.10 version of words.

  2. I appreciate your approach. It almost works… of course it fails for the last row in a table. I have developed a work around where we will only use text boxes.

I really appreciate the assistance.

Brian

Hi Brian,

Thanks for your inquiry.

*brian-3:

  1. I’m not sure of the discrepancy between our output… I still see SHAPE as being NodeType=18 using the 14.10 version of words.*

I have checked your document again and have found that there is text box with text ‘@@POL_NUM@@’ in your document. Aspose.Words loads this textbox as DrawingML node (NodeType=34). See the attached DOM image for detail. You can check this Drawing ML node using following code snippet.

public int replacing(ReplacingArgs e) throws Exception {
    System.out.println("---------------------------");
    Node matchNode = e.getMatchNode();
    DocumentBuilder builder = new DocumentBuilder((Document)matchNode.getDocument());
    System.out.println("Match node is type " + matchNode.getNodeType());
    DrawingML ancestor = (DrawingML)matchNode.getAncestor(NodeType.DRAWING_ML);
    if (ancestor != null) {
        System.out.println(ancestor.getNodeType());
    }
    return ReplaceAction.SKIP;
}
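When comparing raw node type numbers such as 18 and 34, it can be easier to print the constant name instead of the number. The sketch below shows one way to do that with reflection. ConstantNames, nameOf and DemoNodeType are hypothetical names used only for illustration, and DemoNodeType contains just the two values mentioned in this thread. Passing the real NodeType class instead assumes its constants are public static int fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper, not part of Aspose.Words.
class ConstantNames {

    // Stand-in constants class holding only the two values quoted in this thread.
    static final class DemoNodeType {
        public static final int SHAPE = 18;
        public static final int DRAWING_ML = 34;

        private DemoNodeType() {}
    }

    // Returns the name of the public static int field whose value matches.
    static String nameOf(Class<?> constants, int value) {
        for (Field f : constants.getFields()) {
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == int.class) {
                try {
                    if (f.getInt(null) == value) {
                        return f.getName();
                    }
                } catch (IllegalAccessException ex) {
                    // Field is not readable from here; try the next one.
                }
            }
        }
        return "UNKNOWN(" + value + ")";
    }

    public static void main(String[] args) {
        System.out.println(nameOf(DemoNodeType.class, 18));
        System.out.println(nameOf(DemoNodeType.class, 34));
    }
}
```

In the callback, something like nameOf(NodeType.class, matchNode.getNodeType()) could then replace the bare number in the log output, under the assumption stated above.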

*brian-3:

  2. I appreciate your approach. It almost works… of course it fails for the last row in a table. I have developed a work around where we will only use text boxes.*

It is nice to hear from you that your issue has been solved. Please let us know if you have any more queries.