Layout APIs return Incorrect Position of Row Text using Java

Hi,

I have written some code based on your samples to output the absolute x and y position of row text in a Word document. The code is shown below. The issue I am facing is that the y position output for the row text is incorrect.

import com.aspose.words.Document;
import com.aspose.words.LayoutCollector;
import com.aspose.words.LayoutEnumerator;
import com.aspose.words.Node;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import com.aspose.words.Row;
import com.aspose.words.Section;
import com.aspose.words.Shape;

public class asposeMatcher {

public static LayoutCollector collector;
public static LayoutEnumerator enumerator;

public static void main(String[] args) throws Exception {
    //final Document doc = new Document(System.getProperty("user.dir") + "/first_page.docx");   
    final Document doc = new Document(System.getProperty("user.dir") + "/current_settings.docx");
    collector = new LayoutCollector(doc);
    enumerator = new LayoutEnumerator(doc);
    
    Section[] secColl = doc.getSections().toArray();
    
    for (Section section: secColl)
    {
        System.out.println(containsShapesAndTable(section));
        if (containsShapesAndTable(section))
        {
            outputShapesAndRowPosititions(section);
        }
    }
    
}

public static boolean containsShapesAndTable(Section section)
{
    NodeCollection ndPara = section.getChildNodes(NodeType.PARAGRAPH, true);
    NodeCollection ndShape = section.getChildNodes(NodeType.SHAPE, true);
    NodeCollection ndTable = section.getChildNodes(NodeType.TABLE, true);
    
    //System.out.println(ndPara.getCount());
    System.out.println(ndShape.getCount());
    System.out.println(ndTable.getCount());
    
    if (ndPara.getCount() > 0 && ndTable.getCount() > 0)
    {
        return true;
    }
    return false;
}

public static void outputShapesAndRowPosititions(Section section)
{
    NodeCollection ndShapes = section.getChildNodes(NodeType.SHAPE, true);
    NodeCollection ndRows = section.getChildNodes(NodeType.ROW, true);
    
    //shapes
    for (Object ndShape : ndShapes)
    {
        Shape nodeShape = (Shape) ndShape;
        try {
            outputPositionInfo ((Node) ndShape);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println(nodeShape.getText());
    }
    
  //rows
    for (Object ndRow : ndRows)
    {
        Row nodeRow = (Row) ndRow;
        try {
            outputPositionInfo ((Node) ndRow);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println(nodeRow.getText());
    }            
}

public static void outputPositionInfo (Node objToCheck) throws Exception {
    enumerator.setCurrent(collector.getEntity(objToCheck));

    String left = String.format("%.2f", enumerator.getRectangle().getX());
    String top = String.format("%.2f", enumerator.getRectangle().getY());
    String width = String.format("%.2f", enumerator.getRectangle().getWidth());
    String height = String.format("%.2f", enumerator.getRectangle().getHeight());

    System.out.print("[(x, y) = (" + left + ", " + top + ")]");
    System.out.println(" AND [(width, height) = (" + width + ", " + height + ")]");
}

}

After more analysis, it appears to me that the evaluation version of Aspose is creating extra output that may be shifting everything down; see the output below:

-> Entity type & kind: 1,
Rectangle dimensions 596.8x843.45, X=0 Y=0
Page 1
-> Entity type & kind: 1,
Rectangle dimensions 596.8x843.45, X=0 Y=0
Page 1
-> Entity type & kind: 2,
Rectangle dimensions 72x137.242, X=456.4 Y=84
Page 1
-> Entity type & kind: 32,
Rectangle dimensions 72x13.799, X=456.4 Y=84
Page 1
-> Entity type & kind: 64, TEXT
Span contents: “Evaluation”
Rectangle dimensions 56.016x13.799, X=456.4 Y=84
Page 1
-> Entity type & kind: 64, SPACES
Span contents: " "
Rectangle dimensions 3x13.799, X=512.416 Y=84
Page 1
-> Entity type & kind: 32,
Rectangle dimensions 72x13.799, X=456.4 Y=97.799
Page 1
-> Entity type & kind: 64, TEXT
Span contents: “Only.”
Rectangle dimensions 28.342x13.799, X=456.4 Y=97.799
Page 1
-> Entity type & kind: 64, SPACES
Span contents: " "
Rectangle dimensions 3x13.799, X=484.742 Y=97.799
Page 1
-> Entity type & kind: 32,
Rectangle dimensions 72x13.799, X=456.4 Y=111.598
Page 1
-> Entity type & kind: 64, TEXT
Span contents: “Created”
Rectangle dimensions 41.314x13.799, X=456.4 Y=111.598
Page 1
-> Entity type & kind: 64, SPACES
Span contents: " "
Rectangle dimensions 3x13.799, X=497.714 Y=111.598
Page 1
-> Entity type & kind: 64, TEXT
Span contents: “with”
Rectangle dimensions 22.67x13.799, X=500.714 Y=111.598
Page 1
-> Entity type & kind: 64, SPACES
Span contents: " "
Rectangle dimensions 3x13.799, X=523.384 Y=111.598
Page 1
-> Entity type & kind: 32,
Rectangle dimensions 72x13.799, X=456.4 Y=125.397
Page 1
-> Entity type & kind: 64, TEXT
Span contents: “Aspose.Word”
Rectangle dimensions 69.006x13.799, X=456.4 Y=125.397
Page 1

How can I get round this behaviour, so that I can check whether the right values for the Y dimension are being output?

@chrisbeecham

The LayoutCollector.GetEntity method works only for Paragraph nodes and for indivisible inline nodes, e.g. BookmarkStart or Shape. It doesn’t work for Run, Cell, Row or Table nodes, or for nodes within a header/footer. See: LayoutCollector.GetEntity Method

If you need to navigate to a Run of text, you can insert a bookmark right before it and then navigate to the bookmark instead.

If you still face the problem, please ZIP and attach your input Word document and the expected output here for testing. We will investigate the issue and provide you with more information on it.

Hi,

I think the issue is that the evaluation version of Aspose puts some evaluation text in the docx, which shifts everything down (see below). Is there any way to get round this behaviour? It is making it hard for me to evaluate the suitability of your product. Otherwise I can provide a zip. I’ll have to work out how to do this and check whether I can share the Word doc, but I presume you’re aware of this behaviour of the evaluation version of your component?

-> Entity type & kind: 1,
Rectangle dimensions 596.8x843.45, X=0 Y=0
Page 1
-> Entity type & kind: 1,
Rectangle dimensions 596.8x843.45, X=0 Y=0
Page 1
-> Entity type & kind: 2,
Rectangle dimensions 72x137.242, X=456.4 Y=84
Page 1
-> Entity type & kind: 32,
Rectangle dimensions 72x13.799, X=456.4 Y=84
Page 1
-> Entity type & kind: 64, TEXT
Span contents: “Evaluation”
Rectangle dimensions 56.016x13.799, X=456.4 Y=84
Page 1
-> Entity type & kind: 64, SPACES
Span contents: " "
Rectangle dimensions 3x13.799, X=512.416 Y=84
Page 1

@chrisbeecham

Please get a 30-day temporary license and apply it before loading the document into the Aspose.Words DOM. Please read the following article about applying a license.

Applying a License

Thank you, the 30 day temporary licence does indeed appear to return the correct information. I have 2 further questions:

  1. You say that the LayoutCollector.GetEntity method works only for Paragraph nodes. Is the following code problematic when the Node passed is a Row object? (It seems to provide the correct results in my demo.)

public static void updatePositionInfo (Node objToCheck, MatchedElement matchedShape) throws Exception {
    enumerator.setCurrent(collector.getEntity(objToCheck));

    // Round to two decimals numerically. Formatting with "%.2f" and parsing the
    // string back fails when the default locale prints a decimal comma.
    matchedShape.x = Math.round(enumerator.getRectangle().getX() * 100) / 100f;
    matchedShape.y = Math.round(enumerator.getRectangle().getY() * 100) / 100f;
    matchedShape.width = Math.round(enumerator.getRectangle().getWidth() * 100) / 100f;
    matchedShape.height = Math.round(enumerator.getRectangle().getHeight() * 100) / 100f;
}

  2. Is it possible in code to determine the page number of the Node? I am worried about false positives being matched where the elements are vertically aligned and in the same section but on different pages.

Thanks for your help!

Chris

@chrisbeecham

If you need to navigate to a Cell node, you can move to a Paragraph node in that cell and then ascend to a parent entity. The same approach can be used for Row and Table nodes. Please check the moveXXX methods of the LayoutEnumerator class.

You can use the LayoutCollector.GetStartPageIndex method to get the page number where a node begins and the LayoutCollector.GetEndPageIndex method to get the page number where it ends.
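
The page-number concern in question 2 can be handled by comparing page indices before comparing y coordinates. The following is only a sketch under assumptions: the page index is taken to come from LayoutCollector.getStartPageIndex and the rectangle from LayoutEnumerator.getRectangle, while the Placed class, the sameLine method, the tolerance and the sample coordinates are hypothetical and chosen for illustration; none of them are part of Aspose.Words.

```java
import java.awt.geom.Rectangle2D;

/** Hypothetical helper: a layout rectangle together with the page it was found on. */
class Placed {
    final int page;          // assumed to come from LayoutCollector.getStartPageIndex(node)
    final Rectangle2D rect;  // assumed to come from LayoutEnumerator.getRectangle()

    Placed(int page, Rectangle2D rect) {
        this.page = page;
        this.rect = rect;
    }

    /** True only when both items are on the same page and their tops differ by at most tolerance points. */
    static boolean sameLine(Placed a, Placed b, double tolerance) {
        if (a.page != b.page) {
            return false;  // equal y values on different pages would be a false positive
        }
        return Math.abs(a.rect.getY() - b.rect.getY()) <= tolerance;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Placed label = new Placed(5, new Rectangle2D.Float(150.95f, 534.609f, 0f, 10.924f));
        Placed text = new Placed(5, new Rectangle2D.Float(286.8f, 534.099f, 0f, 10.924f));
        Placed other = new Placed(6, new Rectangle2D.Float(286.8f, 534.099f, 0f, 10.924f));

        System.out.println(sameLine(label, text, 1.0));   // same page, tops about 0.51 apart
        System.out.println(sameLine(label, other, 1.0));  // same y, different pages
    }
}
```

Checking the page first keeps the y comparison simple, because layout coordinates are measured from the top of each page rather than from the start of the document.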

Hi, I have hit a new problem in the layout enumerator and have created a zip file to show the issue. In the document xxxx_asset_management.docx on p6, “Landlord’s Surveyor” and the text “a surveyor or member of a firm etc” look horizontally aligned, but using the Aspose tool I get the following info about their positions:

Landlord’s Surveyor

[(x, y) = (232.5, 534.61)] AND [(width, height) = (4.73, 10.92)]

a surveyor or member of a firm of surveyors who shall be a fellow or associate of the Royal Institution of Chartered Surveyors or the Incorporated Society of Valuers and Auctioneers or suitably experienced and such surveyor may be a person employed by the Landlord or a company which is a Group Company;

[(x, y) = (364.38, 619.5)] AND [(width, height) = (5.28, 10.92)]


[ZIP file deleted for privacy reasons - can supply it directly if needed for investigation by Aspose]

@chrisbeecham

We are working on your query and will get back to you soon.

@chrisbeecham

You are facing the expected behavior of Aspose.Words. You can use the following code example to get the height and width of a paragraph. Hope this helps you.

Document doc = new Document(MyDir + "rubicon_asset_management.docx");

String str = "a surveyor or member of a firm of surveyors who shall be a fellow or associate of the Royal Institution of Chartered Surveyors or the Incorporated Society of Valuers and Auctioneers or suitably experienced and such surveyor may be a person employed by the Landlord or a company which is a Group Company";
for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if(para.toString(SaveFormat.TEXT).contains(str))
    {
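        LayoutCollector collector = new LayoutCollector(doc);
        LayoutEnumerator enumerator = new LayoutEnumerator(doc);
        // For a Paragraph, getEntity returns the paragraph break span, which sits on the paragraph's last line.
        enumerator.setCurrent(collector.getEntity(para));

        enumerator.moveParent();  // move from the span up to the line that contains it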
        System.out.println("Para width = " + enumerator.getRectangle().getWidth());

        double bottom = enumerator.getRectangle().getY() + enumerator.getRectangle().getHeight();
        while (enumerator.movePrevious())
        {
        }  // move to the first line
        double top = enumerator.getRectangle().getY();
        System.out.println("Para height = " + (bottom - top));

        break;
    }
}
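
The height calculation in the sample above is the distance from the top of the first line to the bottom of the last line. The same arithmetic in isolation, as a sketch: the line rectangles are assumed to have been collected already (for instance from LayoutEnumerator.getRectangle while stepping over the lines), and the ParagraphBounds class, the boundsOf method and the coordinates are made up for illustration.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

class ParagraphBounds {
    /** Smallest rectangle that encloses every line rectangle in the list. */
    static Rectangle2D boundsOf(List<Rectangle2D> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("at least one line rectangle is required");
        }
        Rectangle2D box = lines.get(0);
        for (Rectangle2D line : lines.subList(1, lines.size())) {
            box = box.createUnion(line);  // grow the box to cover the next line
        }
        return box;
    }

    public static void main(String[] args) {
        // Three stacked lines; the last one is shorter than the others.
        List<Rectangle2D> lines = Arrays.<Rectangle2D>asList(
                new Rectangle2D.Double(100.0, 200.0, 180.0, 12.0),
                new Rectangle2D.Double(100.0, 212.0, 180.0, 12.0),
                new Rectangle2D.Double(100.0, 224.0, 90.0, 12.0));

        Rectangle2D box = boundsOf(lines);
        System.out.println("Para width = " + box.getWidth());
        System.out.println("Para height = " + box.getHeight());
    }
}
```

Taking the union of all line rectangles also gives the widest line as the paragraph width, whereas the sample reports the width of the single line it is positioned on.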

Thanks for the code sample. Unfortunately I am still having problems with the enumerator locating the correct x and y position. I added bookmarks to the Word document before the paragraphs containing the text “Landlord’s Surveyor” and “a surveyor or member of a firm of surveyors etc”. The x and y values reported for the bookmarks are:

landlordSurveyorLH
X : 150.9499969482422
Y : 534.6090087890625

landlordSurveyor
X : 286.79998779296875
Y : 534.0989990234375

With the enumerator I get the following outputs:

Text = Landlord’s Surveyor

Para width = 86.6500015258789
Para height = 10.850000381469727
x1 = 150.9499969482422
y1 = 503.04998779296875
x2 = 150.9499969482422
y2 = 415.1499938964844
Page start = 5
Page end = 5

Text = a surveyor or member of a firm of surveyors who shall be a fellow or associate of the Royal Institution of Chartered Surveyors or the Incorporated Society of Valuers and Auctioneers or suitably experienced and such surveyor may be a person employed by the Landlord or a company which is a Group Company;

Para width = 200.0
Para height = 12.050000190734863
x1 = 286.79998779296875
y1 = 618.6500244140625
x2 = 286.79998779296875
y2 = 295.45001220703125
Page start = 5
Page end = 5

I attach a zip

aspose_eclipse2.zip (320.4 KB)

@chrisbeecham

We have tested the scenario and managed to reproduce the same issue on our side. We have logged this problem in our issue tracking system as WORDSNET-21776 so that it can be investigated. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.

@chrisbeecham

Please note that the issue you are facing is not a bug in Aspose.Words, so we have closed this issue (WORDSNET-21776) as ‘Not a Bug’.

Please use the latest version, Aspose.Words for Java 21.6, and the following code example to get the positions of the bookmark and the text.

public static void ExportLayoutContent(Document doc) throws Exception
{
    LayoutCollector collector = new LayoutCollector(doc);
    LayoutEnumerator enumerator = new LayoutEnumerator(doc);

    enumerator.reset();
    while (true)
    {
        WriteCurrent(enumerator);
        if (!enumerator.moveNext())
            break;
    }
}

private static void WriteCurrent(LayoutEnumerator e) throws Exception
{
    // Print the entity type and kind, its rectangle and, for spans only, the text.
    System.out.println(e.getType() + " (" + e.getKind() + ")\t" + e.getRectangle() + "\t" + (e.getType() == LayoutEntityType.SPAN ? e.getText() : ""));

    // Visit the children depth-first, then return to this entity.
    if (e.moveFirstChild())
    {
        do
        {
            WriteCurrent(e);
        }
        while (e.moveNext());
        e.moveParent();
    }
}

// Usage:
Document doc = new Document(MyDir + "rubicon_bookmarked.docx");
ExportLayoutContent(doc);

The following is the output of the code example. The positions of the bookmark and the text are the same.

(BOOKMARKSTART) java.awt.geom.Rectangle2D$Float[x=286.8,y=534.099,w=0.0,h=10.924] null
(BOOKMARKEND) java.awt.geom.Rectangle2D$Float[x=286.8,y=534.099,w=0.0,h=10.924] null
(TEXT) java.awt.geom.Rectangle2D$Float[x=286.8,y=534.099,w=5.283,h=10.924] a