Incorrect node width and height

skorpusova · March 17, 2023, 6:50am

Hi Team.

I use Aspose.Words version 23.3.
I need to identify the position of paragraphs in a document. I use the following code:

java.awt.geom.Rectangle2D.Float getPosition(com.aspose.words.Node asposeNode)
{
    LayoutCollector layoutCollector = new LayoutCollector((com.aspose.words.Document)asposeNode.getDocument());
    LayoutEnumerator layoutEnumerator = new LayoutEnumerator(layoutCollector.getDocument());
    Object renderObject = layoutCollector.getEntity(asposeNode);
    layoutEnumerator.setCurrent(renderObject);
    return layoutEnumerator.getRectangle();
}

I tested my code on this document: https://cloud.mail.ru/public/5cSH/Bif1Fxiyj

I noticed that the width and height of paragraphs are calculated incorrectly. For example,

the document header has width=10.875, height=27.598,
the abstract has width=4.86, height=10.349,
the index terms have width=4.86, height=10.349,
and so on.

Could you explain what is wrong with my code?

Thank you in advance,
Svetlana

alexey.noskov · March 17, 2023, 12:29pm

@skorpusova You should note that layoutEnumerator.getRectangle() does not return the bounding box of the paragraph; it returns the bounding box of the paragraph end. If you need to calculate the paragraph bounding box, you should use code like the following:

// Imports needed by this snippet.
import com.aspose.words.*;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;

Document doc = new Document("C:\\Temp\\in.docx");

// Wrap paragraphs into bookmarks to be able to calculate bounds of paragraphs.
Iterable<Paragraph> paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
ArrayList<String> paraBookmarks = new ArrayList<String>();
int i = 0;
for (Paragraph p : paragraphs)
{
    // LayoutCollector and LayoutEnumerator do not work with nodes in header/footer.
    if (p.getAncestor(NodeType.HEADER_FOOTER) != null)
        continue;

    String bkName = "tmp_bookmark_" + i;
    paraBookmarks.add(bkName);
    i++;
    p.prependChild(new BookmarkStart(doc, bkName));
    p.appendChild(new BookmarkEnd(doc, bkName));
}

// Create LayoutCollector and LayoutEnumerator.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);

// Calculate bounding box of the first and the last lines of the paragraphs.
for (String tmpBkName : paraBookmarks)
{
    Bookmark bk = doc.getRange().getBookmarks().get(tmpBkName);

    // Move enumerator to the start of the bookmark.
    enumerator.setCurrent(collector.getEntity(bk.getBookmarkStart()));
    // Move enumerator to the line where the bookmark is located.
    while (enumerator.getType() != LayoutEntityType.LINE)
        enumerator.moveParent();

    // Print bounding box of the first line of the paragraph.
    Rectangle2D rect = enumerator.getRectangle();
    System.out.println("X=" + rect.getX() + "; Y=" + rect.getY() + "; Width=" + rect.getWidth() + "; Height=" + rect.getHeight());

    // Do the same for the bookmark end.
    enumerator.setCurrent(collector.getEntity(bk.getBookmarkEnd()));
    while (enumerator.getType() != LayoutEntityType.LINE)
        enumerator.moveParent();

    rect = enumerator.getRectangle();
    System.out.println("X=" + rect.getX() + "; Y=" + rect.getY() + "; Width=" + rect.getWidth() + "; Height=" + rect.getHeight());

    System.out.println("===============================");
    // If you need to calculate the bounding box of the whole paragraph,
    // you can calculate it as the union of the start and end lines
    // if the paragraph is located on the same page and in the same text column.
    // Otherwise, additional logic is required to calculate the bounding box
    // of paragraph lines located on different pages or in different text columns.
}
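
The union step described in the comments above can be sketched with plain java.awt.geom, without any Aspose.Words calls. This is only a minimal illustration under the stated assumption that both lines are on the same page and in the same text column; the class name ParagraphBounds, the helper paragraphBounds, and the sample rectangles are hypothetical and not part of Aspose.Words. Note also that the union of the first and last lines can be narrower than the paragraph if a middle line is wider than both.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper class, not part of Aspose.Words.
final class ParagraphBounds {

    // Returns the smallest rectangle containing both line rectangles.
    // Assumes both lines are on the same page and in the same text column.
    static Rectangle2D paragraphBounds(Rectangle2D firstLine, Rectangle2D lastLine) {
        return firstLine.createUnion(lastLine);
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for the two rectangles
        // printed by the loop above for one paragraph.
        Rectangle2D first = new Rectangle2D.Float(50f, 100f, 200f, 12f);
        Rectangle2D last = new Rectangle2D.Float(50f, 148f, 120f, 12f);

        Rectangle2D box = paragraphBounds(first, last);
        System.out.println("X=" + box.getX() + "; Y=" + box.getY()
                + "; Width=" + box.getWidth() + "; Height=" + box.getHeight());
    }
}
```

In the loop above, the rectangle obtained at the bookmark start would be passed as firstLine and the one obtained at the bookmark end as lastLine.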

Also, as you may know, MS Word documents are flow documents and do not contain any information about document layout. Consumer applications, like MS Word or OpenOffice, build the document layout on the fly. Aspose.Words uses its own layout engine to build the document layout while rendering the document to fixed-page formats (PDF, XPS, image, etc.). The same layout engine is used to provide document layout information via the LayoutCollector and LayoutEnumerator classes.
To build a proper document layout, the fonts used in the original document are required. If Aspose.Words cannot find the fonts used in the document, the fonts are substituted. This might lead to layout differences (incorrect coordinates returned by LayoutEnumerator), since substitution fonts might have different font metrics. You can implement IWarningCallback to get a notification when font substitution is performed.

skorpusova · March 20, 2023, 4:26am

@alexey.noskov, thank you very much.
One more question. How can I calculate the bounding box of shapes or group shapes? A shape can be located in a paragraph, but its position on the page may be very different. This is often the case in multi-column documents, where a shape is anchored in the first column (at the beginning of the page) but is displayed in the second column.

alexey.noskov · March 20, 2023, 5:46am

@skorpusova With shapes and group shapes, things are much simpler, since they have fixed dimensions. You can use the following simple code to get the bounding box of a shape or group shape:

Document doc = new Document("C:\\Temp\\in.docx");
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);

// Get shape or group shape.
Shape s = (Shape)doc.getChild(NodeType.SHAPE, 0, true);

// Make sure we work with top level shape and shape is not in header/footer.
if (s.isTopLevel() && s.getAncestor(NodeType.HEADER_FOOTER) == null)
{
    enumerator.setCurrent(collector.getEntity(s));
    Rectangle2D rect = enumerator.getRectangle();
    System.out.println("X=" + rect.getX() + "; Y=" + rect.getY() + "; Width=" + rect.getWidth() + "; Height=" + rect.getHeight());
}

skorpusova · March 23, 2023, 6:51am

Thanks a lot, @alexey.noskov.
It really helped me, but I have a few more questions.
How should I find the bounding box of tables correctly?
You provided code for the case where the shape is top level.
Sometimes people wrap images inside several shapes. How should I get the bounding box of inner shapes or of a paragraph inside a shape?

alexey.noskov · March 23, 2023, 6:52am

@skorpusova You can use the following code to calculate table bounding box:

// Open document
Document doc = new Document("C:\\Temp\\in.docx");

// Create LayoutCollector and LayoutEnumerator classes to get layout information of nodes.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);

// Calculate bounding boxes of table in the document.
Iterable<Table> tables = doc.getChildNodes(NodeType.TABLE, true);
for (Table t : tables)
{
    // Skip tables in headers/footers (LayoutCollector and LayoutEnumerator do not work with header/footer nodes).
    if (t.getAncestor(NodeType.HEADER_FOOTER) != null)
        continue;

    // Move LayoutEnumerator to the first row
    enumerator.setCurrent(collector.getEntity(t.getFirstRow().getFirstCell().getFirstParagraph()));
    while (enumerator.getType() != LayoutEntityType.ROW)
        enumerator.moveParent();

    // Get rectangle of the first row of the table.
    Rectangle2D first_rect = enumerator.getRectangle();

    // Do the same with last row
    enumerator.setCurrent(collector.getEntity(t.getLastRow().getFirstCell().getFirstParagraph()));
    while (enumerator.getType() != LayoutEntityType.ROW)
        enumerator.moveParent();

    // Get rectangle of the last row in the table.
    Rectangle2D last_rect = enumerator.getRectangle();
    // Union of the rectangles is the bounding box of the table.
    Rectangle2D result_rect = first_rect.createUnion(last_rect);

    System.out.println("Table rectangle : x=" + result_rect.getX() + ", y=" + result_rect.getY() + ", width=" + result_rect.getWidth() + ", height=" + result_rect.getHeight());
}

Please note that the code is simplified to demonstrate the basic technique and handles only tables placed on a single page. In MS Word, tables can span more than one page.
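
For tables that span several pages, one possible approach is to keep a separate bounding box per page instead of a single union. The sketch below shows only that grouping step with plain java.awt.geom and java.util; it assumes you have already collected, for every laid-out row, its page index (for example via enumerator.getPageIndex()) and its rectangle. The RowBox holder, the TableBoundsPerPage class, and the sample values are hypothetical and not part of Aspose.Words, and rows that are themselves split across pages are not handled here.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of Aspose.Words.
final class TableBoundsPerPage {

    // Hypothetical holder for one laid-out table row.
    static final class RowBox {
        final int pageIndex;
        final Rectangle2D rect;

        RowBox(int pageIndex, Rectangle2D rect) {
            this.pageIndex = pageIndex;
            this.rect = rect;
        }
    }

    // Builds one bounding box per page by unioning the rectangles
    // of all rows that were laid out on that page.
    static Map<Integer, Rectangle2D> boundsPerPage(Iterable<RowBox> rows) {
        Map<Integer, Rectangle2D> result = new TreeMap<>();
        for (RowBox row : rows) {
            // createUnion returns a new rectangle and does not modify its inputs.
            result.merge(row.pageIndex, row.rect, Rectangle2D::createUnion);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows: two on page 1, one on page 2.
        List<RowBox> rows = new ArrayList<>();
        rows.add(new RowBox(1, new Rectangle2D.Float(50f, 600f, 400f, 20f)));
        rows.add(new RowBox(1, new Rectangle2D.Float(50f, 620f, 400f, 20f)));
        rows.add(new RowBox(2, new Rectangle2D.Float(50f, 72f, 400f, 20f)));

        for (Map.Entry<Integer, Rectangle2D> e : boundsPerPage(rows).entrySet()) {
            Rectangle2D r = e.getValue();
            System.out.println("Page " + e.getKey() + ": x=" + r.getX() + ", y=" + r.getY()
                    + ", width=" + r.getWidth() + ", height=" + r.getHeight());
        }
    }
}
```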

skorpusova · March 23, 2023, 6:53am

@alexey.noskov, Thank you a lot for your help. It was very useful.

skorpusova · March 23, 2023, 6:54am

@alexey.noskov, could you clarify one more example?
I have applied the algorithm that you provided above. The paragraph bounds are correct, but the shape of Fig. 1 on page 5 has very strange coordinates: x = 324.15, y = 0.0.
This is very strange because the image is on the left side of the page. Could you explain this example? Maybe I should take something else into account?

alexey.noskov · March 23, 2023, 6:55am

@skorpusova I cannot reproduce the problem on my side. I have used the following simple code for testing:

Document doc = new Document("C:\\Temp\\in.doc");
doc.setWarningCallback(new FontSubstitutionWarningCollector());

LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);

Iterable<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape s : shapes)
{
    // Make sure we work with top level shape and shape is not in header/footer.
    if (s.isTopLevel() && s.getAncestor(NodeType.HEADER_FOOTER) == null)
    {
        enumerator.setCurrent(collector.getEntity(s));
        Rectangle2D rect = enumerator.getRectangle();
        System.out.println("Page: " + enumerator.getPageIndex() + "\tX=" + rect.getX() + "; Y=" + rect.getY() + "; Width=" + rect.getWidth() + "; Height=" + rect.getHeight());
    }
}
doc.save("C:\\Temp\\out.pdf");

skorpusova · March 23, 2023, 6:56am

I ran your code and got different values for the first shape on page 5.

Page: 4	X=536.39697265625; Y=605.4639892578125; Width=6.0; Height=14.913000106811523
Page: 4	X=494.0639953613281; Y=620.9509887695312; Width=5.25; Height=14.913000106811523
Page: 4	X=366.9779968261719; Y=660.583984375; Width=7.5; Height=14.913000106811523
Page: 4	X=394.87200927734375; Y=660.583984375; Width=8.25; Height=14.913000106811523
**Page: 5	X=324.1499938964844; Y=0.0; Width=248.25; Height=133.3000030517578**
Page: 5	X=439.4750061035156; Y=612.572998046875; Width=6.75; Height=14.913000106811523
Page: 5	X=390.0249938964844; Y=652.2059936523438; Width=119.25; Height=36.0
Page: 5	X=362.71600341796875; Y=700.85302734375; Width=6.75; Height=14.913000106811523
**Page: 6	X=112.6709976196289; Y=720.4949951171875; Width=124.05799865722656; Height=21.104999542236328**
**Page: 6	X=117.427001953125; Y=50.400001525878906; Width=33.0; Height=14.913000106811523**
**Page: 6	X=281.9930114746094; Y=50.400001525878906; Width=6.0; Height=14.913000106811523**
**Page: 6	X=175.3780059814453; Y=395.5929870605469; Width=17.25; Height=16.413000106811523**
Page: 6	X=51.900001525878906; Y=79.8239974975586; Width=511.79998779296875; Height=279.54998779296875
Page: 6	X=81.55000305175781; Y=363.27398681640625; Width=459.75; Height=20.700000762939453
Page: 6	X=56.900001525878906; Y=436.72601318359375; Width=17.25; Height=16.413000106811523
Page: 6	X=145.30799865722656; Y=658.9539794921875; Width=19.5; Height=14.913000106811523
Page: 6	X=130.55799865722656; Y=674.4409790039062; Width=47.25; Height=16.413000106811523
Page: 6	X=276.927001953125; Y=691.427978515625; Width=6.75; Height=14.913000106811523
Page: 6	X=420.27398681640625; Y=396.0469970703125; Width=6.75; Height=14.913000106811523
Page: 7	X=147.8470001220703; Y=74.5459976196289; Width=6.75; Height=14.913000106811523
Page: 7	X=170.5760040283203; Y=102.10600280761719; Width=6.75; Height=14.913000106811523
Page: 7	X=47.29999923706055; Y=414.5950012207031; Width=269.20001220703125; Height=224.4499969482422
Page: 7	X=320.75; Y=136.54200744628906; Width=270.29998779296875; Height=173.60000610351562
Page: 10	X=53.90599822998047; Y=128.89700317382812; Width=504.18701171875; Height=391.8559875488281

How can this be explained? I have the TrueType fonts and Word installed.

alexey.noskov · March 23, 2023, 6:57am

@skorpusova Could you please check whether IWarningCallback warns about font substitution on your side? Which version of Aspose.Words do you use for testing? I have used the latest 23.3 version of Aspose.Words for Java.

skorpusova · March 23, 2023, 6:58am

No font substitutions were used. I use version 22.4. I cannot use version 23.3 due to the performance issue WORDSNET-25169. Do you think the version makes a difference?

alexey.noskov · March 23, 2023, 6:58am

@skorpusova Yes, using version 22.4 I get the same result as yours.

skorpusova · March 23, 2023, 6:59am

Thank you very much. I will wait for the WORDSNET-25169 fix.