Layout Enumerator error

Background:

We are using Aspose.Words version 21.5.0.

I want to find all pages that contain a node extending beyond the right indent.

I first collected the following properties:

Page height: 841.9, page width: 595.3, left margin: 42.5, right margin: 42.5.

So, when a node's X + the node's width > page width - right margin, the node extends beyond the right indent. Is that right?

I also have some other questions.

1. What are CELL and ROW? What do the X and width of a CELL and a ROW represent? In the log, the width is 6.674 for both, and the height of the cell is 0. Now I am confused. Can you show me with an image?

2. In question 3, I typed about 470 spaces. In the log I found two LayoutEnumerator kinds, SPACES and SYMBOL. What does SYMBOL mean?

The 470 spaces appear to be split into 4 lines, and none of the 4 lines goes beyond the right indent. In the docx, however, they occupy only 2 lines, and the line clearly goes beyond the right indent and even beyond the page. Is something wrong?

3. Is there any plan to fix issue WORDSNET-24778? A fix would let me ignore some cases of content going beyond the indent.

Here are the code and the document I used.
Program.zip (2.1 KB)
demo.docx (385.8 KB)

@lkf77081

Yes, it is right.
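The condition from the question can be written out as a small helper. This is only a sketch: the class name `RightIndent`, the method name `ExceedsRightIndent`, and the `tolerance` parameter are hypothetical and not part of Aspose.Words. The tolerance is an assumption added so that floating-point rounding does not flag an entity that ends exactly on the limit. With the page values from the question, the limit is 595.3 - 42.5 = 552.8 points.

```csharp
using System;

public static class RightIndent
{
    // Returns true when the entity's right edge (x + width) lies past
    // pageWidth - rightMargin. All values are in points.
    // "tolerance" is a hypothetical safety margin against rounding noise.
    public static bool ExceedsRightIndent(
        double x, double width, double pageWidth, double rightMargin,
        double tolerance = 0.01)
    {
        double rightLimit = pageWidth - rightMargin;
        return x + width > rightLimit + tolerance;
    }

    public static void Main()
    {
        // Page values from the question: 595.3 - 42.5 = 552.8.
        // Right edge at 552.8: ends exactly on the limit.
        Console.WriteLine(ExceedsRightIndent(42.5, 510.3, 595.3, 42.5));
        // Right edge at 562.5: past the limit.
        Console.WriteLine(ExceedsRightIndent(42.5, 520.0, 595.3, 42.5));
    }
}
```

The entity coordinates passed in `Main` are made-up illustration values, not measurements from the attached document.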

Layout entities of kind CELL and ROW are the table cell end and table row end symbols, respectively. Their Type is Span. For example, if you traverse the layout tree of a simple table using code like this:

using System;
using System.Drawing;
using Aspose.Words;
using Aspose.Words.Layout;

Document doc = new Document(@"C:\Temp\table.docx");
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
TraverseLayoutForward(enumerator, 1);

/// <summary>
/// Enumerate through layoutEnumerator's layout entity collection front-to-back,
/// in a depth-first manner, and in the "Visual" order.
/// </summary>
private static void TraverseLayoutForward(LayoutEnumerator layoutEnumerator, int depth)
{
    do
    {
        PrintCurrentEntity(layoutEnumerator, depth);

        if (layoutEnumerator.MoveFirstChild())
        {
            TraverseLayoutForward(layoutEnumerator, depth + 1);
            layoutEnumerator.MoveParent();
        }
    } while (layoutEnumerator.MoveNext());
}
/// <summary>
/// Print information about layoutEnumerator's current entity to the console, while indenting the text with tab characters
/// based on its depth relative to the root node that we provided in the constructor LayoutEnumerator instance.
/// The rectangle that we process at the end represents the area and location that the entity takes up in the document.
/// </summary>
private static void PrintCurrentEntity(LayoutEnumerator layoutEnumerator, int indent)
{
    string tabs = new string('\t', indent);

    Console.WriteLine(layoutEnumerator.Kind == string.Empty
        ? $"{tabs}-> Entity type: {layoutEnumerator.Type}"
        : $"{tabs}-> Entity type & kind: {layoutEnumerator.Type}, {layoutEnumerator.Kind}");

    // Only spans can contain text.
    if (layoutEnumerator.Type == LayoutEntityType.Span)
        Console.WriteLine($"{tabs}   Span contents: \"{layoutEnumerator.Text}\"");

    RectangleF leRect = layoutEnumerator.Rectangle;
    Console.WriteLine($"{tabs}   Rectangle dimensions {leRect.Width}x{leRect.Height}, X={leRect.X} Y={leRect.Y}");
    Console.WriteLine($"{tabs}   Page {layoutEnumerator.PageIndex}");
}

You will get the following output for a simple table:

-> Entity type: Row
    Rectangle dimensions 467.5x14.428, X=72.25 Y=85.928
    Page 1
        -> Entity type: Cell
            Rectangle dimensions 467.5x14.428, X=72.25 Y=85.928
            Page 1
                -> Entity type: Line
                    Rectangle dimensions 456.7x13.428, X=77.65 Y=86.428
                    Page 1
                        -> Entity type & kind: Span, CELL
                            Span contents: "☼"
                            Rectangle dimensions 5.479x13.428, X=77.65 Y=86.428
                            Page 1
        -> Entity type: Cell
            Rectangle dimensions 0x14.428, X=539.75 Y=85.928
            Page 1
                -> Entity type: Line
                    Rectangle dimensions 0x13.428, X=539.75 Y=86.428
                    Page 1
                        -> Entity type & kind: Span, ROW
                            Span contents: "☼"
                            Rectangle dimensions 5.479x13.428, X=539.75 Y=86.428
                            Page 1

As you can see, entities with Type=Span and Kind=CELL or ROW are the invisible cell end and row end characters.

In your document, the 470 spaces are a mix of regular spaces and non-breaking spaces. For regular spaces Aspose.Words returns the kind SPACES; for non-breaking spaces it returns SYMBOL.
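To check how a run of "spaces" is actually composed, you can count regular spaces (U+0020) and non-breaking spaces (U+00A0) separately. The sketch below is plain C# and does not call Aspose.Words; the class name `SpaceCounter` and the method name `CountSpaces` are hypothetical, and it assumes you have already obtained the text as a string (for example from a node's `GetText()`).

```csharp
using System;
using System.Linq;

public static class SpaceCounter
{
    // Counts regular spaces (U+0020) and non-breaking spaces (U+00A0)
    // in the given text and returns both counts.
    public static (int Regular, int NonBreaking) CountSpaces(string text)
    {
        int regular = text.Count(c => c == ' ');
        int nonBreaking = text.Count(c => c == '\u00A0');
        return (regular, nonBreaking);
    }

    public static void Main()
    {
        // Illustration string: one regular space, two non-breaking spaces,
        // then another regular space between "a" and "b".
        string sample = "a \u00A0\u00A0 b";
        var (regular, nonBreaking) = CountSpaces(sample);
        Console.WriteLine($"regular={regular}, nonBreaking={nonBreaking}");
    }
}
```

If both counts are non-zero for the paragraph in question, that would be consistent with seeing both SPACES and SYMBOL spans in the log.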

The issue WORDSNET-24778 is related to the table grid calculation algorithm implementation task, WORDSNET-832. We are continuously working on making our table grid calculation algorithm as close to MS Word as possible. But since MS Word behavior is not documented and not always obvious, it is hard to make promises regarding such things.

PS: In your code you are using the Document.UpdateTableLayout method. This method is deprecated, and using it is not recommended unless you are sure it has a positive effect. Calling this method can break the table layout.

1. If the entities with Type=Span and Kind=CELL or ROW are invisible cell end and row end characters, does that mean I cannot judge whether a node is beyond the right indent by cell or row? Which entity should I use for the check instead, the line? Do styles (padding or margin) affect the values of X, Y, width, and height?
2. Regarding question 2: the 470 spaces appear to be split into 4 lines, and none of the 4 lines goes beyond the right indent. In the docx, however, they occupy only 2 lines, and the line clearly goes beyond the right indent and even beyond the page. Is something wrong?

3. demo.docx (361.5 KB)
image.png (30.1 KB)
With my code, I find that the last row on page 1 is beyond the right indent, but I cannot see that in the document. Can you show me?
image.png (209.2 KB)

When I select the whole table on page 1, I see something strange, which I marked in the image. What is that? Does it affect the width, height, X, or Y of a Cell or Row?

@lkf77081

If your goal is to detect whether a particular table row goes beyond the page, you can use code like this:

Document doc = new Document(@"C:\Temp\in.docx");
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);

// Get table rows.
NodeCollection rows = doc.GetChildNodes(NodeType.Row, true);
foreach (Row r in rows)
{
    // Skip rows, which are in the document's header/footer.
    // LayoutCollector and LayoutEnumerator work only with nodes in the main document's body.
    if (r.GetAncestor(NodeType.Body) == null)
        continue;

    PageSetup ps = ((Section)r.GetAncestor(NodeType.Section)).PageSetup;
    RectangleF pageRect = new RectangleF(0, 0, (float)ps.PageWidth, (float)ps.PageHeight);

    enumerator.Current = collector.GetEntity(r.LastCell.LastParagraph);
    while (enumerator.Type != LayoutEntityType.Row)
        enumerator.MoveParent();

    // Check whether the row goes beyond the page.
    if (!pageRect.Contains(enumerator.Rectangle))
        Console.WriteLine(enumerator.Rectangle);
}

With this code you can detect which node in the Aspose.Words model goes beyond the page. Your code, on the other hand, traverses only the document layout, so you cannot detect which node in the Aspose.Words DOM corresponds to an entity in the document layout.

You should use LayoutEntityType.Line for lines.

Yes, Aspose.Words layout engine calculates absolute position of the entities on the page.

An important note here: LayoutCollector and LayoutEnumerator use the same layout engine that Aspose.Words uses to render the document to PDF or any other fixed-page format. So you should compare the output produced by LayoutEnumerator with the document rendered to PDF.
If you render your document to PDF, you will see that Aspose.Words wraps the spaces into 4 lines.

This issue is logged as WORDSNET-24849. We will let you know once it is resolved.

You are referring to the invisible row end character. So you can skip it in the detection.
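A minimal sketch of such a check is shown below. It walks the layout tree the same way as TraverseLayoutForward, looks only at Span entities, and skips the spans whose Kind is CELL or ROW. It rests on several assumptions that you should adapt: the document has a single section (so the first section's PageSetup applies to every page), the input path is a placeholder, paragraph-level right indents are ignored, and the class name `RightIndentCheck`, the method name `ReportSpans`, and the 0.01 pt tolerance are hypothetical. Note also that the traversal visits every span in the layout, including those in headers and footers.

```csharp
using System;
using System.Drawing;
using Aspose.Words;
using Aspose.Words.Layout;

public static class RightIndentCheck
{
    // Hypothetical tolerance (in points) against floating-point noise.
    private const double Tolerance = 0.01;

    public static void Main()
    {
        Document doc = new Document(@"C:\Temp\in.docx");

        // Assumption: one section, so one page setup for all pages.
        PageSetup ps = doc.FirstSection.PageSetup;
        double rightLimit = ps.PageWidth - ps.RightMargin;

        LayoutEnumerator enumerator = new LayoutEnumerator(doc);
        ReportSpans(enumerator, rightLimit);
    }

    // Depth-first traversal of the layout tree; reports spans whose
    // right edge lies past rightLimit.
    private static void ReportSpans(LayoutEnumerator e, double rightLimit)
    {
        do
        {
            // Skip the invisible cell end and row end characters.
            bool isEndMark = e.Kind == "CELL" || e.Kind == "ROW";

            if (e.Type == LayoutEntityType.Span && !isEndMark)
            {
                RectangleF rect = e.Rectangle;
                if (rect.Right > rightLimit + Tolerance)
                    Console.WriteLine(
                        $"Page {e.PageIndex}: \"{e.Text}\" ends at X={rect.Right}, limit={rightLimit}");
            }

            if (e.MoveFirstChild())
            {
                ReportSpans(e, rightLimit);
                e.MoveParent();
            }
        } while (e.MoveNext());
    }
}
```

Because this approach works on layout entities only, it tells you where on the page the overflow occurs but not which DOM node caused it; for that, the LayoutCollector-based code above is the better fit.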

Unfortunately, it is not quite clear what you mean. As I mentioned, Aspose.Words returns the size and coordinates of entities as calculated by our layout engine. It takes into account the properties of the nodes set in the document and in most cases mimics MS Word, but since the MS Word layout engine is not documented, there might be peculiarities.


I have updated to the newest version of Aspose.Words and removed the call to Document.UpdateTableLayout, but now a new problem appears: as shown in the attachment, the table does not break and runs past the bottom of the page. How can I resolve this? Is there any method I can use instead of Document.UpdateTableLayout?

@lkf77081 Could you please attach your input and output documents here for our reference? I have checked conversion to PDF using the latest version of Aspose.Words and your initial demo document and cannot reproduce this problem: out.pdf (480.0 KB) ms.pdf (77.7 KB).
As far as I can see, the PDF documents generated by MS Word and Aspose.Words look almost the same, except for the problem with spaces wrapping on the 4th page, which has been logged as WORDSNET-24849.

The UpdateTableLayout method was the first attempt at implementing the table grid calculation algorithm. Aspose.Words now uses another implementation internally, so there is no need to call any additional methods to calculate the table layout.

demo.docx (367.2 KB)
image.png (385.1 KB)
With version 23.10, I found a new problem: the text 'mark)' looks like it goes beyond the right indent, but when I add its X and width, the result equals page width - right margin. So the text's X or width must be wrong.
What is stranger is that it prints correctly: when printed, the text does not go beyond the right indent.
Can you help me resolve this problem?

@lkf77081 As I already mentioned, LayoutCollector and LayoutEnumerator use the same layout engine that Aspose.Words uses to render the document to PDF or any other fixed-page format and for printing. So you should compare the output produced by LayoutEnumerator with the document rendered to PDF.
If you save your document as PDF using Aspose.Words, you will see that Aspose.Words renders the table at the end of the second page a little narrower than MS Word does, and the word "mark)" does not go beyond the page margins.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-24916

You can obtain Paid Support services if you need support on a priority basis, along with the direct access to our Paid Support management team.

I tried what you suggested, and the PDF output is indeed correct.
But why do the X and width of the text differ between Word and PDF?
Will you fix this? For a particular reason, I have to use the Word format.

@lkf77081 MS Word documents are flow documents and do not contain any information about document layout. Consumer applications, like MS Word or Open Office, build the document layout on the fly. Aspose.Words uses its own layout engine to build the document layout when rendering the document to PDF or any other fixed-page format and when printing; LayoutCollector and LayoutEnumerator use the same layout engine. Since the MS Word layout engine is not documented and is something of a black box, there might be differences between the Aspose.Words and MS Word document layouts. But we continuously work on improving our layout engine to make it as close to MS Word as possible.

The issues you found earlier (filed as WORDSNET-24916) have been fixed in the Aspose.Words for .NET 23.7 update, which is also available on NuGet.

The issues you found earlier (filed as WORDSNET-24849) have been fixed in the Aspose.Words for .NET 24.3 update, which is also available on NuGet.