How to insert paragraph before Table continuation (with HeadingFormat)

Remus87 · November 30, 2022, 10:11am

Hi!
We’re trying to achieve the same thing: adding a paragraph at the start of each page before the table continuation. I tried the C# code snippet provided as the solution in the thread, and it works fine as long as the last rows are no taller than one line.
If the last row is taller than one line and/or the pages hold unequal numbers of table rows (so the spacing differs noticeably between pages), then the paragraph is not positioned correctly at the start of each page and instead lands more or less randomly on each page.
I have tried for a while now (trial and error, manually adding line feeds to random rows) to fix this, but I cannot find a pattern for why it occurs.
I also tried another table sample where rows are kept from breaking across pages. That gets closer to the solution, but the paragraph is still sometimes placed at the end of a page.
The two samples are attached below, with the output for each:

TableSample.odt -> outputs out.odt
TableSample_KeepRowsFromBreaking.odt -> out_keepRows.odt
Samples_InputOutput.zip (65.5 KB)

alexey.noskov · November 30, 2022, 12:15pm

@Remus87 In your case you can use LayoutCollector.GetEndPageIndex instead of LayoutCollector.GetStartPageIndex to determine whether a row runs onto the next page. You can also set the ParagraphFormat.PageBreakBefore option on the separating paragraph to make sure it always goes to the next page. Here is the modified code:

private static Table SplitTalbe(Table table, LayoutCollector collector)
{
    int startPageIndex = collector.GetStartPageIndex(table.FirstRow);

    int breakIndex = -1;
    int firstDataRowIndex = -1;
    // Determine index of row where page breaks. And index of the first data row.
    for (int i = 1; i < table.Rows.Count; i++)
    {
        Row r = table.Rows[i];
        if (!r.RowFormat.HeadingFormat && firstDataRowIndex < 0)
            firstDataRowIndex = i;

        int rowPageIndex = collector.GetEndPageIndex(r);
        if (rowPageIndex > startPageIndex)
        {
            breakIndex = i;
            break;
        }
    }

    if (breakIndex > 0)
    {
        Table clone = (Table)table.Clone(true);

        // Insert a cloned table after the main table.
        Paragraph para = new Paragraph(table.Document);
        para.ParagraphFormat.PageBreakBefore = true;
        para.AppendChild(new Run(table.Document, "Continuation of the table"));

        table.ParentNode.InsertAfter(para, table);
        para.ParentNode.InsertAfter(clone, para);

        // Remove content after the breaking row from the main table.
        while (table.Rows.Count > breakIndex)
            table.LastRow.Remove();

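        // Remove the data rows before the breaking row from the cloned table,
        // keeping the heading rows. Counting from firstDataRowIndex removes exactly
        // (breakIndex - firstDataRowIndex) rows, also when there are several heading rows.
        for (int i = firstDataRowIndex; i < breakIndex; i++)
            clone.Rows.RemoveAt(firstDataRowIndex);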

        return clone;
    }

    return null;
}

Remus87 · November 30, 2022, 1:06pm

Thank you Alexey,
Works like a charm.

Remus87 · November 30, 2022, 5:03pm

Another query on the same thread:
We’ll usually have multiple tables in the document. The previous sample had just one table, and I tried to extend the approach to several. The issue I encounter is that, when looping through the collection of tables and picking the ones that span more than one page, the processing breaks at some point (maybe because we explicitly set the property to true) and leaves a few pages of the table with a single row. Please see the attached sample and output (the table breaks on pages 7 and 9 and shouldn’t).
The code that loops through the table collection is:

for (int i = tables.Count - 1; i > -1; i--)
{
    Table table = (Table)tables[i];
    if (collector.GetNumPagesSpanned(table) > 0)
    {
        while (table != null)
        {
            table = SplitTalbe(table, collector);
            collector.Clear();
            doc.UpdatePageLayout();
        }
    }
}

I also tried applying the functionality explicitly to only the table that spans multiple pages, and the behaviour is still the same:

Table tblLast = (Table)tables[2];
while (tblLast != null)
{
    tblLast = SplitTalbe(tblLast, collector);
    collector.Clear();
    doc.UpdatePageLayout();
}

Note that when I copy that table into a new, separate .odt and run the supplied code, it works perfectly fine. So I’m thinking it might just need a refresh/reset of the document properties somewhere.

Samples InputOutput2.zip (29.2 KB)

alexey.noskov · November 30, 2022, 7:03pm

@Remus87 I have tested with the following code and the result is correct:

Document doc = new Document(@"C:\Temp\Sample_MultipleTables.odt");
LayoutCollector collector = new LayoutCollector(doc);

NodeCollection tables = doc.GetChildNodes(NodeType.Table, true);
foreach (Table t in tables)
{
    if (t.ParentNode.NodeType != NodeType.Body)
        continue;

    Table table = t;
    if (collector.GetNumPagesSpanned(table) > 0)
    {
        while (table != null)
        {
            table = SplitTalbe(table, collector);
            collector.Clear();
            doc.UpdatePageLayout();
        }
    }
}

doc.Save(@"C:\Temp\out.odt");

out.zip (16.1 KB)

Remus87 · December 1, 2022, 1:44pm

Indeed, if we process it as a separate document (as in the sample provided), it works as expected. We had added the functionality as a continuation of our existing processing, just before saving the document, so there is probably a property or setting that triggers the previously mentioned behaviour. Resetting the document to its default settings would probably fix it, but I don’t know how to do that in Aspose.Words (in Aspose.Cells, for example, we would just set workbook = null).
I found an alternative:

Save the document;
Wrap the displayed snippet in a method that takes the file name and returns a Document. A new Document instance is therefore created, and it works OK.
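
A minimal sketch of that second step, assuming the class name TableContinuation and the method name AddContinuationParagraphs (both hypothetical) and assuming the SplitTalbe method posted earlier in this thread is placed in the same class. The loop is the one from the reply above, unchanged:

```csharp
using Aspose.Words;
using Aspose.Words.Layout;
using Aspose.Words.Tables;

// Hypothetical class name; declared partial so that SplitTalbe from the
// earlier reply can live in another part of the same class.
public static partial class TableContinuation
{
    // Hypothetical helper: reloads the saved file so that the splitting
    // runs on a fresh Document instance with a freshly built layout.
    public static Document AddContinuationParagraphs(string fileName)
    {
        Document doc = new Document(fileName);
        LayoutCollector collector = new LayoutCollector(doc);

        NodeCollection tables = doc.GetChildNodes(NodeType.Table, true);
        foreach (Table t in tables)
        {
            // Only process tables that are direct children of the body,
            // as in the reply above.
            if (t.ParentNode.NodeType != NodeType.Body)
                continue;

            Table table = t;
            if (collector.GetNumPagesSpanned(table) > 0)
            {
                while (table != null)
                {
                    // SplitTalbe returns the continuation table, or null
                    // when no further split is needed.
                    table = SplitTalbe(table, collector);
                    collector.Clear();
                    doc.UpdatePageLayout();
                }
            }
        }

        return doc;
    }
}
```

It would be called right after the first save, for example `Document result = TableContinuation.AddContinuationParagraphs(path);`, where `path` is the file just written. The layout is still rebuilt once per split, so this changes only which Document instance is processed, not the running time.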

The main concern with this mechanism is performance, as we can expect at times to generate documents with hundreds of pages and dozens of tables. I have attached a new sample with more than 200 pages, for which I measured the following with a stopwatch:
point 1 (all the pre-processing and saving of the first document) → took around 4 seconds, which is good (so loading and processing the DOM is quick, even though we apply several operations)
point 2 (the method mentioned above, which adds a title paragraph (Schedule continued) at the beginning of each page of every spanning table) → took around 250 seconds, which is not feasible.
It would be great if you could provide an alternative solution with good performance.
I noticed from the documentation (Document.UpdatePageLayout | Aspose.Words for .NET) that rebuilding the layout by calling UpdatePageLayout() can take a long time, since it re-renders on every call. So if this call could be avoided, or replaced by some sort of callback (or even async) method, the time should improve considerably.
Sample3.zip (226.9 KB)

Thanks again,
Remus

alexey.noskov · December 1, 2022, 6:59pm

@Remus87 You are right, Document.UpdatePageLayout is quite a time- and resource-consuming operation. Unfortunately, it cannot be avoided in this case. As you know, MS Word documents, as well as Open Office documents, are flow documents, i.e. they do not contain any information about the document layout. Each time you edit the document, its layout is reflowed. When you split a table into parts and insert a paragraph between them, the document layout changes. To detect where the next table should be split, the layout must be updated; otherwise the next table might be split at an incorrect location. So, unfortunately, there is no more performant way to achieve this.