Splitting tables

PatrickVB · May 13, 2018, 12:20pm

Dear Team,

In the attached document there are a number of tables which are larger than a single page. I would like to split these tables into tables which do fit on a single page.

The goal is to convert each of these single-page tables into a TIFF image.
I already have the logic for converting to TIFF images (from a previous post). Now I would like to understand how I can split a table into smaller tables, each fitting on a single page.

Many thanks for your advice

Patrick

Tables.zip (45.2 KB)

awais.hafeez · May 14, 2018, 12:47am

@PatrickVB,

We are working on your query and will get back to you soon.

PatrickVB · May 16, 2018, 7:31am

Hi Awais,

Is there any update on this issue?
Many thanks.

Regards
Patrick

awais.hafeez · May 16, 2018, 3:12pm

@PatrickVB,

Thanks for being patient. Please also ZIP and attach an expected Word document (not .tiff) showing the desired output here for our reference. Please create this expected Word document by using MS Word. We will then provide you code to achieve the same by using Aspose.Words. Thanks for your cooperation.

PatrickVB · May 16, 2018, 3:59pm

Dear Awais,

Please find the expected outputs attached.

What I was thinking is the following:

1. For every table, create a new Aspose document instance.
2. Iterate over the table (in the new document) to detect where the table crosses a page boundary.
3. Where the table crosses the page boundary, split the table.

This way, each table in the original document is split into multiple tables, each fitting on a page.

The problem is that I do not know how I can detect where the table rows cross the page boundary.

What I did in Word is very simple.
I copied every single table to a separate document.
Then I checked visually where the table continues onto the next page.
I moved to that row and instructed Word to split the table at that row.
So the single table became several tables,
and each table fits on a single page.
As such, the TIFF image of each table will also fit on a single page.

The second table was a bit trickier, because that table has a repeating header.
So I split the table as described above and then added the header to each of the resulting tables.

I hope the attached documents (three, one for each table) sufficiently explain what I want to achieve.

KR

Patrick

TablesExpectedOutput.zip (127.8 KB)

awais.hafeez · May 17, 2018, 3:00am

@PatrickVB,

For each Table that spans across more than one Page, the following code will create a new Document containing only that Table.

// Requires: using Aspose.Words; using Aspose.Words.Layout; using Aspose.Words.Tables;
Document doc = new Document(MyDir + @"Tables\Tables.docx");

Document tempDoc = (Document)doc.Clone(true);
tempDoc.FirstSection.Body.RemoveAllChildren();

NodeCollection tables = doc.GetChildNodes(NodeType.Table, true);
for (int i = 0; i < tables.Count; i++)
{
    Table table = (Table)tables[i];

    tempDoc.FirstSection.Body.AppendChild(tempDoc.ImportNode(table, true));
    Table importedTable = tempDoc.FirstSection.Body.Tables[0];

    LayoutCollector collector = new LayoutCollector(tempDoc);                

    int startPage = collector.GetStartPageIndex(importedTable.FirstRow.FirstCell.FirstParagraph);
    int endPage = collector.GetEndPageIndex(importedTable.LastRow.LastCell.LastParagraph);
    if (endPage > startPage)
    {
        tempDoc.Save(MyDir + @"Tables\Table_" + i + ".docx");                   
    }

    tempDoc.FirstSection.Body.RemoveAllChildren();
}

We are working further on your query to Split one Table into Multiple Tables. We will get back to you with more code soon.

PatrickVB · May 17, 2018, 7:04am

Hi Awais,

Thank you very much.
I’m looking forward to the splitting code.

KR

Patrick

awais.hafeez · May 17, 2018, 12:21pm

@PatrickVB,

You can build on the following code to meet your requirements. (see outputs.zip (70.5 KB))

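// Requires: using System.Collections; using Aspose.Words; using Aspose.Words.Layout; using Aspose.Words.Tables;
Document doc = new Document(MyDir + @"Tables\Tables.docx");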

Document tempDoc = (Document)doc.Clone(true);
tempDoc.FirstSection.Body.RemoveAllChildren();
            
NodeCollection tables = doc.GetChildNodes(NodeType.Table, true);
for (int i = 0; i < tables.Count; i++)
{
    Table table = (Table)tables[i];
    tempDoc.FirstSection.Body.AppendChild(tempDoc.ImportNode(table, true));
    Table importedTable = tempDoc.FirstSection.Body.Tables[0];

    ArrayList splitIndices = new ArrayList();
    int importedRowCount = importedTable.Rows.Count;

    LayoutCollector collector = new LayoutCollector(tempDoc);
    int startPage = collector.GetStartPageIndex(importedTable.FirstRow.FirstCell.FirstParagraph);
    int endPage = collector.GetEndPageIndex(importedTable.LastRow.LastCell.LastParagraph);

    if (endPage > startPage)
    {
        int startRow = startPage;
        for (int x = 0; x < importedRowCount; x++)
        {
            Row row = importedTable.Rows[x];
            int endRow = collector.GetEndPageIndex(row.LastCell.LastParagraph);

            if (endRow > startRow)
            {
                splitIndices.Add(x);
                startRow = endRow;
            }
        }

        splitIndices.Add(importedRowCount);

        for (int x = splitIndices.Count - 1; x > 0; x--)
        {
            SplitTable(importedTable, (int)splitIndices[x - 1], (int)splitIndices[x]);
        }

        for (int x = (int)splitIndices[0]; x < importedRowCount; x++)
        {
            importedTable.LastRow.Remove();
        }

        tempDoc.Save(MyDir + @"Tables\Table_" + i + ".docx");
    }

    tempDoc.FirstSection.Body.RemoveAllChildren();
}
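// Helper: inserts a clone of 'table' right after it, keeps only rows
// [startIndex, endIndex) in the clone, and prepends a copy of the first row
// when that row is marked as a repeating heading row.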
public static Table SplitTable(Table table, int startIndex, int endIndex)
{
    Table newTable = (Table)table.Clone(true);
    table.ParentNode.InsertAfter(newTable, table);

    for (int i = 0; i < startIndex; i++)
    {
        newTable.FirstRow.Remove();
    }

    for (int i = endIndex; i < table.Rows.Count; i++)
    {
        newTable.LastRow.Remove();
    }

    if (table.FirstRow.RowFormat.HeadingFormat)
    {
        Row headingRow = (Row)table.FirstRow.Clone(true);
        newTable.InsertBefore(headingRow, newTable.FirstRow);
    }

    Paragraph separator = new Paragraph(table.Document);
    table.ParentNode.InsertAfter(separator, table);

    return newTable;
}
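
To make the split-index step in the loop above easier to follow, here is a minimal standalone sketch of the same logic. It is an illustration only: the `rowEndPages` array is a hypothetical stand-in for the values that `collector.GetEndPageIndex(row.LastCell.LastParagraph)` would return per row, and the `SplitIndexSketch` class and `ComputeSplitIndices` method names are not part of Aspose.Words.

```csharp
using System;
using System.Collections.Generic;

public static class SplitIndexSketch
{
    // startPage:   page on which the table starts.
    // rowEndPages: hypothetical input, the page on which each row ends
    //              (stand-in for the LayoutCollector lookups in the code above).
    // Returns the row indices at which a new table should begin, followed by
    // the total row count as a closing boundary.
    public static List<int> ComputeSplitIndices(int startPage, int[] rowEndPages)
    {
        var splitIndices = new List<int>();
        int currentPage = startPage;

        for (int x = 0; x < rowEndPages.Length; x++)
        {
            // A row that ends on a later page than the current one
            // becomes the first row of the next table.
            if (rowEndPages[x] > currentPage)
            {
                splitIndices.Add(x);
                currentPage = rowEndPages[x];
            }
        }

        // Closing boundary so that consecutive pairs describe row ranges.
        splitIndices.Add(rowEndPages.Length);
        return splitIndices;
    }
}
```

For example, with a start page of 1 and end pages { 1, 1, 1, 2, 2, 3 }, the method returns 3, 5, 6. Rows 0 to 2 stay in the original table, and the pairs (3, 5) and (5, 6) are the ranges that would be passed to SplitTable.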

PatrickVB · May 17, 2018, 12:40pm

Hi Awais,

Thank you very much for this code.
Sorry for asking a question before even having tried it, but does this code also take care of the scenario we have with the second table?

That table has repeating headers. This header should be present in each of the individual tables.

Many thanks
Patrick

awais.hafeez · May 18, 2018, 12:01am

@PatrickVB,

Yes, it does add headers to all individual tables.

PatrickVB · May 19, 2018, 1:24pm

Hi Awais,

I just wonder whether it would be safer/easier to create a new Document instead of cloning.
In the above code sample, you only remove the nodes of the first section. If the document had more sections, the result would not be correct.

What was the reason for the clone? Was this to ensure that certain settings are kept (eg page size etc)?
Is it a general best practice, when operating on Documents, to clone them and then remove what is not relevant?

Many thanks for the support.

Regards

Patrick

PatrickVB · May 19, 2018, 3:23pm

Hi Awais,

The solution provided works perfectly. Thanks a lot for the support.

Patrick

awais.hafeez · May 19, 2018, 10:00pm

@PatrickVB,

For a document with more than one section, you will have to keep only the section that contains the target table and remove the other sections above and below it.

Your understanding is correct. This is just to preserve settings e.g. styles, themes, page setup, orientation info etc.
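
As a rough sketch of that idea, the clone step could be moved into a small helper that keeps only the section owning the table. The `TableTemplate` class and `CreateFor` method are hypothetical names for illustration, and the sketch assumes the table belongs to `doc` and sits inside a section body; it has not been run against the attached documents.

```csharp
using Aspose.Words;
using Aspose.Words.Tables;

public static class TableTemplate
{
    // Hypothetical helper: clones the source document and keeps only an
    // emptied copy of the section that contains the given table, so that
    // styles, theme and that section's page setup are preserved.
    public static Document CreateFor(Document doc, Table table)
    {
        // Locate the section that owns the table in the source document.
        Section owner = (Section)table.GetAncestor(NodeType.Section);
        int ownerIndex = doc.Sections.IndexOf(owner);

        Document tempDoc = (Document)doc.Clone(true);

        // Walk backwards so removals do not shift indices still to be visited.
        for (int s = tempDoc.Sections.Count - 1; s >= 0; s--)
        {
            if (s != ownerIndex)
                tempDoc.Sections[s].Remove();
        }

        // Only one section is left; clear its body so the table can be imported.
        tempDoc.FirstSection.Body.RemoveAllChildren();
        return tempDoc;
    }
}
```

Inside the loop, `Document tempDoc = TableTemplate.CreateFor(doc, table);` would then replace the single shared clone. Cloning the whole document once per table costs more than reusing one clone, so this trade-off is probably only worthwhile when sections differ in page size or orientation.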

In case you have further queries or need any help, please let us know.