Multi-page table with repeated headers is not split into multiple tables

Hello all,

We’re using Aspose.Words for .NET version 21.7 with a valid license to generate Word documents, and we use the same library to generate PDF documents as well.

We generated a multi-page table whose header row repeats on each page. When we checked the tag structure of the resulting PDF, we found a single table spanning multiple pages, and the repeated header on each following page is tagged as “th” in the middle of that single table. Because of this, screen readers are not able to read the table content properly.

When I created a multi-page table in Word and saved it as PDF manually from Microsoft Word, the single table was converted into multiple tables, one per page, each with the repeated header. This does not happen when we generate the PDF with Aspose.Words.

Could you please help me resolve this?

@Jagan2021 Thank you for reporting this problem to us. It has been logged as WORDSNET-24027 for correction. We will keep you updated and let you know once it is resolved or we have more information for you.
As a temporary workaround, you can split the table using LayoutCollector. Please see the following code:

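using Aspose.Words;
using Aspose.Words.Layout;
using Aspose.Words.Saving;
using Aspose.Words.Tables;

Document doc = new Document(@"C:\Temp\in.docx");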
LayoutCollector collector = new LayoutCollector(doc);

// Get the table that needs to be split into parts.
Table table = doc.FirstSection.Body.Tables[0];

while (table != null)
{
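    table = SplitTable(table, collector);
    // Refresh the layout so the collector reports page indices for the modified document.
    collector.Clear();
    doc.UpdatePageLayout();
}

// Export the document structure so that the output PDF is tagged.
PdfSaveOptions opt = new PdfSaveOptions();
opt.ExportDocumentStructure = true;
doc.Save(@"C:\Temp\out.pdf", opt);

// Helper method: splits the table at the first row that starts on a later page.
// Returns the new table holding the remaining rows, or null if no split is needed.
private static Table SplitTable(Table table, LayoutCollector collector)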
{
    int startPageIndex = collector.GetStartPageIndex(table.FirstRow);

    int breakIndex = -1;
    int firstDataRowIndex = -1;
    // Find the index of the first row that starts on a later page, and the index of the first data row.
    for (int i = 1; i < table.Rows.Count; i++)
    {
        Row r = table.Rows[i];
        if (!r.RowFormat.HeadingFormat && firstDataRowIndex < 0)
            firstDataRowIndex = i;

        int rowPageIndex = collector.GetStartPageIndex(r);
        if (rowPageIndex > startPageIndex)
        {
            breakIndex = i;
            break;
        }
    }

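    // Split only if a page break was found and there is a data row at or before it.
    if (breakIndex > 0 && firstDataRowIndex > 0)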
    {
        Table clone = (Table)table.Clone(true);

        // Insert a cloned table after the main table.
        Paragraph para = new Paragraph(table.Document);

        table.ParentNode.InsertAfter(para, table);
        para.ParentNode.InsertAfter(clone, para);

        // Remove the breaking row and everything after it from the main table.
        while (table.Rows.Count > breakIndex)
            table.LastRow.Remove();

        // Remove the data rows that stay in the main table from the cloned table.
        // Heading rows are kept, so the loop starts at the first data row.
        for (int i = firstDataRowIndex; i < breakIndex; i++)
            clone.Rows.RemoveAt(firstDataRowIndex);

        return clone;
    }

    return null;
}
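
The row bookkeeping in this workaround can be checked in isolation, without Aspose.Words. Below is a minimal sketch under stated assumptions: the page on which each row starts is already known and does not change, whereas the real workaround re-runs the layout after every split. `SplitPlan` and `Partition` are hypothetical names used only for illustration. They are not part of the Aspose.Words API.

```csharp
using System;
using System.Collections.Generic;

public static class SplitPlan
{
    // rowPages[i] is the page on which row i starts (assumed known and fixed).
    // headingRows is the number of leading rows that repeat as the header.
    // Returns, for each resulting table, the indices of the original rows it
    // contains; the heading rows are repeated at the top of every part.
    public static List<List<int>> Partition(int[] rowPages, int headingRows)
    {
        var parts = new List<List<int>>();
        int currentPage = -1;

        for (int i = headingRows; i < rowPages.Length; i++)
        {
            // Start a new part whenever a data row begins on a different page.
            if (parts.Count == 0 || rowPages[i] != currentPage)
            {
                var part = new List<int>();
                for (int h = 0; h < headingRows; h++)
                    part.Add(h);
                parts.Add(part);
                currentPage = rowPages[i];
            }

            parts[parts.Count - 1].Add(i);
        }

        return parts;
    }
}
```

For example, `SplitPlan.Partition(new[] { 1, 1, 1, 2, 2, 3 }, 1)` yields three parts with row indices 0,1,2, then 0,3,4, then 0,5. Row 0 is the heading row that each split table keeps.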

@Jagan2021 Using a single “Table” tag for a table that spans several pages is the correct way of tagging. Here is a quote from “Tagged PDF Best Practice Guide: Syntax”:

tables spanning multiple pages are structured as a single table. cells in repeated header rows or columns (e.g., in the case of tables that span multiple pages) are marked as artifacts.

Also, I’m not quite sure what you mean by “repeated header in next page is tagged as “th” which is placed in between the single table”. In the output of the latest Aspose.Words version, the repeated headers on the second and subsequent pages are marked as artifacts and are not included in the logical structure. Could you please provide more information about this issue or attach a problematic document here?