How to insert paragraph before Table continuation (with HeadingFormat)

Hello!

Our document standards require tables to be numbered like this:
‘table 1 - its my table’
‘table 2 - my beautiful data’

If a table does not fit on one page, we have to repeat the table heading on each following page and write this above the heading:
‘continuation of table 1’
‘continuation of table 2’

For example, if the table spans three pages, it should look like this:

[page 1]
‘table 55 - very big table’
*table heading

*table content 1

[page 2]
‘continuation of table 55’
*table heading

*table content 2

[page 3]
‘continuation of table 55’
*table heading

*table content 3

The question is: how can I break the table exactly at the page ends, so that I can work with the parts as separate table objects and insert something (text) between them?
P.S. I looked at the ‘showcases’ app that ships with the Aspose.Words C++ library, and there a large table spanning several pages is a single object.

keywords: heading, table header, table continuation, table enumeration

@vavp You can achieve this using LayoutCollector. This class lets you detect where the page breaks fall, so you can split the table into parts. For example, see the following code:
in.docx (19.4 KB) out.docx (16.3 KB)

Document doc = new Document(@"C:\Temp\in.docx");
LayoutCollector collector = new LayoutCollector(doc);

// Get the table that needs to be split into parts.
Table table = doc.FirstSection.Body.Tables[0];

while (table != null)
{
    table = SplitTable(table, collector);
    collector.Clear();
    doc.UpdatePageLayout();
}

doc.Save(@"C:\Temp\out.docx");

private static Table SplitTable(Table table, LayoutCollector collector)
{
    int startPageIndex = collector.GetStartPageIndex(table.FirstRow);

    int breakIndex = -1;
    int firstDataRowIndex = -1;
    // Determine index of row where page breaks. And index of the first data row.
    for (int i = 1; i < table.Rows.Count; i++)
    {
        Row r = table.Rows[i];
        if (!r.RowFormat.HeadingFormat && firstDataRowIndex < 0)
            firstDataRowIndex = i;

        int rowPageIndex = collector.GetStartPageIndex(r);
        if (rowPageIndex > startPageIndex)
        {
            breakIndex = i;
            break;
        }
    }

    if (breakIndex > 0)
    {
        Table clone = (Table)table.Clone(true);

        // Insert paragraph and clone table after the main table.
        Paragraph para = new Paragraph(table.Document);
        para.AppendChild(new Run(table.Document, "Continuation of the table"));

        table.ParentNode.InsertAfter(para, table);
        para.ParentNode.InsertAfter(clone, para);

        // Remove content after the breaking row from the main table.
        while (table.Rows.Count > breakIndex)
            table.LastRow.Remove();

        // Remove data rows before the breaking row from the cloned table, keeping the heading rows.
        for (int i = firstDataRowIndex; i < breakIndex; i++)
            clone.Rows.RemoveAt(firstDataRowIndex);

        return clone;
    }

    return null;
}
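
The index arithmetic in the function above is easier to check on plain data. The sketch below is not Aspose.Words code: it models rows as strings in a std::vector, and the names SplitResult and splitRows are hypothetical. It assumes the first headingCount rows are heading rows and that breakIndex is the index of the first row that landed on the next page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a table: each row is just a label.
using Rows = std::vector<std::string>;

struct SplitResult {
    Rows first;         // rows that stay on the current page
    Rows continuation;  // heading rows + rows from breakIndex onward
};

// Splits `rows` at `breakIndex`, repeating the first `headingCount`
// rows at the top of the continuation part.
SplitResult splitRows(const Rows& rows, std::size_t headingCount, std::size_t breakIndex) {
    assert(headingCount <= breakIndex && breakIndex <= rows.size());

    SplitResult result;
    // Everything before the breaking row stays in the first part.
    result.first.assign(rows.begin(), rows.begin() + breakIndex);
    // The continuation starts with a copy of the heading rows...
    result.continuation.assign(rows.begin(), rows.begin() + headingCount);
    // ...followed by the breaking row and everything after it.
    result.continuation.insert(result.continuation.end(),
                               rows.begin() + breakIndex, rows.end());
    return result;
}
```

For the rows H, r1, r2, r3, r4 with one heading row and a break index of 3, the first part holds H, r1, r2 and the continuation holds H, r3, r4. No data row is lost or duplicated, which is the property to verify in the real table code.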

Thank you so much!


Hi again! :slight_smile:

For a long time I wondered why the code you showed does not work for me in C++. Then I ran this experiment.

There is an input document in1.docx (12.9 KB)

This is C# and C++ code that should do the same thing:

C++

int main()
{
    auto doc    = MakeObject<Document>(u"D:\\TMP\\in1.docx");
    auto layout = MakeObject<LayoutCollector>(doc);
    auto table  = System::DynamicCast<Table>(doc->GetChild(NodeType::Table, -1, true));
    for (int i = 1; i < table->get_Rows()->get_Count(); i++)
    {
        auto row = table->get_Rows()->idx_get(i);
        cerr << "rowFirstCell: " << row->get_Cells()->idx_get(0)->GetText()
             << "\tGSPI: " << layout->GetStartPageIndex(row)
             << "\tGEPI: " << layout->GetEndPageIndex(row) << endl;
    }
}

C#

var doc = new Document("D:\\TMP\\in1.docx");
var layout = new LayoutCollector(doc);
var table = (Table)doc.GetChild(NodeType.Table, -1, true);
for (int i = 1; i < table.Rows.Count; i++)
{
    var row = table.Rows[i];
    string str = String.Format("rowFirstCell: {0}\tGSPI: {1}\tGSPI: {2}",
        row.Cells[0].GetText(), layout.GetStartPageIndex(row).ToString(), layout.GetEndPageIndex(row).ToString());
    Console.WriteLine(str);
}
return;

Here is the console output I got from two different runs:
C++

rowFirstCell: 5 GSPI: 2 GEPI: 2
rowFirstCell: 7 GSPI: 2 GEPI: 2
rowFirstCell: 8 GSPI: 2 GEPI: 2
rowFirstCell: 3 GSPI: 2 GEPI: 2
rowFirstCell: 1 GSPI: 2 GEPI: 2
rowFirstCell: 2 GSPI: 2 GEPI: 2

C#

rowFirstCell: 5 GSPI: 1 GSPI: 1
rowFirstCell: 7 GSPI: 1 GSPI: 1
rowFirstCell: 8 GSPI: 2 GSPI: 2
rowFirstCell: 3 GSPI: 2 GSPI: 2
rowFirstCell: 1 GSPI: 2 GSPI: 2
rowFirstCell: 2 GSPI: 2 GSPI: 2

Both are wrong and, more importantly, they differ from each other (C++ vs. C#).

I expected this:

rowFirstCell: 5 GSPI: 1 GSPI: 1
rowFirstCell: 7 GSPI: 1 GSPI: 1
rowFirstCell: 8 GSPI: 1 GSPI: 1
rowFirstCell: 3 GSPI: 2 GSPI: 2
rowFirstCell: 1 GSPI: 2 GSPI: 2
rowFirstCell: 2 GSPI: 2 GSPI: 2

Please tell me what I’m doing wrong. Thank you very much :pray:

@vavp I have checked your code on my side and it properly returns the following output:

rowFirstCell: 5 GSPI: 1 GEPI: 1
rowFirstCell: 7 GSPI: 1 GEPI: 1
rowFirstCell: 8 GSPI: 1 GEPI: 1
rowFirstCell: 3 GSPI: 2 GEPI: 2
rowFirstCell: 1 GSPI: 2 GEPI: 2
rowFirstCell: 2 GSPI: 2 GEPI: 2

This matches your expected output. I suspect you are using Aspose.Words in evaluation mode. In that mode Aspose.Words injects an evaluation watermark and adds evaluation text at the beginning of the document, which pushes your table down toward the next page. With that text present, I get only 2 rows on the first page.
The problem might also occur if Aspose.Words does not have access to the fonts used in your document; in that case the fonts are substituted and the layout might differ from the original. But if you run the C# and C++ code in the same environment, the result should be the same.
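
A toy model shows why extra text above the table changes the reported page indices. This is not how Aspose.Words computes layout: pageOfEachRow is a hypothetical helper that assumes fixed row heights, a fixed usable page height, and rows that never split across pages. All numbers are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Returns the 1-based page index on which each row starts.
// `offset` is the height already used on the first page before the table
// (for example, by text inserted above it). All heights share one unit.
std::vector<int> pageOfEachRow(const std::vector<double>& rowHeights,
                               double pageHeight, double offset) {
    assert(pageHeight > 0 && offset >= 0);

    std::vector<int> pages;
    int page = 1;
    double used = offset;
    for (double h : rowHeights) {
        // A row that does not fit in the remaining space moves to a new page.
        if (used > 0 && used + h > pageHeight) {
            ++page;
            used = 0;
        }
        used += h;
        pages.push_back(page);
    }
    return pages;
}
```

With six rows of height 10 on a page of height 40, an offset of 10 gives start pages 1, 1, 1, 2, 2, 2, while an offset of 20 gives 1, 1, 2, 2, 2, 2. A small amount of injected text is enough to move one row to the next page.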

Did you check both the C++ and C# runs?

Yes, it is evaluation mode in both C++ and C#, and I get different results from the two libraries in the same environment. Can you run this code in C++?

@vavp Yes, the result I posted was produced by the C++ version of Aspose.Words.
By the way, you can request a free temporary 30-day license to test Aspose.Words without the evaluation version limitations.

@alexey.noskov thanks for the information!

I work for a company and was given the task of choosing a tool for automating work with Office Word.

I am looking at options, trying and testing them in order to decide which tool to use in my work. Right now I am testing all the libraries I found against the cases that are critical for us.

So far I really like Aspose, and I think in the end it will be the one we choose, but I need to sort out this Layout Collector and some other things (more on those later) in order to carry content over to the following pages correctly.

Once I have tested all the scenarios with the evaluation version, I will request a temporary license to double-check everything, and then we will proceed to purchase :slight_smile:

There are still problems with Layout Collector, and I hope we can sort them out. In my next message I will try to give another example of it working incorrectly on my side.

C++ splitTable function

System::SharedPtr<Table> splitTable(System::SharedPtr<Table> table, System::SharedPtr<LayoutCollector> collector)
{
    int startPageIndex = collector->GetStartPageIndex(table->get_FirstRow());

    int breakIndex        = -1;
    int firstDataRowIndex = -1;
    for (int i = 0; i < table->get_Rows()->get_Count(); i++)
    {
        System::SharedPtr<Row> r = table->get_Rows()->idx_get(i);
        if (!r->get_RowFormat()->get_HeadingFormat() && firstDataRowIndex < 0)
            firstDataRowIndex = i;

        int rowPageIndex = collector->GetStartPageIndex(r);
        if (rowPageIndex > startPageIndex)
        {
            breakIndex = i;
            break;
        }
    }

    if (breakIndex > 0)
    {
        System::SharedPtr<Table> clone = System::DynamicCast<Table>(table->Clone(true));

        System::SharedPtr<Paragraph> para = System::MakeObject<Paragraph>(table->get_Document());
        para->AppendChild(System::MakeObject<Run>(table->get_Document(), u"Continuation of the Table"));

        table->get_ParentNode()->InsertAfter(para, table);
        para->get_ParentNode()->InsertAfter(clone, para);

        // Remove content after the breaking row from the main table.
        while (table->get_Rows()->get_Count() > breakIndex)
            table->get_LastRow()->Remove();

        // Remove data rows before the breaking row from the cloned table, keeping any heading rows.
        for (int i = firstDataRowIndex; i < breakIndex; i++)
            clone->get_Rows()->RemoveAt(firstDataRowIndex);

        return clone;
    }

    return System::SharedPtr<Table>();
};

C++ main function

int main()
{
    auto doc     = MakeObject<Document>();
    auto builder = MakeObject<DocumentBuilder>(doc);
    auto layout  = MakeObject<LayoutCollector>(doc);
    builder->get_PageSetup()->set_PaperSize(PaperSize::A4);
    builder->get_ParagraphFormat()->set_StyleIdentifier(StyleIdentifier::BodyText);
    builder->StartTable();
    for (int rows = 0; rows < 99; ++rows)
    {
        builder->InsertCell();
        stringstream ss;
        ss << "Text " << rows;
        builder->Write(System::String(ss.str()));
        builder->EndRow();
    }
    builder->EndTable();
    layout->Clear();
    doc->UpdatePageLayout();
    auto table = System::DynamicCast<Table>(doc->GetChild(NodeType::Table, -1, true));

    while (table != nullptr)
    {
        table = splitTable(table, layout);
        layout->Clear();
        doc->UpdatePageLayout();
    }

    doc->Save(u"D:\\TMP\\out1.docx");
}

The splitTable function is an adaptation of the function you posted in this thread, rewritten in C++.

I use Aspose.Words.Cpp 22.5 for Windows
OS: Windows 10
I have Word 2016 installed

When I run this code and open the document, I get this:
out1.docx (18.2 KB)
image.png (48.1 KB)

If I manually put the page breaks where they should be (filling each page with as much of the table as possible), I get the following:
image.png (37.7 KB)

It’s as if Aspose thinks the sheet is smaller than it actually is and breaks the page in its model before the real page ends.

Does it work correctly on your side? Why could the result differ between computers? Can it depend on the environment or on the version of Microsoft Office Word? Maybe some settings need to be set programmatically so that Aspose has a correct idea of the actual size of the A4 sheet.

I think this is directly related to the Layout Collector on my side misjudging where a new page starts.

Please help me make it right :pray: :pray:

@vavp I have checked your code on my side and the page breaks are inserted properly. Please see the output document produced on my side using your code: out1.docx (7.8 KB)
Aspose.Words does not depend on MS Word and does not require it to be installed at all, so this cannot be the cause of the problem.
The only likely cause is that Aspose.Words is using different fonts. While building the document layout, Aspose.Words uses fonts to calculate glyph sizes and positions. If the fonts used in the document are not available, Aspose.Words substitutes them, and this might lead to layout differences.
Could you please save the output as PDF or XPS in your code and attach the results here? This will make it possible to identify which fonts Aspose.Words actually uses when building the document layout.
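
The effect of font substitution on row height can be illustrated with a deliberately simplified model. It is an assumption-laden sketch, not the Aspose.Words layout algorithm: linesNeeded and rowHeight are hypothetical helpers, every glyph is given the same advance width, and lines may break after any character.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Number of lines needed to fit `text` into a cell of width `cellWidth`,
// assuming every glyph has the same advance width `glyphWidth`.
int linesNeeded(const std::string& text, double glyphWidth, double cellWidth) {
    assert(glyphWidth > 0 && cellWidth >= glyphWidth);
    if (text.empty()) return 1;  // an empty cell still occupies one line
    // Whole glyphs that fit on one line.
    const double perLine = std::floor(cellWidth / glyphWidth);
    return static_cast<int>(std::ceil(static_cast<double>(text.size()) / perLine));
}

// Row height is the line count times the font's line height.
double rowHeight(const std::string& text, double glyphWidth,
                 double cellWidth, double lineHeight) {
    return linesNeeded(text, glyphWidth, cellWidth) * lineHeight;
}
```

In this model a 20-character string in a 100-unit cell takes one line at a glyph width of 5 and two lines at a glyph width of 6. A substituted font with slightly wider glyphs can therefore double a row's height and shift every page break after it.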

Yes, in PDF it looks good! out1.pdf (26.4 KB)

But there are strange things… I ran this code:

int main()
{
    auto doc     = MakeObject<Document>();
    auto builder = MakeObject<DocumentBuilder>(doc);
    auto layout  = MakeObject<LayoutCollector>(doc);
    cerr << "FONT: " << builder->get_Font()->get_Name() << endl;
    cerr << "SIZE: " << builder->get_Font()->get_Size() << endl;
    cerr << "StyleName: " << builder->get_Font()->get_StyleName() << endl;
    return 0;
}

and got this:

FONT: Times New Roman
SIZE: 12
StyleName: Default Paragraph Font

And when I open my out1.docx it is the same: image.png (6.2 KB)

So it turns out that Times New Roman in Aspose is not the same Times New Roman that I see in the document?

I probably need to find out which font my docx refers to and somehow make Aspose’s fonts match mine. I will read the article you gave; I hope I find the answer there!

P.S. I tried Arial, Courier, and Times New Roman and got the same result:
when viewing the document the layout is incorrect, and in PDF it is correct. And no matter what font I put in the document, in PDF it seems to be the same…

I tried this:

auto fontSettings = FontSettings::get_DefaultInstance();
fontSettings->SetFontsFolder(u"C:\\Windows\\Fonts", true);

and got the same result.

I solved the problem: I downloaded exactly the same fonts, but with English-only file names, and it works.

The names of my font files contain Cyrillic; perhaps this confused the Aspose library. image.png (12.9 KB)

Maybe our conversation can benefit everyone: I would like to request that the font analyzer in Aspose be able to extract font information from files with arbitrary names (Unicode, for example).

Thanks!

@vavp Thank you for the additional information. It is great that you managed to make it work on your side. I have checked your output PDF document and, as I can see, the Fanwood font is used for rendering.

Fanwood is the last-resort font that Aspose.Words uses when no fonts can be located. This font is embedded into the Aspose.Words library itself. So my first assumption, that fonts are substituted during the layout process, was right.
However, I am not sure why the C++ version cannot locate the system fonts in your environment. Russian characters in the file name should not be a problem, since the font family and other information are read from the font file itself. Could you please run the following code and check whether Aspose.Words can locate any fonts on your side:

FontSourceBase[] sources = FontSettings.DefaultInstance.GetFontsSources();
foreach (FontSourceBase fs in sources)
{
    foreach (PhysicalFontInfo info in fs.GetAvailableFonts())
    {
        Console.WriteLine("{0} - {1}", info.FullFontName, info.FilePath);
    }
}

Hi!

There are no fonts! :slight_smile:
Code1:

int main()
{
    auto fontSettings = FontSettings::get_DefaultInstance();

    auto sources = fontSettings->GetFontsSources();
    for (auto i : sources)
    {
        for (auto info : i->GetAvailableFonts())
        {
            cerr << info->get_FullFontName() << " - " << info->get_FilePath() << endl;
        }
    }
    return 0;
}

Output1: image.png (3.1 KB)

And if I add this line (pointing to the downloaded Windows default fonts)

fontSettings->SetFontsFolder(u"D:/download/Windows10DefaultFonts", true);

then Output2: image.png (50.2 KB)

So Aspose cannot work with my Russian Windows and resolve my fonts in C:/Windows/Fonts image.png (58.1 KB)

I solved the problem by downloading the fonts, but this information could be useful for improving the Aspose library.

Thanks a lot!

@vavp Thank you for the additional information. Unfortunately, I cannot reproduce the problem; fonts are resolved properly on Russian Windows too. So something else must be wrong, probably a permissions issue. Could you please check whether the application has permission to read fonts from the C:/Windows/Fonts folder?

How can I check permissions? If I do this:

fontSettings->SetFontsFolder(u"C:/Windows/Fonts", true);

it does not work, just like the default variant: no fonts are found.

@vavp You can check whether you can read files from the Windows fonts folder.

Manually, on my computer, I can read these fonts and open them.

@vavp I mean reading files from the Windows fonts folder programmatically.
As another test, you can copy all the fonts from the Windows fonts folder into another folder, such as "C:\Temp\fonts", and specify it as the fonts source folder in FontSettings. I believe the fonts will be handled properly in that case.
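
One way to read the folder programmatically is sketched below with the C++17 standard library only, so it is independent of Aspose.Words. The names ReadCheck and checkFolderReadable are hypothetical, and the check only confirms that each regular file can be opened for reading by the current process.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

struct ReadCheck {
    bool folderAccessible = false;  // the folder could be listed
    std::size_t readable = 0;       // regular files that could be opened
    std::size_t unreadable = 0;     // regular files that could not be opened
};

// Tries to open every regular file directly inside `folder` for reading.
ReadCheck checkFolderReadable(const fs::path& folder) {
    assert(!folder.empty());

    ReadCheck result;
    std::error_code ec;
    fs::directory_iterator it(folder, ec);
    if (ec) return result;  // folder is missing or listing was denied
    result.folderAccessible = true;

    // Stop early if advancing the listing itself reports an error.
    for (fs::directory_iterator end; !ec && it != end; it.increment(ec)) {
        std::error_code fileEc;
        if (!it->is_regular_file(fileEc) || fileEc) continue;
        std::ifstream in(it->path(), std::ios::binary);
        if (in.is_open()) ++result.readable;
        else ++result.unreadable;
    }
    return result;
}
```

Calling checkFolderReadable with the path C:/Windows/Fonts from the same process that uses Aspose.Words, and printing the three fields, would show whether the folder can be listed and how many files can be opened. A result with folderAccessible false or readable equal to 0 would point to an access problem rather than a font-parsing problem.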

Hi! Sorry for the wait.

Yes, if I copy the fonts from C:/Windows/Fonts to another folder, it works!
