TOC lists heading twice

We are creating a PDF document from scratch. The headings we use always have the property “IsKeptWithNext” set, so that a heading is never left at the bottom of a page without the paragraph that follows it.

All headings are listed in a TOC.

If a heading would fit at the bottom of a page, but Aspose has to move it to the next page because the “IsKeptWithNext” condition would otherwise be violated, the heading is listed twice in the TOC.

Please see the following code sequence to reproduce the problem.

Thanks for your assistance!

    Aspose.Pdf.License license = new Aspose.Pdf.License();
    // Instantiate license file
    license.SetLicense("LGSolutions.Finance.Portal.Aspose.Pdf.lic");
    // Set the value to indicate that license will be embedded in the application
    license.Embedded = true;

    var pdf = new Aspose.Pdf.Document();

    Page TocSection = pdf.Pages.Add();
      
    TocInfo tocInfo = new TocInfo();
    TocSection.TocInfo = tocInfo;
    tocInfo.FormatArrayLength = 1;
    tocInfo.FormatArray[0].TextState.Font = Aspose.Pdf.Text.FontRepository.FindFont("Verdana");
    tocInfo.FormatArray[0].TextState.FontSize = 20;

    Page page = pdf.Pages.Add();

    Aspose.Pdf.Text.TextFragment entry1 = new Aspose.Pdf.Text.TextFragment(@"Entry 1
      dfsgdfsgdfs
      gdfsg
      dfsg
      gadag
      dsagsdag
      dfsagdafsgadfghad dfg
      df
      gdfs
      g
      dfg
      dfsgdfsgdfsgdsfgdfsg
    ");
    Aspose.Pdf.Text.TextFragment entry2 = new Aspose.Pdf.Text.TextFragment(@"Entry 2
      dfsgdfsgdfs
      gdfsg
      dfsg
      dfgdfg
      df
      gdfs
      g
      dfg
      dfsgdfsgdfsgdsfgdfsg
    ");
    Aspose.Pdf.Text.TextFragment entry3 = new Aspose.Pdf.Text.TextFragment("Entry 3");
    Aspose.Pdf.Text.TextFragment entry4 = new Aspose.Pdf.Text.TextFragment("Entry 4");
    Aspose.Pdf.Text.TextFragment entry5 = new Aspose.Pdf.Text.TextFragment("Entry 5");

    var heading = new Heading(1) { Text = "Header 1" };
    heading.IsInList = true;
    heading.IsKeptWithNext = true;
    heading.TocPage = TocSection;
    page.Paragraphs.Add(heading);

    page.Paragraphs.Add(entry1);
    page.Paragraphs.Add(entry2);
    page.Paragraphs.Add(entry3);

    var text = new Aspose.Pdf.Text.TextFragment(@"Some text dsfgdfg
      dfsgdfsgdfs
      gdfsg
      dfsg
      dvdsagsadgdsg
      sdag
      sdg
      sadg
      sad
      gasdgadsfgdfagdfa
      gdaf
      gadf

      gadfgdf
      g
      df
      ag


      df
      gadfg
      df
      gadfgadfg
      adf
      g
      adfg

      dfag
      adf
      g
      adf
      g
      dfgdfg
      sadfdsf
      sdafsdaf
      dfgdfg
      dfg
      dsfgdfsgdfsgdfs
      gdfgdfsgdfg
      gdfsgdf

    ");
    page.Paragraphs.Add(text);

    heading = new Heading(1) { Text = "Header 2" };
    heading.IsInList = true;
    heading.IsKeptWithNext = true;
    heading.TocPage = TocSection;
    page.Paragraphs.Add(heading);

    page.Paragraphs.Add(entry4);
    page.Paragraphs.Add(entry5);

    // Set the response headers before writing the body
    Response.ContentType = "application/pdf";
    Response.AppendHeader("Content-Disposition", "inline; filename=Test");

    using (var memoryStream = new MemoryStream())
    {
      pdf.Save(memoryStream);
      // Rewind the stream; otherwise CopyTo starts at the end and copies nothing
      memoryStream.Position = 0;
      memoryStream.CopyTo(this.Response.OutputStream);
    }

    return new EmptyResult();

@cool_aspose_4711

Thank you for contacting support.

I have worked with the data shared by you and have been able to reproduce the issue in our environment. A ticket with ID PDFNET-44017 has been logged in our issue management system for further investigation and resolution. The issue ID has been linked with this thread so that you will receive notification as soon as the issue is resolved.
We are sorry for the inconvenience.

We have to solve this and some other TOC issues urgently. So we decided to search for the duplicate entries when the TOC is generated and to remove them from the paragraphs collection of the TOC. This works as long as the TOC does not exceed one page.
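To illustrate the idea, the duplicate search can be sketched roughly like this. It is only a minimal sketch, not Aspose API: `TocEntry` and `RemoveAdjacentDuplicates` are hypothetical names, and it assumes that a duplicate always directly follows the original entry with the same level and text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one line of the TOC; this is not an Aspose type.
public sealed class TocEntry
{
    public int Level { get; }
    public string Text { get; }

    public TocEntry(int level, string text)
    {
        Level = level;
        Text = text;
    }
}

public static class TocDedup
{
    // Returns a new list without entries that repeat the immediately
    // preceding entry (same level and same text).
    // Assumption: duplicates are always adjacent in the TOC.
    public static List<TocEntry> RemoveAdjacentDuplicates(IEnumerable<TocEntry> entries)
    {
        var result = new List<TocEntry>();
        foreach (var entry in entries)
        {
            if (result.Count > 0)
            {
                var previous = result[result.Count - 1];
                if (previous.Level == entry.Level && previous.Text == entry.Text)
                {
                    continue; // skip the duplicate
                }
            }
            result.Add(entry);
        }
        return result;
    }
}
```

With the entries "Header 1", "Header 2", "Header 2" on level 1, this keeps "Header 1" and a single "Header 2". Two different headings that legitimately share the same text and follow each other would also be merged, so a real check would probably need more than the text.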

If the TOC spans more than one page, OnBeforePageGenerate is called more than once, as expected. Each time we receive the paragraphs collection for the TOC as we prepared it (the duplicate items are removed correctly), but the duplicates are still rendered in the TOC.

Could you please explain which collection is used internally when the TOC is generated? What happens if the TOC doesn’t fit on one page? Why do I get a different paragraphs collection than the one used internally? And why does it work if the TOC doesn’t exceed one page?

We already invested a lot of time.

Also, TOC generation sometimes crashes when the TOC exceeds one page. I have not found out exactly when this happens, but I know it is related to page breaks in the TOC. It looks like Aspose crashes when the last heading it wants to write on a page doesn’t fit (this is an assumption, because I don’t know how it is implemented).

It crashes somewhere in the depths of Aspose with the following exception:

System.IndexOutOfRangeException
Invalid index: index should be in the range [1…n] where n equals to the text fragments count.

StackTrace:
at Aspose.Pdf.Text.TextFragmentCollection.get_Item(Int32 index)
at Aspose.Pdf.Text.TextParagraph.()
at Aspose.Pdf.Heading.(TextParagraph , TextState , Double , Page , Single , Int32& )
at Aspose.Pdf.Heading.(Double , Double , Page , Double , Double , MarginInfo , Rectangle )
at Aspose.Pdf.Heading.(Double , Double& , Rectangle , MarginInfo , Double , Double , Boolean , Page )
at ​ .(BaseParagraph )
at ​ .()
at Aspose.Pdf.Page.(Page )
at Aspose.Pdf.Document.(List`1 )
at Aspose.Pdf.Document.ProcessParagraphs()

Your assistance is very welcome!

Of course we have a license; if I should contact you through a different channel, please let me know.

@cool_aspose_4711

Thank you for sharing more information.

I have updated the information shared by you in our issue management system. However, please share a code snippet reproducing IndexOutOfRangeException so that we may investigate it further.

Are you referring to Paid Support? If you are subscribed to it then you may request over Paid Support Helpdesk to raise the priority of PDFNET-44017 because the issues logged with Paid Support are investigated and resolved on priority basis.

Sorry, I cannot provide a code sample. I was hoping you already knew that such an issue can occur. We produce our documents from information retrieved from a database, which is quite complicated, and the error happens for some reports. Removing or adding a single heading, or simply adding a page break in the TOC, sometimes makes the error disappear or appear.

As stated: I don’t know exactly when it happens, which is why I cannot reduce it to a piece of code.

Any information would be helpful!

@cool_aspose_4711

The information shared by you has been updated in relevant ticket and we will share our findings with you as soon as our product team investigates it. We appreciate your patience in this regard.

This issue is still not solved.

We booked paid support and sent a lot of samples to the support team. It looks like there are severe issues with creating a valid document with respect to page breaks. This is not limited to the TOC.

After more than half a year of sending code samples again and again, we gave up trying to get a fix. It seems the support team doesn’t understand the cause of the problem and is not able to solve it. So we decided to solve the problems on our own. We implemented tricky workarounds for all issues (and believe me - there are some severe issues like destroyed table headers on page break, etc.).

Especially for the TOC this means:

*) Headings are still missing on the lowest level of the TOC. The trick: we introduce an additional level in the TOC and make it invisible by setting its font size to 0.1. Fine - all headings are shown.
*) Headings show up twice in the TOC when they are located exactly on a page break in the document and are moved to the next page. We check the TOC and set the font size of the duplicate headings to 0.1 as well.

The workarounds are necessary, because the paragraphs collection that is passed to OnBeforePageGenerate is not the one that Aspose uses internally. So removing or adding elements to the paragraphs collection doesn’t change anything - it is simply a COPY!

But recently we encountered an additional, unsolvable problem in this context: by removing the duplicate headings manually, we get wrong page numbers in the TOC. This happens exactly when removing a heading from the TOC makes the TOC one page shorter.

This means: the page numbers are calculated BEFORE the TOC is passed to OnBeforePageGenerate. That’s bad.
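To make the offset concrete, here is a minimal sketch of the arithmetic. It is not Aspose API and does not change what Aspose renders: `TocPageNumbers.Shift` is a hypothetical helper, and it assumes that the TOC precedes all content and that the TOC shows absolute page numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TocPageNumbers
{
    // Shifts page numbers that were computed while the TOC occupied
    // tocPagesBefore pages so that they match a TOC of tocPagesAfter pages.
    // Assumption: the TOC comes before all content pages and the numbers
    // are absolute page numbers within the document.
    public static List<int> Shift(IEnumerable<int> pageNumbers, int tocPagesBefore, int tocPagesAfter)
    {
        int delta = tocPagesBefore - tocPagesAfter;
        return pageNumbers.Select(p => p - delta).ToList();
    }
}
```

Example: with a two-page TOC, headings are listed on pages 3 and 5. If removing a duplicate shrinks the TOC to one page, the content moves up by one page and the headings are really on pages 2 and 4, but the TOC still shows 3 and 5.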

So here are our questions:
Is there a way to manipulate the paragraphs collection that Aspose uses internally to create the TOC? We REALLY need to remove the duplicate headings, they are simply WRONG.

What is Aspose’s workflow for creating the TOC? When are the page numbers for the TOC calculated? Is there any way to force a recalculation?

Any help is appreciated.

Thanks for your assistance!

@cool_aspose_4711

Thank you for getting back to us.

We are looking into it and will share our feedback with you soon.

@cool_aspose_4711

We have communicated your concerns to Paid Support team. Please follow up on respective ticket.