Build TOC only for the primary locale or language and ignore the remaining headings and subheadings

Hello Team,

I am using Aspose.Words v21.10.0 in a .NET 6.0 API project. I cannot find any guidance on how to update the TOC page numbers after removing every heading and subheading entry except those for the identified primary language or locale.
The attached image shows the existing TOC, generated for all languages with correct page numbers.
For example:
SECTION (ENGLISH)...... 4
SECTION (FRENCH)...... 4
SECTION (SPANISH)..... 4

Expected result:
SECTION (ENGLISH)...... 4
QUESTION (ENGLISH)...... 4

The whole template is generated from JSON values that contain a list of collections in multiple languages. When the TOC is inserted, it creates entries for the headings of every language, with correct page numbers. After I remove all TOC entry fields except those for the primary language and remove the resulting empty space from the page, the page numbers shown in the TOC are no longer correct.

Here is my code:

DocumentBuilder builder = new DocumentBuilder(doc);
LayoutCollector lc = new LayoutCollector(doc);
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    if (lc.GetStartPageIndex(para) == 2) // insert toc after 1st page
    {
        if (para.PreviousPreOrder(doc).NodeType == NodeType.Run)
        {
            builder.MoveTo(para.PreviousPreOrder(doc).ParentNode);
            builder.InsertTableOfContents("\\o \"1-3\" \\h \\z \\u");
            builder.InsertBreak(BreakType.PageBreak);
            break;
        }
    }
}
doc.UpdateFields();

// Keep TOC entries only for the primary language of the territory.
var primaryLanguage = territoryLanguages?.FirstOrDefault(x => x.IsPrimary)?.Name;
// Each TOC entry is a HYPERLINK field (FieldType.FieldHyperlink == 88) because the TOC uses the \h switch.
foreach (var eachField in doc.Range.Fields.Where(x => x.Type == FieldType.FieldHyperlink))
{
    if (eachField.DisplayResult.Contains("Appendix"))
    {
        break;
    }
    else if (!string.IsNullOrEmpty(primaryLanguage) && !eachField.DisplayResult.Contains(primaryLanguage))
    {
        eachField.Remove();
    }
}
doc = RemoveUnwantedSpacesFromPage(doc, primaryLanguage); // Removes the empty space left after deleting the fields.

doc.UpdatePageLayout();
doc.Styles[StyleIdentifier.Toc9].ListFormat.ApplyNumberDefault();
return doc;

So how can I update the TOC with correct page numbers after removing the entry fields for languages other than the primary one? Alternatively, how can I build the TOC for the primary language only, with correct page numbers?

Please note: the entire collection is dynamic, and headings or subheadings will not necessarily be named "Section" or "Question", so the primary language is the only thing I can use to control which entries go into the TOC.
ExistingTOCWithMultiLanguage.png (16.2 KB)
ExpectedTOCwithPrimaryLang.png (9.4 KB)

@sudi065, could you try calling doc.UpdateFields() and doc.UpdatePageLayout() before the return doc statement in your code sample as shown below:

...snipped

doc.Styles[StyleIdentifier.Toc9].ListFormat.ApplyNumberDefault();

doc.UpdateFields();
doc.UpdatePageLayout();

return doc;

https://docs.aspose.com/words/net/working-with-table-of-contents/#updating-the-table-of-contents

If that does not help, then could you please attach the document for the analysis?

@dshvydkiy - After adding the suggested code I get a different TOC, which is still wrong:

TABLE OF CONTENTS	2
INTRODUCTION (ENGLISH)	3
INTRODUCTION (FRENCH)	3
INTRODUCTION (SPANISH)	3
COPY LINK LIBRARY SECTION DETAILS (ENGLISH)	4

Please find attached one document with a simple TOC, generated without removing any of the fields, and another document showing what I actually expect after removing all entries other than the primary language from the TOC.
ExistingTOCWithAllLanguagesCorrectPageNo.docx (44.0 KB)
ExpectedDocumentWithCorrectPageNo.docx (44.9 KB)

@sudi065, I have tried running the code you provided with Aspose.Words for .NET 21.10. The only difference is that I have commented out the call to RemoveUnwantedSpacesFromPage, but that should not affect any page numbers. As you can see in the attached document, the TOC is as expected:
ExistingTOCWithEnglish.aw.21.10.docx (37.1 KB)

Could you please run the code you provided on your computer, save the output to DOCX and PDF and attach the output files here?

@dshvydkiy - This time I have attached a different .docx to explain the issue better. It contains all the other languages, which extend the TOC onto a second page. Once the empty space or paragraphs left where the fields used to be are removed, the TOC is never updated with the correct page numbers. Please find attached ExistingDocumentWithCorrectPageNo.docx, which shows the correct page numbers for each language and its headings.

After this code:

doc = RemoveUnwantedSpacesFromPage(doc, primaryLanguage);

doc.Styles[StyleIdentifier.Toc9].ListFormat.ApplyNumberDefault();

doc.UpdatePageLayout();

return doc;

and,

private Document RemoveUnwantedSpacesFromPage(Document doc, string primaryLanguage)
{
    var flagToBreakSection = false;
    LayoutCollector lc = new LayoutCollector(doc);
    foreach (Aspose.Words.Section sec in doc.Sections)
    {
        if (flagToBreakSection)
        {
            break;
        }
        NodeCollection bodyParas = sec.Body.GetChildNodes(NodeType.Paragraph, true);
        foreach (Paragraph para in bodyParas)
        {
            if (lc.GetStartPageIndex(para) > 1)
            {
                if (para.Range.Text.Trim().Equals("Introduction (" + primaryLanguage + ")"))
                {
                    flagToBreakSection = true;
                    break;
                }

                if (string.IsNullOrEmpty(para.GetText().Trim()))
                    para.Remove();

                if (sec.ToString(SaveFormat.Text).Trim() == String.Empty)
                    sec.Remove();
            }
        }
    }
    return doc;
}

The ActualDocumentGerenatedWithWRONGPageNo.docx and ActualDocumentGerenatedWithWRONGPageNo.pdf files (as requested) are attached for your reference.

So the issue is that in ActualDocumentGerenatedWithWRONGPageNo.docx, after removing any space, paragraph, or empty page, the page numbers shown in the TOC are wrong compared to ExistingDocumentWithCorrectPageNo.docx:

INTRODUCTION (ENGLISH)	4
SECTION NAME SPRINT 48 DEMO (ENGLISH)	8
SUMMARY (ENGLISH)	12 -- Summary (English) is actually on page 10.
APPENDIX	17 -- there is no page 17.

ExistingDocumentWithCorrectPageNo.docx (41.8 KB)
ActualDocumentGerenatedWithWRONGPageNo.docx (37.0 KB)
ActualDocumentGerenatedWithWRONGPageNo.pdf (86.0 KB)

@sudi065 Thank you for the additional information. Unfortunately, I cannot reproduce the problem on my side using the code you have provided.
However, in your case you can use code like the following to get the required output:

using System.Collections.Generic;
using System.Linq;
using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.docx");

string primaryLanguage = "English";

// Get all heading paragraphs
List<Paragraph> headings = doc.GetChildNodes(NodeType.Paragraph, true)
    .Cast<Paragraph>().Where(p => p.ParagraphFormat.OutlineLevel != OutlineLevel.BodyText).ToList();

// Reset the outline level of heading paragraphs that are not in the primary language.
// Paragraphs with OutlineLevel = OutlineLevel.BodyText are not included in the TOC.
foreach (Paragraph p in headings)
{
    string headingText = p.ToString(SaveFormat.Text).Trim();

    if (!headingText.EndsWith(string.Format("({0})", primaryLanguage)) &&
        headingText.EndsWith(")"))
        p.ParagraphFormat.OutlineLevel = OutlineLevel.BodyText;
}

doc.UpdateFields();
doc.UpdatePageLayout();

doc.Save(@"C:\Temp\out.docx");

The idea is that paragraphs with OutlineLevel = OutlineLevel.BodyText are not included in the TOC, so after doc.UpdateFields() rebuilds it, only the primary-language headings remain.

Thank you very much @alexey.noskov, after adding the suggested changes, it worked pretty well.

@alexey.noskov I am really not sure what happens once the code is pushed to our Dev environment. When I test locally on Windows 10 with Aspose.Words v21.10.0 in a .NET 6.0 API project, everything works perfectly, but once the same code is deployed to the Dev environment, the page numbers are all incorrect.

Please find attached two .docx files: one generated locally and the same document generated after deployment to Dev.
GeneratedByRunningLocallyWin10OS.docx (117.4 KB)
GeneratedWhenCodeGetsPushedToDEVEnv.docx (119.4 KB)

Please find attached my Dockerfile; the relevant details of the .csproj are below:

<PackageReference Include="SkiaSharp" Version="2.80.3" />
<PackageReference Include="SkiaSharp.NativeAssets.Linux.NoDependencies" Version="2.80.3" />

@sudi065 Most likely the problem occurs because the fonts used in your original document are not available in the environment where the document is processed. To update TOC page numbers, Aspose.Words needs to build the document layout, and fonts are required for this. If Aspose.Words cannot find a font used in the document, the font is substituted. This can lead to differences in the document layout and, as a result, incorrect page numbers in the TOC. You can implement IWarningCallback to get notifications when font substitution is performed.
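
For illustration, a minimal sketch of such a callback could look like the following. The class name FontWarningCollector and its Substitutions list are hypothetical names chosen for this example; IWarningCallback, WarningInfo and WarningType.FontSubstitution are part of Aspose.Words.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper: records every font substitution reported by Aspose.Words.
public class FontWarningCollector : IWarningCallback
{
    // Descriptions of the substitutions reported so far.
    public List<string> Substitutions { get; } = new List<string>();

    // Called by Aspose.Words whenever it raises a warning while loading or laying out a document.
    public void Warning(WarningInfo info)
    {
        if (info.WarningType == WarningType.FontSubstitution)
        {
            Substitutions.Add(info.Description);
            Console.WriteLine("Font substitution: " + info.Description);
        }
    }
}
```

The callback would be assigned with doc.WarningCallback = new FontWarningCollector(); before calling doc.UpdateFields() and doc.UpdatePageLayout(), so that substitutions made while the layout is built are reported.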

@alexey.noskov Thank you for your help, but this issue still persists. We ruled out a font issue based on another thread in this forum (Aspose.word table of contents page number problem). The issue seems to happen only when the code is deployed to a Linux container.

We updated Aspose.Words to version 22.10 and used the code below:

Document document = new Document(Path.Combine(FILES_DIRECTORY_TO_WORD, $"{eventName}.docx"));
foreach (Field field in document.Range.Fields)
{
    if (field.Type == FieldType.FieldTOC)
    {
        FieldToc fieldToc = (FieldToc)field;
        fieldToc.UpdatePageNumbers();
    }
}

We also changed

<PackageReference Include="SkiaSharp" Version="2.88.3" />

and

<PackageReference Include="SkiaSharp.NativeAssets.Linux.NoDependencies" Version="2.88.3" />

But it still does not update the page numbers in the TOC in a Linux container.

@mahsrinov Most likely, the Linux container does not have the fonts used in your source document and the fonts are substituted (see my previous answer). This leads to inaccurate document layout and incorrect page numbers in the TOC.
Please implement IWarningCallback to get notifications when font substitution is performed.
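
If the warnings do report substitutions, one possible way to make the fonts available is to copy them into the container and point Aspose.Words at that folder. The sketch below assumes the fonts were copied to /usr/share/fonts/truetype/custom; that path and the file names are hypothetical. Note that SetFontsFolder replaces the default system font sources for this document.

```csharp
using Aspose.Words;
using Aspose.Words.Fonts;

Document doc = new Document("in.docx"); // hypothetical input path

// Hypothetical folder inside the container that holds the fonts used by the document.
FontSettings fontSettings = new FontSettings();
fontSettings.SetFontsFolder("/usr/share/fonts/truetype/custom", true); // true = also scan subfolders
doc.FontSettings = fontSettings;

// Rebuild the TOC and the layout with the fonts from the folder above.
doc.UpdateFields();
doc.UpdatePageLayout();

doc.Save("out.docx"); // hypothetical output path
```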