Selectively download document sections based on table of contents

Rajeshraj1983 · January 5, 2018, 2:23pm

Hi,
I have a large document with many sections.
Please help me selectively download sections based on the table of contents.

tahir.manzoor · January 5, 2018, 3:21pm

@Rajeshraj1983,

Thanks for your inquiry. Please use the following code example to extract content from the document based on the table of contents. You can get the code of the ExtractContent and GenerateDocument methods from the following article:
How to Extract Selected Content Between Nodes in a Document

If you are using Aspose.Words for Cloud, please post your query in the Aspose.Words for Cloud forum.

Document doc = new Document(MyDir + "in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

builder.MoveToDocumentEnd();
builder.StartBookmark("_TocEnd");
builder.EndBookmark("_TocEnd");

NodeCollection nodes = doc.GetChildNodes(NodeType.FieldStart, true);

// Get list of bookmarks listed in TOC
ArrayList tocitems = new ArrayList();

foreach (FieldStart fstart in nodes)
{
    if (fstart.FieldType == Aspose.Words.Fields.FieldType.FieldPageRef)
    {
        String fieldText = fstart.GetField().GetFieldCode();

        if (fieldText.Contains("_Toc"))
        {
            fieldText = fieldText.Substring(fieldText.IndexOf("_Toc"), fieldText.Length).Replace("\\h", "").Trim();
            tocitems.Add(fieldText);
        }
    }
}

for (int i = 0; i < tocitems.Count - 1; i++)
{
    BookmarkStart bookmarkStart =
    doc.Range.Bookmarks[tocitems[i].ToString()].BookmarkStart;
    BookmarkStart bookmarkEnd = doc.Range.Bookmarks[tocitems[i + 1].ToString()].BookmarkStart;

    // Firstly extract the content between these nodes including the bookmark.
    ArrayList extractedNodes = Extract_Contents.Common.ExtractContent(bookmarkStart, bookmarkEnd, false);

    Document doc2 = Extract_Contents.Common.GenerateDocument(doc, extractedNodes);
    doc2.Save(MyDir + tocitems[i] + "Out.docx");
}

Rajeshraj1983 · January 8, 2018, 12:18pm

Hi Tahir,
Thanks for your response.
I am getting an error on this line:
if (fieldText.Contains("_Toc"))
{
    fieldText = fieldText.Substring(fieldText.IndexOf("_Toc"), fieldText.Length).Replace("\\h", "").Trim();
    tocitems.Add(fieldText);
}
The error is "Index and length must refer to a location within the string".
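
The exception comes from `String.Substring(startIndex, length)`: the second argument is a length, not an end index, so `startIndex + length` must not exceed the length of the string. Passing `fieldText.Length` therefore fails whenever `_Toc` does not start at index 0. Below is a minimal sketch of the failure and one possible fix. The helper names and the bookmark name `_Toc256000001` are made up for illustration and are not part of Aspose.Words.

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical helper: returns everything from "_Toc" to the end of the
    // field code, with the \h switch and surrounding whitespace removed.
    public static string TailFromToc(string fieldCode)
    {
        int start = fieldCode.IndexOf("_Toc", StringComparison.Ordinal);
        if (start < 0)
            return null; // not a TOC page reference

        // The one-argument overload takes the rest of the string,
        // so it cannot run past the end.
        return fieldCode.Substring(start).Replace("\\h", "").Trim();
    }

    // Reproduces the call from the first code example and reports whether it throws.
    public static bool OriginalCallThrows(string fieldCode)
    {
        try
        {
            fieldCode.Substring(fieldCode.IndexOf("_Toc"), fieldCode.Length);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

With a field code such as `" PAGEREF _Toc256000001 \h "`, `TailFromToc` yields the bare bookmark name, while the original two-argument call throws.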

tahir.manzoor · January 8, 2018, 2:16pm

@Rajeshraj1983,

Please accept my apologies for the inconvenience. Please use the following modified code example. If you still face the problem, please ZIP and attach your input Word document here for our reference. We will then provide more information along with code.

Document doc = new Document(MyDir + "in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

builder.MoveToDocumentEnd();
builder.StartBookmark("_TocEnd");
builder.EndBookmark("_TocEnd");

NodeCollection nodes = doc.GetChildNodes(NodeType.FieldStart, true);

// Get list of bookmarks listed in TOC
ArrayList tocitems = new ArrayList();

foreach (FieldStart fstart in nodes)
{
    if (fstart.FieldType == Aspose.Words.Fields.FieldType.FieldPageRef)
    {
        String fieldText = fstart.GetField().GetFieldCode();

        if (fieldText.Contains("_Toc"))
        {
            fieldText = fieldText.Replace("PAGEREF", "").Replace("\\h", "").Trim();
            tocitems.Add(fieldText);
        }
    }
}
tocitems.Add("_TocEnd");
for (int i = 0; i < tocitems.Count - 1; i++)
{
    BookmarkStart bookmarkStart =
    doc.Range.Bookmarks[tocitems[i].ToString()].BookmarkStart;
    BookmarkStart bookmarkEnd = doc.Range.Bookmarks[tocitems[i + 1].ToString()].BookmarkStart;

    // Firstly extract the content between these nodes including the bookmark.
    ArrayList extractedNodes = Extract_Contents.Common.ExtractContent(bookmarkStart, bookmarkEnd, false);

    Document doc2 = Extract_Contents.Common.GenerateDocument(doc, extractedNodes);
    doc2.Save(MyDir + tocitems[i] + "Out.docx");
}
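
Stripping `PAGEREF` and `\h` by hand leaves any other switch in the field code attached to the bookmark name, and a lookup with such a name may then fail to find the bookmark. One possible alternative, sketched here with a hypothetical helper and made-up bookmark names, is to pull the name out with a regular expression:

```csharp
using System.Text.RegularExpressions;

public static class TocFieldCode
{
    // Matches a hidden TOC bookmark name such as _Toc256000001.
    private static readonly Regex TocName = new Regex(@"_Toc\w+");

    // Returns the bookmark name found in a PAGEREF field code,
    // or null when the field code does not reference a _Toc bookmark.
    public static string GetBookmarkName(string fieldCode)
    {
        Match m = TocName.Match(fieldCode ?? string.Empty);
        return m.Success ? m.Value : null;
    }
}
```

In the loop above, the result of `GetBookmarkName(fstart.GetField().GetFieldCode())` could be added to `tocitems` whenever it is not null. This is only a sketch and has not been run against the documents in this thread.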

Rajeshraj1983 · January 8, 2018, 3:00pm

I am getting another error now, on this line:
BookmarkStart bookmarkEnd = doc.Range.Bookmarks[tocitems[i + 1].ToString()].BookmarkStart;

Only the first three TOC entries were split, with a blank first page, before the error occurred.

tahir.manzoor · January 8, 2018, 3:10pm

@Rajeshraj1983,

Thanks for your inquiry. Please ZIP and attach your input Word document here for testing. We will investigate the issue on our side and provide you more information.

Rajeshraj1983 · January 9, 2018, 3:42pm

Hi,
In the actual file I get another error on this line:
BookmarkStart bookmarkEnd = doc.Range.Bookmarks[tocitems[i + 1].ToString()].BookmarkStart;
Object reference not set to an instance of an object.

Please find the sample file attached.
testfile.zip (523.4 KB)
The header and footer are also missing in the extracted documents.

Our requirement is as follows:

I have a large file with many sections that span different pages. It is my base document.
There may be a table of contents for this document.
We need to display the table of contents in a web interface and provide check boxes for selection.
Based on the selected check boxes, the end user needs to save or download the chosen sections with the exact look (header, footer, font, margin, spacing, images, etc.).

tahir.manzoor · January 9, 2018, 5:33pm

@Rajeshraj1983,

Thanks for sharing the document. We have tested the scenario using the latest version of Aspose.Words for .NET (18.1) and could not reproduce the issue. Please use Aspose.Words for .NET 18.1. We have attached the output documents to this post for your reference.

Please call the Document.UpdateFields method after loading the document. This method updates the table of contents field in the document.

Please use the following method to import the header/footer from the input document into the target document.

public static void MergeHeaderFooter(Document srcDoc, Document dstDoc, HeaderFooterType headerType)
{
    foreach (Section section in dstDoc.Sections)
    {
        HeaderFooter header = section.HeadersFooters[headerType];
        if (header == null)
        {
            // There is no header of the specified type in the current section, create it.
            header = new HeaderFooter(section.Document, headerType);
            section.HeadersFooters.Add(header);
        }

        foreach (Node srcNode in srcDoc.FirstSection.HeadersFooters[headerType].ChildNodes)
        {
            Node dstNode = dstDoc.ImportNode(srcNode, true, ImportFormatMode.KeepSourceFormatting);
            header.AppendChild(dstNode);
        }
    }
}

Rajeshraj1983 · January 10, 2018, 8:24am

Hi Tahir,
Thanks for the response.
Could you please share code that shows where to call the MergeHeaderFooter method?
Also, the background images for Section 2, Section 3, Section 4, etc. are missing on the first page of the output documents.
The first page header of each section is not rendering properly (images and styles are missing).
Please also share the output documents for reference.

tahir.manzoor · January 10, 2018, 11:41am

@Rajeshraj1983,

Thanks for your inquiry. Please use the following code example to extract the TOC contents along with header and footer. Hope this helps you.

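// Load the input document (this line was missing from the example).
Document doc = new Document(MyDir + "in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);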

builder.MoveToDocumentEnd();
builder.StartBookmark("_TocEnd");
builder.EndBookmark("_TocEnd");

NodeCollection nodes = doc.GetChildNodes(NodeType.FieldStart, true);

// Get list of bookmarks listed in TOC
ArrayList tocitems = new ArrayList();

foreach (FieldStart fstart in nodes)
{
    if (fstart.FieldType == Aspose.Words.Fields.FieldType.FieldPageRef)
    {
        String fieldText = fstart.GetField().GetFieldCode();

        if (fieldText.Contains("_Toc"))
        {
            fieldText = fieldText.Replace("PAGEREF", "").Replace("\\h", "").Trim();
            tocitems.Add(fieldText);
        }
    }
}
tocitems.Add("_TocEnd");
for (int i = 0; i < tocitems.Count - 1; i++)
{
    BookmarkStart bookmarkStart =
    doc.Range.Bookmarks[tocitems[i].ToString()].BookmarkStart;
    BookmarkStart bookmarkEnd = doc.Range.Bookmarks[tocitems[i + 1].ToString()].BookmarkStart;

    // Firstly extract the content between these nodes including the bookmark.
    ArrayList extractedNodes = Extract_Contents.Common.ExtractContent(bookmarkStart, bookmarkEnd, false);

    Document dstDoc = Extract_Contents.Common.GenerateDocument(doc, extractedNodes);
    dstDoc.FirstSection.PageSetup.DifferentFirstPageHeaderFooter = ((Section)bookmarkStart.GetAncestor(NodeType.Section)).PageSetup.DifferentFirstPageHeaderFooter;

    AddHeaderFooter((Section)bookmarkStart.GetAncestor(NodeType.Section), dstDoc, HeaderFooterType.HeaderFirst);
    AddHeaderFooter((Section)bookmarkStart.GetAncestor(NodeType.Section), dstDoc, HeaderFooterType.FooterFirst);
    AddHeaderFooter((Section)bookmarkStart.GetAncestor(NodeType.Section), dstDoc, HeaderFooterType.HeaderPrimary);
    AddHeaderFooter((Section)bookmarkStart.GetAncestor(NodeType.Section), dstDoc, HeaderFooterType.FooterPrimary);
                    
    dstDoc.Save(MyDir + tocitems[i] + "Out.docx");
}

public static void AddHeaderFooter(Section section, Document dstDoc, HeaderFooterType headerType)
{
    NodeImporter imp = new NodeImporter(section.Document, dstDoc, ImportFormatMode.KeepSourceFormatting);
         
    HeaderFooter header = dstDoc.FirstSection.HeadersFooters[headerType];
    if (header == null)
    {
        // There is no header of the specified type in the current section, create it.
        header = new HeaderFooter(dstDoc, headerType);
        dstDoc.FirstSection.HeadersFooters.Add(header);
    }

    //copy the header/footer content
    if (section.HeadersFooters[headerType] != null && section.HeadersFooters[headerType].HasChildNodes)
        foreach (Node srcNode in section.HeadersFooters[headerType].ChildNodes)
        {
            Node impNode = imp.ImportNode(srcNode, true);
            header.AppendChild(impNode);
        }
    //if header/footer is linked to corresponding header/footer in the previous section,
    // copy the content from previous section
    else
    {
        int count = section.Document.IndexOf(section);
        for (int i = 0; i < count; i++)
        {
            section = (Section)section.PreviousSibling;
            if (section.HeadersFooters[headerType] != null && section.HeadersFooters[headerType].IsLinkedToPrevious == false)
            {
                foreach (Node srcNode in section.HeadersFooters[headerType].ChildNodes)
                {
                    Node impNode = imp.ImportNode(srcNode, true);
                    header.AppendChild(impNode);
                }
                break;
            }
        }
    }
}
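
The extraction loop above pairs each TOC bookmark with the one that follows it and relies on the `_TocEnd` sentinel to close the last entry. For the check-box requirement raised earlier in the thread, the same pairing could be computed only for the entries the user ticked. A minimal sketch, using hypothetical names (`TocSelection`, `BuildRanges`) and no Aspose.Words dependency:

```csharp
using System.Collections.Generic;

public static class TocSelection
{
    // tocItems: bookmark names in TOC order, ending with the "_TocEnd" sentinel.
    // selected: zero-based indexes of the TOC entries the user ticked.
    // Returns (start bookmark name, end bookmark name) pairs; the BookmarkStart
    // nodes of each pair bound one extraction.
    public static List<KeyValuePair<string, string>> BuildRanges(
        IList<string> tocItems, ICollection<int> selected)
    {
        var ranges = new List<KeyValuePair<string, string>>();

        // The sentinel is never a start, so stop one short of the end.
        for (int i = 0; i < tocItems.Count - 1; i++)
        {
            if (selected.Contains(i))
                ranges.Add(new KeyValuePair<string, string>(tocItems[i], tocItems[i + 1]));
        }
        return ranges;
    }
}
```

Each returned pair could then be looked up in `doc.Range.Bookmarks` and passed to ExtractContent exactly as in the loop above. How the selection reaches the server from the web page is outside the scope of this sketch.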

Rajeshraj1983 · January 10, 2018, 4:00pm

Thank you so much. It works like a charm for me!
But for one sample document the first page header is not working properly.
Please find the attached document.
testfile1.zip (1004.2 KB)

tahir.manzoor · January 11, 2018, 4:38am

@Rajeshraj1983,

Thanks for your inquiry. We are working on the shared scenario and will get back to you soon.

tahir.manzoor · January 11, 2018, 10:45am

@Rajeshraj1983,

Thanks for your inquiry. Your document contains the following field code for the table of contents field.

{ TOC \h \z \t "Heading 1,1,Divider - Main Heading,1,Divider 2 - Subhead,2,Divider 2 - Appendix,1" }

When the TOC field is updated, hidden bookmarks are created whose names start with "_Toc". Please check the attached DOM image. DOM.png (16.9 KB)

The code example shared in my previous post extracts the content between these bookmarks. In your document (testfile1.docx), the text box contains two paragraphs, and these paragraphs belong to different TOC items. In this case, we suggest the following solution.

1. Check whether the ancestor of a bookmark whose name starts with "_Toc" is a Shape node.
2. Clone the Paragraph nodes of the Shape.
3. Insert them after the parent paragraph of the Shape node.
4. Remove the Shape node.
5. Use the same code to extract the content.
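
The steps above could be sketched roughly as follows. This is an untested outline, not code from the Aspose team: the class and method names are hypothetical, and it assumes every affected text box is a top-level Shape whose parent paragraph is not null.

```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Drawing;

public static class TextBoxFlattener
{
    public static void MoveTocParagraphsOutOfShapes(Document doc)
    {
        // Collect the shapes first so the document is not modified while
        // the bookmark collection is being enumerated.
        var shapes = new List<Shape>();
        foreach (Bookmark bookmark in doc.Range.Bookmarks)
        {
            if (!bookmark.Name.StartsWith("_Toc"))
                continue;

            // Step 1: is the bookmark inside a Shape (text box)?
            Shape shape = bookmark.BookmarkStart.GetAncestor(NodeType.Shape) as Shape;
            if (shape != null && !shapes.Contains(shape))
                shapes.Add(shape);
        }

        foreach (Shape shape in shapes)
        {
            Paragraph parentParagraph = shape.ParentParagraph;
            Node insertAfter = parentParagraph;

            // Steps 2 and 3: clone each paragraph of the text box and insert
            // the clones, in order, after the paragraph that holds the shape.
            foreach (Node paragraph in shape.GetChildNodes(NodeType.Paragraph, false).ToArray())
            {
                Node clone = paragraph.Clone(true);
                parentParagraph.ParentNode.InsertAfter(clone, insertAfter);
                insertAfter = clone;
            }

            // Step 4: remove the shape together with the original paragraphs.
            shape.Remove();
        }
    }
}
```

Calling this method after `doc.UpdateFields()` and before the extraction loop would correspond to step 5. Moving text out of a text box changes the page layout, so the extracted sections may not look identical to the original.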

Hope this helps you. Please let us know if you have any more queries.

Rajeshraj1983 · January 11, 2018, 11:10am

Hi Tahir,
Thanks for the reply. We will modify the TOC accordingly.
Could you please share best-practice code to merge all the split documents into one document that is a replica of the original document?

tahir.manzoor · January 11, 2018, 4:34pm

@Rajeshraj1983,

Thanks for your inquiry. Please read the following article about joining documents. Hope this helps you.
Joining and Appending Documents
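
As a rough illustration of the approach that article covers, the split files could be appended back together with `Document.AppendDocument`. This is a minimal sketch: the class name and the file paths passed in are placeholders, and it has not been tested against the documents in this thread.

```csharp
using Aspose.Words;

public static class SectionJoiner
{
    // partPaths: paths of the extracted files, in their original order.
    public static Document Join(string[] partPaths)
    {
        // Start from the first part and append the remaining ones to it.
        Document merged = new Document(partPaths[0]);
        for (int i = 1; i < partPaths.Length; i++)
        {
            Document part = new Document(partPaths[i]);
            // Keep the formatting of each appended part as it was in its own file.
            merged.AppendDocument(part, ImportFormatMode.KeepSourceFormatting);
        }
        return merged;
    }
}
```

Whether the merged result matches the original exactly depends on details such as section breaks, page numbering, and linked headers and footers, which this sketch does not adjust.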

Rajeshraj1983 · January 15, 2018, 3:47pm

Hi Tahir,
Thanks for the response.
Could you please explain how to fetch the table of contents from a document so that we can place check boxes for selective download?

tahir.manzoor · January 15, 2018, 5:34pm

@Rajeshraj1983,

Thanks for your inquiry. Aspose.Words for .NET is a class library with which you can programmatically generate, modify, convert, render, and print documents without utilizing Microsoft Word®. It does not offer any UI to perform document processing tasks.

Could you please share more detail about this query? We will then provide more information on it.

Rajeshraj1983 · January 16, 2018, 12:42am

Hi Tahir,
Our requirement is to fetch the table of contents page from the original document.
Could you please provide a solution for that?

tahir.manzoor · January 16, 2018, 9:50am

@Rajeshraj1983,

Thanks for your inquiry. Please use the LayoutCollector.GetStartPageIndex method to get the 1-based index of the page where a node begins.

Document doc = new Document(MyDir + "in.docx");
doc.UpdateFields();
LayoutCollector collector = new LayoutCollector(doc);

Field field = doc.Range.Fields.Cast<Field>().Where(f => f.Type == FieldType.FieldTOC).ToList()[0];
Console.WriteLine(collector.GetStartPageIndex(field.Start));

If you want to read the TOC items' content along with the page numbers, you can use the following code example. Hope this helps you.

Document doc = new Document(MyDir + "in.docx");
DataTable tocTable = TableOfContentsToDataTable(doc);

foreach (DataRow row in tocTable.Rows)
{
    Console.WriteLine(string.Format("Entry name: {0}, Heading Level: {1}, Page number: {2}", row["EntryName"], ((Style)row["EntryStyle"]).StyleIdentifier, row["Page"]));
}

public static DataTable TableOfContentsToDataTable(Document doc)
{
    DataTable table = new DataTable();

    table.TableName = "Toc";
    //******* Needed for Aspose's code
    table.Columns.Add("EntryRef");
    //****** end
    table.Columns.Add("EntryName");
    table.Columns.Add("ResultStartNode", typeof(Node));
    table.Columns.Add("ResultRuns", typeof(List<Run>));
    table.Columns.Add("EntryStyle", typeof(Style));
    table.Columns.Add("PageRef");
    table.Columns.Add("Page");

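    // Get the FieldStart of the first TOC field in the document.
    Node currentNode = (Node)FindTocStartFromIndex(doc);

    // No TOC field in the document: return the empty table.
    if (currentNode == null)
        return table;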

    // Skip forward to the first field separator (after the TOC field code).
    while (currentNode.NodeType != NodeType.FieldSeparator)
        currentNode = currentNode.NextPreOrder(doc);


    // First node of the paragraph
    currentNode = currentNode.NextPreOrder(doc);

    bool isCollecting = true;
    int countOfFieldItems = 0;
    bool isAfterFirstTocEntry = false;
    bool isHyperlinked = currentNode.NodeType == NodeType.FieldStart;

    while (isCollecting)
    {

        StringBuilder entryRefCode = new StringBuilder();
        StringBuilder entryText = new StringBuilder();
        StringBuilder pageRefCode = new StringBuilder();
        StringBuilder pageText = new StringBuilder();
        // Ensures that first entry is gotten from TOC
        if (!isAfterFirstTocEntry)
        {
            // Skip nodes until encounters a run
            while (currentNode.NodeType != NodeType.Run)
            {
                currentNode = currentNode.NextPreOrder(doc);
            }

            isAfterFirstTocEntry = true;
        }


        if (isHyperlinked)
        {

            // Collect all runs in the field code until we encounter the field separator
            while (currentNode.NodeType != NodeType.FieldSeparator)
            {
                entryRefCode.Append(currentNode.Range.Text.Trim());
                currentNode = currentNode.NextPreOrder(doc);
            }

            // Skip past field separator
            currentNode = currentNode.NextPreOrder(doc);
        }


        // Return an empty table if the TOC field has no entries
        if (currentNode.Range.Text.Contains("No table of contents entries found."))
        {
            table.Columns.Clear();
            return table;
        }

        Node entryPositionNode = null;
        List<Run> fieldResultRuns = new List<Run>();
        Style entryStyle = null;

        while (currentNode.NodeType != NodeType.FieldStart)
        {

            countOfFieldItems++;
            if (currentNode.NodeType == NodeType.Run)
            {
                if (entryPositionNode == null)
                    entryPositionNode = currentNode.PreviousPreOrder(doc);

                fieldResultRuns.Add((Run)currentNode.Clone(false));
                entryStyle = ((Run)currentNode).ParentParagraph.ParagraphFormat.Style;
            }

            entryText.Append(currentNode.Range.Text.Trim());
            currentNode = currentNode.NextPreOrder(doc);
        }

        countOfFieldItems = 0;
        // Skip nodes until FieldStart (of PAGEREF)

        while (currentNode.NodeType != NodeType.FieldStart)
        {
            currentNode = currentNode.NextPreOrder(doc);
        }

        currentNode = currentNode.NextPreOrder(doc);
        pageRefCode.Append(currentNode.Range.Text);

        // Skip nodes until FieldSeparator (of PAGEREF)
        while (currentNode.NodeType != NodeType.FieldSeparator)
        {
            currentNode = currentNode.NextPreOrder(doc);
        }

        // Add the runs from the field which should be the page number

        currentNode = currentNode.NextPreOrder(doc);
        pageText.Append(currentNode.Range.Text);

        // Add to datatable

        table.Rows.Add(new object[] { entryRefCode.ToString(), entryText.ToString(), entryPositionNode, fieldResultRuns, entryStyle, pageRefCode.ToString(), pageText.ToString() });
        currentNode = currentNode.NextPreOrder(doc);

        // Skip to the first run of the next paragraph (should be the next entry). Check if a TOC field end is found at the same time
        bool isNextPara = false;
        bool isChecking = true;
        while (isChecking)
        {
            currentNode = currentNode.NextPreOrder(doc);
            // No node found, break.
            if (currentNode == null)
            {
                isCollecting = false;
                break;
            }

            // Passed a new paragraph
            if (currentNode.NodeType == NodeType.Paragraph)
                isNextPara = true;

            // Found first run of a new paragraph
            if (isNextPara && currentNode.NodeType == NodeType.Run)
                isChecking = false;

            // Once we encounter a FieldEnd node of type FieldTOC then we know we are at the end
            // of the current TOC and we can stop here.
            if (currentNode.NodeType == NodeType.FieldEnd)
            {

                Aspose.Words.Fields.FieldEnd fieldEnd = (Aspose.Words.Fields.FieldEnd)currentNode;
                if (fieldEnd.FieldType == Aspose.Words.Fields.FieldType.FieldTOC)
                {
                    isCollecting = false;
                    break;
                }
            }
        }
    }

    return table;
}

public static FieldStart FindTocStartFromIndex(Document doc)
{
    // Return the start node of the first TOC field, or null if the document has none.
    Field field = doc.Range.Fields.Cast<Field>().FirstOrDefault(f => f.Type == FieldType.FieldTOC);
    return field != null ? field.Start : null;
}

Rajeshraj1983 · January 17, 2018, 2:17pm

Thank you so much for the support…