Free Support Forum - aspose.com

Extracting TOC content From Word

I want to convert a Word document to HTML, but I need a separate HTML file for each TOC entry. So if the TOC contains 50 headings, I need 50 HTML files, each based on the content of its heading.

@ashishsinghvi,

You can parse a Table of Contents (TOC) field in a Word DOC or DOCX document with the following code. This C# code also writes the text of each TOC entry to an individual HTML file:

C# Code:

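using System;
using Aspose.Words;
using Aspose.Words.Fields;

// Load the source document; i numbers the output files.
Document doc = new Document("E:\\temp\\sample-input\\sample-input.docx");
int i = 1;
// TOC entries are HYPERLINK fields whose sub-address points at a "_Toc" bookmark.
foreach (FieldStart field in doc.GetChildNodes(NodeType.FieldStart, true))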
{
    if (field.FieldType.Equals(FieldType.FieldHyperlink))
    {
        FieldHyperlink hyperlink = (FieldHyperlink)field.GetField();
        if (hyperlink.SubAddress != null && hyperlink.SubAddress.StartsWith("_Toc"))
        {
            Paragraph tocItem = (Paragraph)field.GetAncestor(NodeType.Paragraph);
            if (tocItem != null)
            {
                // Write the entry text into a new blank document and save it as HTML.
                DocumentBuilder builder = new DocumentBuilder();
                builder.Write(tocItem.ToString(SaveFormat.Text).Trim());
                builder.Document.Save("E:\\temp\\sample-input\\out_" + i++ + ".html");

                // To get text representation of a TOC Entry
                Console.WriteLine(tocItem.ToString(SaveFormat.Text).Trim());

                //// To get page numbers only
                //foreach (Field nestedField in tocItem.Range.Fields)
                //{
                //    if (nestedField.Type.Equals(FieldType.FieldPageRef))
                //    {
                //        //nestedField.Unlink();
                //        Console.WriteLine(nestedField.DisplayResult);
                //    }
                //}
            }
        }
    }
}
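
The code above saves only the text of each TOC entry, not the section that the entry points to. If the goal is one HTML file per heading section, a possible alternative is to let the HTML exporter split the document at heading paragraphs. The following is a minimal sketch, not a tested solution: it assumes the TOC is built from the built-in Heading styles, and both the output file name and the heading level of 3 are placeholder values to adjust for your document.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

Document doc = new Document("E:\\temp\\sample-input\\sample-input.docx");

HtmlSaveOptions options = new HtmlSaveOptions();
// Start a new HTML part at each heading paragraph.
options.DocumentSplitCriteria = DocumentSplitCriteria.HeadingParagraph;
// Maximum heading level at which to split (assumed value; match it to the TOC levels).
options.DocumentSplitHeadingLevel = 3;

// Hypothetical output name; the split parts are saved alongside this file.
doc.Save("E:\\temp\\sample-input\\out_split.html", options);
```

Note that this splits by heading style rather than by reading the TOC field itself, so headings that are excluded from the TOC would still produce their own part.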

If the above code does not produce the desired output, please ZIP and attach the following resources here for testing:

  • Your simplified input Word document containing the TOC data
  • Your expected HTML files showing the desired output. You can manually create these files by using MS Word or OpenOffice Writer desktop applications or any other editors.

As soon as you have these resources ready, we will investigate your scenario further and provide code to achieve the expected output using Aspose.Words.
