Free Support Forum - aspose.com

How to get TOC?

Hello, I want to read the table of contents (TOC) that has already been written in a Word document and output it. Could you provide the relevant code? Thank you!

Hi

Thanks for your inquiry. A TOC is actually a field. This field contains paragraphs with special style names such as “TOC 1”, “TOC 2”, etc. You can loop through all paragraphs and pick out those whose style name contains “TOC”.

Here is code example:

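// Requires: using System; using Aspose.Words;
Document doc = new Document("in.doc");

// Get all paragraphs in the document, including nested ones
NodeCollection paragraphColl = doc.GetChildNodes(NodeType.Paragraph, true);

// Loop through all paragraphs and print those formatted with a TOC style.
// In newer Aspose.Words versions, ToString(SaveFormat.Text) replaces ToTxt().
foreach (Paragraph par in paragraphColl)
{
    if (par.ParagraphFormat.Style.Name.Contains("TOC"))
        Console.WriteLine(par.ToTxt());
}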

To handle level indentation for the TOC, you should determine which Heading style (Heading 1, Heading 2, and so on) the corresponding paragraphs use.

Please use the following code to find all paragraphs with a HeadingX style (in this example X is 1-3) in the document:

// Get all paragraphs from the document
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

foreach (Paragraph paragraph in paragraphs)
{
    switch (paragraph.ParagraphFormat.StyleIdentifier)
    {
        case StyleIdentifier.Heading1:
        case StyleIdentifier.Heading2:
        case StyleIdentifier.Heading3:
            // This paragraph has a HeadingX style
            break;
    }
}

Hope this helps.

Best regards,

Thank you very much for the answer, but there is one problem: how do I get the page number of each paragraph and chapter?

Hello

Thanks for your inquiry. In this case please try using the following code:

// Requires: using System; using System.Text; using Aspose.Words; using Aspose.Words.Fields;
Document doc = new Document("C:\\Temp\\in.doc");
Node currentNode = null;

// Get the collection of FieldStart nodes
Node[] fieldStarts = doc.GetChildNodes(NodeType.FieldStart, true).ToArray();

// Loop through all FieldStart nodes and remember the TOC field start
// (if the document has several TOC fields, the last one is used)
foreach (FieldStart start in fieldStarts)
{
    if (start.FieldType == FieldType.FieldTOC)
        currentNode = start;
}

// Stop early if the document has no TOC field
if (currentNode == null)
    throw new InvalidOperationException("No TOC field found in the document.");

// Skip forward to the first field separator (after the TOC field code).
while (currentNode.NodeType != NodeType.FieldSeparator)
    currentNode = currentNode.NextPreOrder(doc);

// First node of the paragraph
currentNode = currentNode.NextPreOrder(doc);

bool isCollecting = true;

while (isCollecting)
{
    StringBuilder entryText = new StringBuilder();
    StringBuilder pageText = new StringBuilder();

    // Collect text until the next FieldStart
    while (currentNode.NodeType != NodeType.FieldStart)
    {
        entryText.Append(currentNode.GetText().Trim());
        currentNode = currentNode.NextPreOrder(doc);
    }

    currentNode = currentNode.NextPreOrder(doc);

    // Skip nodes until FieldSeparator (of PAGEREF)
    while (currentNode.NodeType != NodeType.FieldSeparator)
    {
        currentNode = currentNode.NextPreOrder(doc);
    }

    // Add the run from the field result, which should be the page number
    currentNode = currentNode.NextPreOrder(doc);
    pageText.Append(currentNode.GetText());

    // Show
    Console.Write(entryText + "---" + pageText + "\n");

    currentNode = currentNode.NextPreOrder(doc);

    // Skip to the first run of the next paragraph (should be the next entry).
    // Check whether the TOC field end is found at the same time.
    bool isNextPara = false;
    bool isChecking = true;

    while (isChecking)
    {
        currentNode = currentNode.NextPreOrder(doc);

        // No node found, break.
        if (currentNode == null)
        {
            isCollecting = false;
            break;
        }

        // Passed a new paragraph
        if (currentNode.NodeType == NodeType.Paragraph)
            isNextPara = true;

        // Found first run of a new paragraph
        if (isNextPara && currentNode.NodeType == NodeType.Run)
            isChecking = false;

        // Once we encounter a FieldEnd node of type FieldTOC, we know we are at the end
        // of the current TOC and we can stop here.
        if (currentNode.NodeType == NodeType.FieldEnd)
        {
            FieldEnd fieldEnd = (FieldEnd)currentNode;
            if (fieldEnd.FieldType == FieldType.FieldTOC)
            {
                isCollecting = false;
                break;
            }
        }
    }
}

Best regards,

Thank you for your reply. This is a bit difficult to understand. I can see the page number, but the chapter text comes out like this: HYPERLINK \l "_Toc273402009". Can that be removed? Also, the first entry is not displayed. I want a result like this: every paragraph has an auto-numbered ID (1, 2, 3, 4, and so on), and for each paragraph I can get the parent ID and the current page number, like this:

NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

foreach (Paragraph paragraph in paragraphs)
{
    switch (paragraph.ParagraphFormat.StyleIdentifier)
    {
        case StyleIdentifier.Heading1:
            Insert(id, parentid, page, chapter);
            break;
        case StyleIdentifier.Heading2:
            Insert(id, parentid, page, chapter);
            break;
        case StyleIdentifier.Heading3:
            Insert(id, parentid, page, chapter);
            break;
    }
}

Hi there,

Thanks for this additional information. Could you please attach your template document here as well?

Thanks,

Of course, I have uploaded it.

Hi there,

Thanks for attaching your document here for testing.

I think you can use the code below to achieve what you want. This will find and parse all of the paragraphs in the first TOC and print out the information of each entry.

DataTable tocTable = TableOfContentsToDataTable(doc, 0);

foreach (DataRow row in tocTable.Rows)
{
    Console.WriteLine(string.Format("Entry name: {0}, Heading Level: {1}, Page number: {2}", row["EntryName"], ((Style)row["EntryStyle"]).StyleIdentifier, row["Page"]));
}

// Requires: using System.Collections.Generic; using System.Data; using System.Text; using Aspose.Words;
public static DataTable TableOfContentsToDataTable(Document doc, int tocIndex)
{
    DataTable table = new DataTable();
    table.TableName = "Toc " + tocIndex;

    //******* Needed for Aspose's code
    table.Columns.Add("EntryRef");
    //****** end
    table.Columns.Add("EntryName");
    table.Columns.Add("ResultStartNode", typeof(Node));
    table.Columns.Add("ResultRuns", typeof(List<Run>));
    table.Columns.Add("EntryStyle", typeof(Style));
    table.Columns.Add("PageRef");
    table.Columns.Add("Page");

    // Get the FieldStart of the specified TOC.
    Node currentNode = (Node)FindTocStartFromIndex(doc, tocIndex);

    // Skip forward to the first field separator (after the TOC field code).
    while (currentNode.NodeType != NodeType.FieldSeparator)
        currentNode = currentNode.NextPreOrder(doc);

    // First node of the paragraph
    currentNode = currentNode.NextPreOrder(doc);

    bool isCollecting = true;
    bool isAfterFirstTocEntry = false;
    bool isHyperlinked = currentNode.NodeType == NodeType.FieldStart;

    while (isCollecting)
    {
        StringBuilder entryRefCode = new StringBuilder();
        StringBuilder entryText = new StringBuilder();
        StringBuilder pageRefCode = new StringBuilder();
        StringBuilder pageText = new StringBuilder();

        // Ensures that the first entry is read from the TOC
        if (!isAfterFirstTocEntry)
        {
            // Skip nodes until a run is encountered
            while (currentNode.NodeType != NodeType.Run)
            {
                currentNode = currentNode.NextPreOrder(doc);
            }

            isAfterFirstTocEntry = true;
        }

        if (isHyperlinked)
        {
            // Collect all runs in the field code until we encounter the field separator
            while (currentNode.NodeType != NodeType.FieldSeparator)
            {
                entryRefCode.Append(currentNode.Range.Text.Trim());
                currentNode = currentNode.NextPreOrder(doc);
            }

            // Skip past field separator
            currentNode = currentNode.NextPreOrder(doc);
        }

        // Stop if the TOC is empty (this placeholder text appears instead of entries)
        if (currentNode.Range.Text.Contains("No table of contents entries found."))
        {
            table.Columns.Clear();
            return table;
        }

        Node entryPositionNode = null;
        List<Run> fieldResultRuns = new List<Run>();
        Style entryStyle = null;

        // Collect the entry text until the next FieldStart
        while (currentNode.NodeType != NodeType.FieldStart)
        {
            if (currentNode.NodeType == NodeType.Run)
            {
                if (entryPositionNode == null)
                    entryPositionNode = currentNode.PreviousPreOrder(doc);

                fieldResultRuns.Add((Run)currentNode.Clone(false));
                entryStyle = ((Run)currentNode).ParentParagraph.ParagraphFormat.Style;
            }

            entryText.Append(currentNode.Range.Text.Trim());
            currentNode = currentNode.NextPreOrder(doc);
        }

        // Skip nodes until FieldStart (of PAGEREF)
        while (currentNode.NodeType != NodeType.FieldStart)
        {
            currentNode = currentNode.NextPreOrder(doc);
        }

        currentNode = currentNode.NextPreOrder(doc);
        pageRefCode.Append(currentNode.Range.Text);

        // Skip nodes until FieldSeparator (of PAGEREF)
        while (currentNode.NodeType != NodeType.FieldSeparator)
        {
            currentNode = currentNode.NextPreOrder(doc);
        }

        // Add the run from the field result, which should be the page number
        currentNode = currentNode.NextPreOrder(doc);
        pageText.Append(currentNode.Range.Text);

        // Add to datatable
        table.Rows.Add(new object[] { entryRefCode.ToString(), entryText.ToString(), entryPositionNode, fieldResultRuns, entryStyle, pageRefCode.ToString(), pageText.ToString() });

        currentNode = currentNode.NextPreOrder(doc);

        // Skip to the first run of the next paragraph (should be the next entry).
        // Check whether the TOC field end is found at the same time.
        bool isNextPara = false;
        bool isChecking = true;

        while (isChecking)
        {
            currentNode = currentNode.NextPreOrder(doc);

            // No node found, break.
            if (currentNode == null)
            {
                isCollecting = false;
                break;
            }

            // Passed a new paragraph
            if (currentNode.NodeType == NodeType.Paragraph)
                isNextPara = true;

            // Found first run of a new paragraph
            if (isNextPara && currentNode.NodeType == NodeType.Run)
                isChecking = false;

            // Once we encounter a FieldEnd node of type FieldTOC, we know we are at the end
            // of the current TOC and we can stop here.
            if (currentNode.NodeType == NodeType.FieldEnd)
            {
                Aspose.Words.Fields.FieldEnd fieldEnd = (Aspose.Words.Fields.FieldEnd)currentNode;
                if (fieldEnd.FieldType == Aspose.Words.Fields.FieldType.FieldTOC)
                {
                    isCollecting = false;
                    break;
                }
            }
        }
    }

    return table;
}
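
For the HYPERLINK text mentioned earlier: the method above keeps that field code apart from the entry name, in the EntryRef column. If the bookmark target is needed, it could be pulled out of the field code with plain string handling. This is only a rough sketch. The class and method names are made up for illustration, it does not use Aspose.Words at all, and it assumes the field code looks like HYPERLINK \l "_Toc273402009".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.Words.
public static class TocFieldCode
{
    // Matches the \l switch and captures the quoted bookmark name after it.
    // Both straight and curly double quotes are accepted.
    private static readonly Regex BookmarkSwitch =
        new Regex("\\\\l\\s+[\"\u201C]([^\"\u201D]+)[\"\u201D]");

    // Returns the bookmark name from a HYPERLINK field code,
    // or null when the code has no \l switch.
    public static string GetBookmarkName(string fieldCode)
    {
        if (fieldCode == null)
            return null;

        Match match = BookmarkSwitch.Match(fieldCode);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

It could be called as TocFieldCode.GetBookmarkName((string)row["EntryRef"]) for each row of the table.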

If you have any further queries, please feel free to ask.

Thanks,

Hello, I cannot find this method. Do I need to add anything?

FindTocStartFromIndex(doc, tocIndex) cannot be found.

Hi there,

Sorry about that, please find the implementation of the missing method below.

// Requires: using System; using System.Collections; using Aspose.Words; using Aspose.Words.Fields;
public static FieldStart FindTocStartFromIndex(Document doc, int tocIndex)
{
    // Store the FieldStart nodes of TOC fields in the document for quick access.
    ArrayList fieldStarts = new ArrayList();

    foreach (FieldStart start in doc.GetChildNodes(NodeType.FieldStart, true))
    {
        // Add all FieldStarts which are of type FieldTOC.
        if (start.FieldType == FieldType.FieldTOC)
            fieldStarts.Add(start);
    }

    // Ensure the TOC specified by the passed index exists.
    if (tocIndex < 0 || tocIndex > fieldStarts.Count - 1)
        throw new ArgumentOutOfRangeException("tocIndex", "TOC index is out of range");

    return (FieldStart)fieldStarts[tocIndex];
}

Thanks,

Looks good, but I am still a little disappointed. I expected something like this:

Entry name: 1XXX, ID: 1, ParentID: 0, Page number: 1
Entry name: 2XXX, ID: 2, ParentID: 0, Page number: 1
Entry name: 2.1XXX, ID: 3, ParentID: 2, Page number: 1
Entry name: 2.2XXX, ID: 4, ParentID: 2, Page number: 2
Entry name: 2.3XXX, ID: 5, ParentID: 2, Page number: 2
Entry name: 2.4XXX, ID: 6, ParentID: 2, Page number: 4
Entry name: 2.4.1XXX, ID: 7, ParentID: 6, Page number: 4
Entry name: 2.4.2XXX, ID: 8, ParentID: 6, Page number: 4
Entry name: 2.4.3XXX, ID: 9, ParentID: 6, Page number: 5

Hi

Thanks for your request. TOC items in MS Word documents have no IDs. However, I think you can easily calculate them in your code: use the heading level of each entry to decide when to move to the next level.
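
As a rough illustration of that calculation, here is a minimal sketch that assigns sequential IDs and parent IDs from a list of heading levels. It is plain C# and does not call Aspose.Words. The TocEntryId and TocNumbering names are hypothetical, and it assumes you have already collected each entry's name and its 1-based level (for example from the entry's TOC style) in document order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type: one row per TOC entry.
public sealed class TocEntryId
{
    public int Id;
    public int ParentId; // 0 means "no parent" (top-level entry)
    public int Level;
    public string Name;
}

public static class TocNumbering
{
    // names[i] and levels[i] describe the i-th TOC entry in document order.
    // levels[i] is the 1-based heading level of that entry.
    public static List<TocEntryId> AssignIds(IList<string> names, IList<int> levels)
    {
        if (names.Count != levels.Count)
            throw new ArgumentException("names and levels must have the same length");

        var result = new List<TocEntryId>();
        // Stack of open ancestors; the top is the nearest candidate parent.
        var ancestors = new Stack<TocEntryId>();

        for (int i = 0; i < names.Count; i++)
        {
            // Entries at the same or a deeper level cannot be the parent, so drop them.
            while (ancestors.Count > 0 && ancestors.Peek().Level >= levels[i])
                ancestors.Pop();

            var entry = new TocEntryId
            {
                Id = i + 1,
                ParentId = ancestors.Count > 0 ? ancestors.Peek().Id : 0,
                Level = levels[i],
                Name = names[i]
            };

            result.Add(entry);
            ancestors.Push(entry);
        }

        return result;
    }
}
```

With the levels 1, 1, 2, 2, 2, 2, 3, 3, 3 from the expected output above, this gives the parent IDs 0, 0, 2, 2, 2, 2, 6, 6, 6. The page number is not computed here; it would come from the Page column of the TOC table.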

Best regards,