Free Support Forum - aspose.com

Retrieving Table of Contents Data

I have a Word TOC where style has been applied.
Important Information
It is accessed as a ‘FieldHyperlink’ structure but I need to access the embedded runs that change style.

Can you advise on the best way to do this?

Thanks
Jan

@JanUrbanski

Could you please ZIP and attach your input document and expected output here for our reference? We will then provide more information about your query, along with a code example.

HSBC GIF - Working Copy.zip (6.8 MB)
On page 3 at the table of contents…
The first entry is: -
Important Information 2
I need to collect the content but with embedded style data

{Run[normal]}Important {Run[Italic]}Information

Hopefully this is clear

Thanks

Jan

@JanUrbanski

We are writing a code example for your case and will share it with you as soon as possible.

Appreciate that - Thanks

@JanUrbanski

Please use the following code example to read the TOC items. Hope this helps you.

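// Requires: using System; using Aspose.Words; using Aspose.Words.Fields;
Document doc = new Document(MyDir + "HSBC GIF - Working Copy.docx");
doc.UpdateFields();

foreach (Field field in doc.Range.Fields)
{
    // Only process fields whose field code contains "_TOC_".
    if (field.GetFieldCode().Contains("_TOC_"))
    {
        Paragraph paragraph = field.Start.ParentParagraph;
        Console.WriteLine("Paragraph style name : " + paragraph.ParagraphFormat.StyleName);

        // Walk the field result: the nodes between the field separator and the field end.
        // The null check stops the loop if the siblings run out before a FieldEnd is found.
        Node node = field.Separator;
        while (node != null && node.NodeType != NodeType.FieldEnd)
        {
            if (node.NodeType == NodeType.Run)
            {
                Run run = (Run)node;
                Console.Write("Run font name : " + run.Font.Name);
                Console.WriteLine(" Run text : " + run.Text);
            }
            node = node.NextSibling;
        }

        Console.WriteLine("----------------------------------------");
    }
}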
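
For the output format requested earlier in the thread, {Run[normal]}Important {Run[Italic]}Information, the runs collected by a loop like the one above could be merged by style. The following is a minimal Java sketch of that merging step only. It does not call Aspose.Words: StyledRun and StyledRunFormatter are hypothetical names, and the italic flag is assumed to have been read from each run's font settings beforehand.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.Words.
class StyledRunFormatter {

    // Hypothetical value type: the text of one run and whether it is italic.
    public static final class StyledRun {
        final String text;
        final boolean italic;

        public StyledRun(String text, boolean italic) {
            this.text = text;
            this.italic = italic;
        }
    }

    // Emits a style marker only when the style changes, so adjacent runs
    // with the same style are merged under one marker.
    public static String format(List<StyledRun> runs) {
        StringBuilder out = new StringBuilder();
        Boolean previousItalic = null; // null means "no run seen yet"
        for (StyledRun run : runs) {
            if (previousItalic == null || previousItalic != run.italic) {
                out.append(run.italic ? "{Run[Italic]}" : "{Run[normal]}");
                previousItalic = run.italic;
            }
            out.append(run.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<StyledRun> runs = Arrays.asList(
                new StyledRun("Important", false),
                new StyledRun(" ", false),
                new StyledRun("Information", true));
        System.out.println(format(runs));
    }
}
```

With the three runs shown in main, the space run carries no italic flag, so it is merged into the preceding normal run.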

Thanks so much. This works a treat

@JanUrbanski

It is nice to hear that the code example helped you. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

Hi Tahir,
I hope it's OK to continue this thread, as it's to do with the file I sent you.
The Word XML structure is: -
<w:r>
  <w:t>Important</w:t>
</w:r>
<w:r>
  <w:rPr>
    <w:spacing w:val="-2"/>
  </w:rPr>
  <w:t xml:space="preserve"> </w:t>
</w:r>
<w:r w:rsidRPr="00F9242E">
  <w:rPr>
    <w:i/>
    <w:iCs/>
  </w:rPr>
  <w:t>Information</w:t>
</w:r>
When Aspose gives the 2 runs there is no space, so I get “ImportantInformation”. Normally the space is included, so I would see it as part of the text.
Is there a special way to recover the <w:t xml:space="preserve"> </w:t> run, as it does not get returned?

Cheers

Jan

@JanUrbanski

Perhaps you are using Aspose.Words in evaluation mode. The latest version, Aspose.Words for .NET 19.12, returns 3 Run nodes; the space between the two words is in a separate Run node. Please get a 30-day temporary license and apply it before importing the document.

Moreover, we suggest checking the Document Explorer code example in the Aspose.Words for .NET examples repository on GitHub.

Hi Tahir,

We are using the Java version and, due to my pom, I was actually using version 19.6 :(
I rectified this but still no joy.
I was capturing the parent paragraph and then decoding it later where I needed it, rather than getting the runs as your sample does.
I can confirm that this now collects the correct Runs. Not sure why it did not work the other way.

I found that I processed the following Nodes as children of the Paragraph: -

Node Type : GROUP_SHAPE
Node Type : SHAPE
Node Type : SHAPE
Node Type : SHAPE
Node Type : FIELD_START
Node Type : RUN
Node Type : FIELD_SEPARATOR
Node Type : RUN
Node Type : RUN
Node Type : FIELD_END

Clearly not the same.
Anyway thanks for your responses - it made me dig deeper
Cheers
Jan

@JanUrbanski

Has your problem been solved? If you still face a problem, please share the details of the paragraph that you want to process. We will then provide more information on it.

Hi Tahir,
I am good with the extraction of the runs although I am struggling to find a safe way to identify the areas that have been generated by Word.
Important Information 2
Section 1. General Information 11

Is there any way to identify where the page number is so I can safely suppress it?
Conversely, the "1." is generated.

Without writing some hairy detection code, is there a way that Aspose can reveal the parts which are generated?

Cheers
Jan

@JanUrbanski

Your document contains hyperlinks instead of TOC fields. Please open the document in MS Word and press Alt + F9 to check the hyperlink field codes.

The previous sibling node of the FieldEnd node is the Run node that contains the page number. You can get it using the Node.PreviousSibling property. Hope this helps you.
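
As a rough illustration of how that could be used to suppress the page number, here is a minimal Java sketch that works on run texts already collected from one TOC entry, in document order. It does not call Aspose.Words: TocEntryRuns and withoutPageNumber are hypothetical names. It assumes the last run before the field end holds the page number as plain digits, and that any whitespace-only runs directly before it (for example a tab) are just separators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.Words.
class TocEntryRuns {

    // Returns the run texts without the trailing page number.
    // Assumption: the last run is the page number only if it consists of digits.
    public static List<String> withoutPageNumber(List<String> runTexts) {
        List<String> kept = new ArrayList<>(runTexts);
        if (kept.isEmpty()) {
            return kept;
        }
        String last = kept.get(kept.size() - 1).trim();
        if (!last.matches("\\d+")) {
            return kept; // no recognisable page number, leave the entry untouched
        }
        kept.remove(kept.size() - 1);
        // Drop separator runs (tab or spaces only) that preceded the page number.
        while (!kept.isEmpty() && kept.get(kept.size() - 1).trim().isEmpty()) {
            kept.remove(kept.size() - 1);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("Important", " ", "Information", "\t", "2");
        System.out.println(withoutPageNumber(runs));
    }
}
```

Because the check only looks at the last run, a generated prefix such as "1." at the start of an entry is left in place and would need separate handling.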

Moreover, we suggest inserting a table of contents field in your document instead of hyperlinks. If you modify the text and page numbers of a hyperlink, MS Word does not regenerate them when fields are updated.

Hi Tahir,
I understand your reply, but unfortunately I am tied to this method, as it seems that the table of contents entries need bespoke styles. Use of Heading 1 - 3 will not work for the author ;-(

I am getting there with the current structure although I have got back to the case where iterating over the children of a STRUCTURED_DOCUMENT_TAG only reveals 1 of 3 RUNs.

I attach a cut down file to make it easier.
In row 2, column 2 there is a structured document tag with the title “tocString”. It contains 3 Runs, but using the code below only reveals 1 Run.

Can you tell me what I am doing wrong, as it's driving me nuts?
t2CutDown.zip (1.1 MB)

Note: I also tried using item(i) instead of the iterator, and also next sibling …

My Java function below:

private void createXMLfromTocString(Row row) {
	NodeCollection cellList = row.getChildNodes(NodeType.CELL, true);
	
	if ( cellList.getCount() > 1 ) {
		Cell cell1 = (Cell) cellList.get(1);
		Iterator stIterator = cell1.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG, true).iterator();
		while ( stIterator.hasNext()) {
			StructuredDocumentTag sdItem = (StructuredDocumentTag)stIterator.next();
			System.out.println(sdItem.getTitle());
			if ( sdItem.getTitle().equals("tocString")) {
				// Need to form a proper Para that we can convert to xml
				Paragraph p = new Paragraph(loadedWordDocument);
				Iterator childs = sdItem.getChildNodes().iterator();
				while ( childs.hasNext() ) {
					Node aChild = (Node) childs.next();
					p.appendChild(aChild);
				}
				System.out.println(p.getText());
			}
		}
	}
}

Thanks

Jan

@JanUrbanski

Please clone the child node that you want to append to the paragraph, as shown below, to get the desired output.

p.appendChild(aChild.deepClone(true));
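
A likely reason cloning helps: appending a node that already has a parent moves it out of that parent, so the collection being iterated changes underneath the loop. The Java sketch below models this with a toy tree to show the effect. ToyNode and MoveVersusClone are hypothetical and are not the Aspose.Words API; the sibling-walking loop is an assumption about how the iteration behaves, not a description of Aspose internals.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a document tree; NOT the Aspose.Words API.
class MoveVersusClone {

    static class ToyNode {
        final String text;
        ToyNode parent;
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(String text) { this.text = text; }

        // Like a DOM append: a node that already has a parent is detached first.
        void appendChild(ToyNode child) {
            if (child.parent != null) {
                child.parent.children.remove(child);
            }
            child.parent = this;
            children.add(child);
        }

        // Next sibling under the current parent, or null if there is none.
        ToyNode nextSibling() {
            if (parent == null) return null;
            int i = parent.children.indexOf(this);
            return i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        }

        // Detached copy; the original stays where it is.
        ToyNode copy() { return new ToyNode(text); }

        String getText() {
            StringBuilder sb = new StringBuilder();
            for (ToyNode c : children) sb.append(c.text);
            return sb.toString();
        }
    }

    static ToyNode sampleTag() {
        ToyNode tag = new ToyNode("");
        tag.appendChild(new ToyNode("Important"));
        tag.appendChild(new ToyNode(" "));
        tag.appendChild(new ToyNode("Information"));
        return tag;
    }

    // Moves each child while walking the sibling chain: stops after one node.
    public static String collectByMoving() {
        ToyNode p = new ToyNode("");
        for (ToyNode n = sampleTag().children.get(0); n != null; n = n.nextSibling()) {
            p.appendChild(n); // n now lives under p, so it has no next sibling
        }
        return p.getText();
    }

    // Appends copies instead, so the original sibling chain stays intact.
    public static String collectByCopying() {
        ToyNode p = new ToyNode("");
        for (ToyNode n = sampleTag().children.get(0); n != null; n = n.nextSibling()) {
            p.appendChild(n.copy());
        }
        return p.getText();
    }

    public static void main(String[] args) {
        System.out.println(collectByMoving());  // prints "Important"
        System.out.println(collectByCopying()); // prints "Important Information"
    }
}
```

In this model, moving yields only the first of the three runs, which matches the symptom described above, while copying yields all three and leaves the source tag unchanged.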

Thanks Tahir - All sorted

Cheers
Jan