Get the document outline and the text of the table of contents

ahulihu · March 24, 2016, 1:45pm

I am trying to get the document outline and the text of the table of contents from the attached file.

I tried to get the outline level with paragraph.getParagraphFormat().getOutlineLevel(), but the return value is always 9.

I tried to get the text of the table of contents with paragraph.getText(), but got the value below:
TOC \o "1-3" \h \z \u HYPERLINK \l "_Toc446459127" Title 1 PAGEREF _Toc446459127 \h 1

HYPERLINK \l "_Toc446459128" SubTitle 1 PAGEREF _Toc446459128 \h 1

tahir.manzoor · March 28, 2016, 7:03am

Hi Hugh,

Thanks for your inquiry. It seems that you want to get the list level number of each list item. Please use the ListFormat.ListLevelNumber property (getListFormat().getListLevelNumber() in Java) to get the list level number (0 to 8) of the paragraph, as shown in the following code example.

To get the text of the table of contents, please use the Node.ToString method with SaveFormat.TEXT, as shown below. Hope this helps you.

Document doc = new Document(MyDir + "Test Outline List Hyperlink.docx");
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph para : (Iterable<Paragraph>) paragraphs)
{
    // Get the list level number
    if (para.isListItem())
    {
        System.out.println(para.getListFormat().getListLevelNumber());
    }
    // Get the text of Table of contents
    if (para.getText().contains("_Toc"))
    {
        System.out.println(para.toString(SaveFormat.TEXT));
    }
}
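
For context on why paragraph.getText() printed TOC, HYPERLINK and PAGEREF in the question: the raw text of a paragraph includes field codes, delimited by control characters. The sketch below is plain Java with no Aspose dependency, and it assumes the delimiters are U+0013 (field start), U+0014 (field separator) and U+0015 (field end). The class and method names (FieldResultText, visibleText) are made up for illustration. It keeps only the field results, which is roughly what exporting the node as text gives you.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Aspose.Words.
final class FieldResultText {
    // Assumed field delimiters in the raw paragraph text.
    static final char FIELD_START = '\u0013';
    static final char FIELD_SEPARATOR = '\u0014';
    static final char FIELD_END = '\u0015';

    // Returns the text with field codes removed and field results kept.
    static String visibleText(String raw) {
        StringBuilder out = new StringBuilder();
        // One entry per open field: true once its separator has been seen.
        Deque<Boolean> inResult = new ArrayDeque<>();
        int openCodes = 0; // open fields that are still in their code part
        for (char c : raw.toCharArray()) {
            if (c == FIELD_START) {
                inResult.push(false);
                openCodes++;
            } else if (c == FIELD_SEPARATOR) {
                if (!inResult.isEmpty() && !inResult.peek()) {
                    inResult.pop();
                    inResult.push(true);
                    openCodes--;
                }
            } else if (c == FIELD_END) {
                // A TOC field spans several paragraphs, so a paragraph may
                // contain an end marker whose start is in an earlier one.
                if (!inResult.isEmpty() && !inResult.pop()) {
                    openCodes--;
                }
            } else if (openCodes == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\u0013 HYPERLINK \\l \"_Toc446459128\" \u0014SubTitle 1\t"
                + "\u0013 PAGEREF _Toc446459128 \\h \u00141\u0015\u0015";
        System.out.println(visibleText(raw));
    }
}
```

The result still contains the tab and the page number, because both belong to field results rather than field codes.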

ahulihu · March 28, 2016, 8:43am

Hi,

Thanks for your response, but I still have two problems.
1) I want to get the outline value, but your code can only get the outline level.
2) I want to get the text of the table of contents without the page number, but your code gets both the text and the page number.

I have attached an image and marked the text I want to get. Please check it.

Hugh

tahir.manzoor · March 29, 2016, 3:21am

Hi Hugh,

Thanks for your inquiry.

ahulihu:
1) I want to get the outline value, but your code can only get the outline level.

The ListLabel class defines properties specific to a list label. Please use the ListLabel.LabelString property to get the string representation of the list label. We have modified the code example according to your requirements. Hope this helps you.

ahulihu:
2) I want to get the text of the table of contents without the page number, but your code gets both the text and the page number.

Please use the same Node.ToString(SaveFormat.TEXT) method to export the content of the paragraph node into a string. Please use the following code example to remove the trailing page number from the string.

Moreover, please note that the TOC is a field in an MS Word document. Having the field in the document does not by itself build the table of contents; the entries are built when the field is updated, as Microsoft Word does.

In Microsoft Word, fields are not automatically updated when a document is opened, but you can update fields in a document at any time by pressing F9. Please call Document.UpdateFields method to update the fields in the document.

Document doc = new Document(MyDir + "Test Outline List Hyperlink.docx");
doc.updateListLabels();
doc.updateFields();
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph para : (Iterable<Paragraph>) paragraphs)
{
    // Get the list label string
    if (para.getListFormat().isListItem())
    {
        ListLabel label = para.getListLabel();
        System.out.println(label.getLabelString());
    }
    // Get the text of Table of contents
    if (para.getText().contains("_Toc"))
    {
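        String text = para.toString(SaveFormat.TEXT).replace("\t", " ").trim();
        // Drop the trailing page number; skip entries that contain no space
        int lastSpace = text.lastIndexOf(" ");
        if (lastSpace >= 0)
        {
            text = text.substring(0, lastSpace);
        }
        System.out.println(text);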
    }
}
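
The page-number stripping can also be pulled out into a small helper and tried without a document. This is only a sketch in plain Java: the names TocEntryText and withoutPageNumber are hypothetical, and it assumes the exported entry has the form "title, tab, page number", with an optional line break at the end. Unlike the substring call above, it removes the last token only when that token is all digits.

```java
// Hypothetical helper, not part of Aspose.Words.
final class TocEntryText {
    // Removes a trailing numeric page number from one exported TOC entry.
    static String withoutPageNumber(String entry) {
        // Treat the tab leader as a space and drop the trailing line break.
        String text = entry.replace('\t', ' ').trim();
        int lastSpace = text.lastIndexOf(' ');
        if (lastSpace < 0) {
            return text; // single word, nothing to strip
        }
        String tail = text.substring(lastSpace + 1);
        // Only strip when the last token consists of digits.
        if (!tail.chars().allMatch(Character::isDigit)) {
            return text;
        }
        return text.substring(0, lastSpace).trim();
    }

    public static void main(String[] args) {
        System.out.println(withoutPageNumber("Title 1\t1\r\n"));
        System.out.println(withoutPageNumber("SubTitle 1\t1"));
    }
}
```

One limitation to keep in mind: an entry that has no page number but whose title ends in a number (for example "Title 1" on its own) would lose that number, and Roman-numeral page numbers are left in place.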