Extracting all Headings

Hello,
I am evaluating Aspose, and one of our use cases is to extract all headings from a document. How can I do that?

A.	Introduction 

Some intro from SK
• After the intro
▪ After the intro introd

  1. Headline Act 2
    Normal 2
    • Bullet 2
    ▪ Sq Bullet 2
    (a) Headline Act 3
    Normal 3 headline
    • Bullet 3
    ▪ Sq Bullet 3
    (i) Headline 4
    Normal 4

So from the text above I want to extract:
Heading 1: Introduction
Heading 2: Headline Act 2
Heading 3: Headline Act 3

Here is the code I was playing with, which gives me the whole document, not just the headings:

try
{
    LoadOptions loadOptions = new LoadOptions();
    loadOptions.setLoadFormat(LoadFormat.DOCX);
    loadOptions.setPassword(password);
    Document doc = new Document(srcFilePath, loadOptions);
    doc.removeMacros();
    doc.unprotect();
    doc.save(dstFilePath, SaveFormat.DOCX);
}
catch (Exception e)
{
    // System.out.println(e);
    return -1;
}

@aspose1212,

You can build on the following code to achieve what you are looking for:

for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1 ||
            para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_2 ||
            para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_3 /* and so on*/) {

        System.out.println(para.toString(SaveFormat.TEXT) + " <-- this is a heading para");
    }
}
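
The original question asked for output in the form "Heading 1: Introduction", while the loop above only prints the paragraph text. Below is a minimal sketch of a formatting helper for that output. HeadingFormatter and its format method are hypothetical names for illustration and are not part of Aspose.Words. It assumes you already know the heading level from the style identifier check, and that the text returned by para.toString(SaveFormat.TEXT) may end with a paragraph break that needs trimming.

```java
// Hypothetical helper for illustration; not part of the Aspose.Words API.
final class HeadingFormatter {
    private HeadingFormatter() {
    }

    /**
     * Builds a line such as "Heading 1: Introduction".
     *
     * @param level   heading level, 1 to 9 (assumed to be derived by the caller
     *                from the paragraph's style identifier)
     * @param rawText paragraph text, possibly ending with a paragraph break
     */
    static String format(int level, String rawText) {
        if (level < 1 || level > 9) {
            throw new IllegalArgumentException("Heading level must be 1..9, got " + level);
        }
        // trim() drops leading/trailing whitespace, including "\r" and "\n".
        return "Heading " + level + ": " + rawText.trim();
    }

    public static void main(String[] args) {
        // Sample values taken from the question; in real use the text would
        // come from the heading paragraphs found by the loop.
        System.out.println(format(1, "Introduction\r\n"));
        System.out.println(format(2, "Headline Act 2\r\n"));
        System.out.println(format(3, "Headline Act 3\r\n"));
    }
}
```

Inside the loop you could then call HeadingFormatter.format(1, para.toString(SaveFormat.TEXT)) in the branch that matches StyleIdentifier.HEADING_1, pass 2 for StyleIdentifier.HEADING_2, and so on.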

Thanks, it works in most cases, but it fails when the document has track changes or comments.
Say I have a heading called “Introduction” and have added the comment “nice job”. Using the code above, I get “Introductionnice job”. How can I extract just the heading text without pulling in the track changes or comment text?

I realize that attaching a sample document would help, but we are not allowed to upload documents to external sites from within our firm. I hope you can reproduce this from my description above.

@aspose1212,

You can simply remove comments from heading paragraphs before getting their text:

for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1 ||
            para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_2 ||
            para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_3 /* and so on*/) {

        para.getChildNodes(NodeType.COMMENT, true).clear();
        System.out.println(para.toString(SaveFormat.TEXT) + " <-- this is a heading para");
    }
}

Thanks, that worked!

Hello, I have a question.

I am using this code for the same purpose. When I use it, the for loop does not go all the way through the file, and I get a list with only half of the headings. How can this be?

@Rob_Hesseling The problem might occur because you are using Aspose.Words in evaluation mode. One of the evaluation version limitations is that Aspose.Words truncates the document to several hundred paragraphs.
You can request a free 30-day license to test Aspose.Words without the evaluation version limitations.

Ah, okay, thank you for the reply!
