Extracting all Headings

aspose1212 · March 29, 2019, 12:28pm

Hello,
I am evaluating Aspose, and one of our use cases is extracting all headings from a document. How can I do that?

A.	Introduction

Some intro from SK
• After the intro
▪ After the intro introd

Headline Act 2
Normal 2
• Bullet 2
▪ Sq Bullet 2
(a) Headline Act 3
Normal 3 headline
• Bullet 3
▪ Sq Bullet 3
(i) Headline 4
Normal 4

From the text above, I want to extract:
Heading 1: Introduction
Heading 2: Headline Act 2
Heading 3: Headline Act 3

This is the code I was playing with; it gives me the whole document, not just the headings:

try
{
    LoadOptions loadOptions = new LoadOptions();
    loadOptions.setLoadFormat(LoadFormat.DOCX);
    loadOptions.setPassword(password);
    Document doc = new Document(srcFilePath, loadOptions);
    doc.removeMacros();
    doc.unprotect();
    doc.save(dstFilePath, SaveFormat.DOCX);
}
catch (Exception e)
{
    // System.out.println(e);
    return -1;
}

awais.hafeez · March 30, 2019, 1:54am

@aspose1212,

You can build on the following code to achieve what you are looking for:

for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1 ||
            para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_2 ||
            para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_3 /* and so on*/) {

        System.out.println(para.toString(SaveFormat.TEXT) + " <-- this is a heading para");
    }
}
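
The `/* and so on */` chain above can be collapsed into a range check. The following is a minimal sketch, assuming the `StyleIdentifier.HEADING_1` to `StyleIdentifier.HEADING_9` constants are the consecutive integers 1 to 9 (worth verifying against your Aspose.Words version). `HeadingLabels`, `headingLevel`, and `label` are hypothetical names, and the helper takes the identifier and text as plain values so that it runs without the library:

```java
// Hypothetical helper, not part of Aspose.Words.
final class HeadingLabels {
    // Assumption: StyleIdentifier.HEADING_1..HEADING_9 are the ints 1..9.
    static final int FIRST_HEADING = 1;
    static final int LAST_HEADING = 9;

    /** Returns the heading level (1-9), or 0 if the style is not a built-in heading. */
    static int headingLevel(int styleIdentifier) {
        if (styleIdentifier >= FIRST_HEADING && styleIdentifier <= LAST_HEADING) {
            return styleIdentifier - FIRST_HEADING + 1;
        }
        return 0;
    }

    /** Formats a line such as "Heading 1: Introduction", or returns null for non-headings. */
    static String label(int styleIdentifier, String paragraphText) {
        int level = headingLevel(styleIdentifier);
        return level == 0 ? null : "Heading " + level + ": " + paragraphText.trim();
    }
}
```

Inside the loop, this could be called as `HeadingLabels.label(para.getParagraphFormat().getStyleIdentifier(), para.toString(SaveFormat.TEXT))`, printing the result whenever it is not null.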

aspose1212 · April 11, 2019, 5:23pm

Thanks, it works in most cases, but it fails when the document has tracked changes or comments.
For example, if I have a heading called “Introduction” and have added the comment “nice job”, then with the code above I get “Introductionnice job”. How can I extract just the heading text without pulling in the tracked changes or comment text?

I realize attaching a sample document would help, but we are not allowed to attach documents on external sites from within our firm. I hope you can reproduce this from my description above.

awais.hafeez · April 12, 2019, 7:10am

@aspose1212,

You can simply remove comments from Heading Paragraphs before getting their text:

for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_1 ||
            para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_2 ||
            para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_3 /* and so on*/) {

        para.getChildNodes(NodeType.COMMENT, true).clear();
        System.out.println(para.toString(SaveFormat.TEXT) + " <-- this is a heading para");
    }
}
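
Note that `clear()` removes the comment nodes from the loaded document itself, which matters if the document is saved afterwards. A non-destructive alternative is to join only the text-run children of the heading and skip comment nodes. The sketch below models that idea with hypothetical stand-in types (`HeadingText`, `Node`, `Kind`), not Aspose.Words classes. Mapping it onto the paragraph's direct `Run` children in Aspose.Words is an assumption that has not been confirmed in this thread:

```java
import java.util.List;

// Hypothetical stand-ins for document nodes; these are not Aspose.Words types.
final class HeadingText {
    enum Kind { RUN, COMMENT }

    static final class Node {
        final Kind kind;
        final String text;

        Node(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** Joins the text of RUN children only; the node list itself is left untouched. */
    static String withoutComments(List<Node> children) {
        StringBuilder sb = new StringBuilder();
        for (Node child : children) {
            if (child.kind == Kind.RUN) {
                sb.append(child.text);
            }
        }
        return sb.toString().trim();
    }
}
```

With a heading made of the run “Introduction” and the comment “nice job”, `withoutComments` returns only the run text, and the comment stays in the list.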

aspose1212 · April 14, 2019, 6:47am

Thanks, that worked!

Rob_Hesseling · September 22, 2022, 12:41pm

Hello, I have a question.

I am using this code for the same purpose. When I use it, the for loop does not go all the way through the file, and I get a list with only half of the headings. How can this be?

alexey.noskov · September 22, 2022, 12:51pm

@Rob_Hesseling The problem might occur because you are using Aspose.Words in evaluation mode. One of the evaluation version's limitations is that Aspose.Words truncates the document to several hundred paragraphs.
You can request a free 30-day license to test Aspose.Words without the evaluation version limitations.

Rob_Hesseling · September 22, 2022, 12:55pm

Ah, okay. Thank you for the reply!