Free Support Forum - aspose.com

Extracting all Headings

Hello,
I am evaluating Aspose, and one of our use cases is to extract all headings from a document. How can I do that?

A.	Introduction 

Some intro from SK
• After the intro
 ▪ After the intro introd

  1. Headline Act 2
    Normal 2
    • Bullet 2
     ▪ Sq Bullet 2
    (a) Headline Act 3
    Normal 3 headline
    • Bullet 3
     ▪ Sq Bullet 3
    (i) Headline 4
    Normal 4

So from the text above I want to extract
Heading 1: Introduction
Heading 2: Headline Act 2
Heading 3: Headline Act 3

Here is the code I was playing with, which gives me the whole document, not just the headings:

try
{
    LoadOptions loadOptions = new LoadOptions();
    loadOptions.setLoadFormat(LoadFormat.DOCX);
    loadOptions.setPassword(password);
    Document doc = new Document(srcFilePath, loadOptions);
    doc.removeMacros();
    doc.unprotect();
    doc.save(dstFilePath, SaveFormat.DOCX);
}
catch (Exception e)
{
    // System.out.println(e);
    return -1;
}

@aspose1212,

You can build on the following code to achieve what you are looking for:

for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    // Look up the style identifier once instead of once per comparison.
    int styleId = para.getParagraphFormat().getStyleIdentifier();
    if (styleId == StyleIdentifier.HEADING_1 ||
            styleId == StyleIdentifier.HEADING_2 ||
            styleId == StyleIdentifier.HEADING_3 /* and so on */) {

        System.out.println(para.toString(SaveFormat.TEXT) + " <-- this is a heading para");
    }
}
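
To get output in the "Heading N: text" form from the question, the filtering step can be sketched on its own, without the library. This is only a rough illustration: `Para`, `headingLevel`, `extractHeadings`, and `sampleParagraphs` are hypothetical names, not Aspose classes or methods, and the sketch assumes paragraphs use the built-in style names "Heading 1" to "Heading 9". In real code the style name and text would come from the paragraph objects in the loop above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HeadingFilter {
    /** Hypothetical stand-in for a paragraph: style name plus raw text. Not an Aspose class. */
    static final class Para {
        final String styleName;
        final String text;

        Para(String styleName, String text) {
            this.styleName = styleName;
            this.text = text;
        }
    }

    /** Returns 1-9 for style names such as "Heading 2", or 0 for anything else. */
    static int headingLevel(String styleName) {
        final String prefix = "Heading ";
        if (styleName == null || !styleName.startsWith(prefix)) {
            return 0;
        }
        try {
            int level = Integer.parseInt(styleName.substring(prefix.length()).trim());
            return (level >= 1 && level <= 9) ? level : 0;
        } catch (NumberFormatException e) {
            return 0; // e.g. "Heading X" or a custom style name
        }
    }

    /** Keeps paragraphs whose heading level is between 1 and maxLevel, formatted as "Heading N: text". */
    static List<String> extractHeadings(List<Para> paras, int maxLevel) {
        List<String> headings = new ArrayList<>();
        for (Para p : paras) {
            int level = headingLevel(p.styleName);
            if (level >= 1 && level <= maxLevel) {
                // trim() drops the paragraph break that exported text usually ends with
                headings.add("Heading " + level + ": " + p.text.trim());
            }
        }
        return headings;
    }

    /** Sample data modelled on the document in the question (the style names are assumed). */
    static List<Para> sampleParagraphs() {
        return Arrays.asList(
                new Para("Heading 1", "Introduction\r\n"),
                new Para("Normal", "Some intro from SK\r\n"),
                new Para("Heading 2", "Headline Act 2\r\n"),
                new Para("List Bullet", "Bullet 2\r\n"),
                new Para("Heading 3", "Headline Act 3\r\n"),
                new Para("Heading 4", "Headline 4\r\n"));
    }

    public static void main(String[] args) {
        // Levels 1 to 3 only, as in the question; "Headline 4" is filtered out.
        for (String line : extractHeadings(sampleParagraphs(), 3)) {
            System.out.println(line);
        }
    }
}
```

With `maxLevel` set to 3, the sample prints the three lines listed in the question.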

Thanks, it works in most cases, but it failed when the document had track changes/comments turned on.
Say I have a heading called “Introduction” and have added the comment “nice job”. Using the code above, I see this as “Introductionnice job”. How can I extract just the heading text without pulling in the track change/comment note?

I realize that attaching a sample document would help, but we are not allowed to attach documents on external sites from within our firm. I hope you will be able to reproduce this from my description above.

@aspose1212,

You can simply remove comments from Heading Paragraphs before getting their text:

for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    // Look up the style identifier once instead of once per comparison.
    int styleId = para.getParagraphFormat().getStyleIdentifier();
    if (styleId == StyleIdentifier.HEADING_1 ||
            styleId == StyleIdentifier.HEADING_2 ||
            styleId == StyleIdentifier.HEADING_3 /* and so on */) {

        // Remove comment nodes from the in-memory paragraph so their text is not exported.
        para.getChildNodes(NodeType.COMMENT, true).clear();
        System.out.println(para.toString(SaveFormat.TEXT) + " <-- this is a heading para");
    }
}
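
The snippet above changes the in-memory document by deleting the comment nodes. If the document must stay untouched, a different technique is to collect the text yourself and skip comment subtrees while walking the paragraph. Below is a minimal sketch of that idea on a toy node tree. `Node`, `Kind`, `textOf`, and `sampleHeading` are hypothetical stand-ins, not Aspose types, and the sketch assumes a comment sits inside the heading paragraph as a child node with its own runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HeadingTextSketch {
    enum Kind { PARAGRAPH, RUN, COMMENT }

    /** Hypothetical stand-in for a document node. Not an Aspose class. */
    static final class Node {
        final Kind kind;
        final String text; // only meaningful for RUN nodes
        final List<Node> children = new ArrayList<>();

        Node(Kind kind, String text, Node... children) {
            this.kind = kind;
            this.text = text;
            this.children.addAll(Arrays.asList(children));
        }
    }

    /** Concatenates the text of all runs under the node, optionally skipping whole COMMENT subtrees. */
    static String textOf(Node node, boolean skipComments) {
        StringBuilder sb = new StringBuilder();
        collect(node, skipComments, sb);
        return sb.toString();
    }

    private static void collect(Node node, boolean skipComments, StringBuilder sb) {
        if (skipComments && node.kind == Kind.COMMENT) {
            return; // ignore the comment and everything inside it
        }
        if (node.kind == Kind.RUN) {
            sb.append(node.text);
        }
        for (Node child : node.children) {
            collect(child, skipComments, sb);
        }
    }

    /** Builds the example from the thread: heading "Introduction" with the comment "nice job". */
    static Node sampleHeading() {
        Node comment = new Node(Kind.COMMENT, null,
                new Node(Kind.PARAGRAPH, null, new Node(Kind.RUN, "nice job")));
        return new Node(Kind.PARAGRAPH, null, new Node(Kind.RUN, "Introduction"), comment);
    }

    public static void main(String[] args) {
        Node heading = sampleHeading();
        System.out.println(textOf(heading, false)); // prints "Introductionnice job"
        System.out.println(textOf(heading, true));  // prints "Introduction"
    }
}
```

The tree is only read, never modified, so the same document could still be saved afterwards with its comments intact.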

Thanks, that worked!