
Free Support Forum - aspose.com

How to extract only the body text from a document, without headers?


I’m currently using the following code to get the text from the document. Unfortunately, it also returns the headers. How can I get only the body text?

import com.aspose.words.Document;

Document doc = new Document(fileName);
String text = doc.getText();

Thanks in advance for any help,

Found the answer:


Hi Mariusz,

Thanks for your inquiry. Please use the following code to achieve this:

String text = doc.getFirstSection().getBody().toString(SaveFormat.TEXT);
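Note that getFirstSection() reads only the first section, so a document with several sections (for example, one with a section break before an appendix) would lose the later bodies. With Aspose.Words, one way to cover them is to loop over doc.getSections() and call getBody().toString(SaveFormat.TEXT) on each section. Because that needs the Aspose library, the sketch below models the idea in plain Java so it runs on its own; the Section record and the bodyTextOfAllSections helper are hypothetical stand-ins, not Aspose classes or methods.

```java
import java.util.List;

public class BodyTextSketch {
    // Hypothetical stand-in for an Aspose.Words Section (NOT the Aspose class):
    // header text is kept apart from body text, mirroring HeaderFooter vs. Body nodes.
    record Section(String headerText, String bodyText) {}

    // Hypothetical helper: concatenates the body text of every section and ignores headers.
    // With Aspose.Words, the rough equivalent is a loop over doc.getSections()
    // calling section.getBody().toString(SaveFormat.TEXT) on each section.
    static String bodyTextOfAllSections(List<Section> sections) {
        StringBuilder sb = new StringBuilder();
        for (Section section : sections) {
            sb.append(section.bodyText());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Section> sections = List.of(
                new Section("Report header", "Introduction\r"),
                new Section("Appendix header", "Appendix\r"));

        // Reading only the first section, like getFirstSection().getBody(), misses "Appendix".
        System.out.println(sections.get(0).bodyText().trim());

        // Reading all sections picks up both bodies and still skips the headers.
        // Paragraph marks ('\r') are turned into newlines only for console display.
        System.out.println(bodyTextOfAllSections(sections).replace('\r', '\n').trim());
    }
}
```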

I hope this helps.

Best regards,

Thank you. What’s the difference between getText() and toString(SaveFormat.TEXT)?

Hi Mariusz,

Thanks for your inquiry.

The CompositeNode.getText() method gets the text of this node and of all its children. The returned string includes all control and special characters, as described in the ControlChar class. The following code shows the difference between calling the getText() and toString(SaveFormat.TEXT) methods on a node.

import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.SaveFormat;

public class GetTextVsToString {
    public static void main(String[] args) throws Exception {
        Document doc = new Document();

        // Enter a dummy field into the document.
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.insertField("MERGEFIELD Field");

        // getText() retrieves all field codes and special characters.
        System.out.println("getText() result: " + doc.getText());

        // toString(SaveFormat.TEXT) exports the node to the specified format. When converted to text,
        // it does not retrieve field codes or special characters, but it still contains some natural
        // formatting characters, such as paragraph markers. This is the same as "viewing" the
        // document as if it were opened in a text editor.
        System.out.println("toString() result: " + doc.toString(SaveFormat.TEXT));
    }
}
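To make the field-code part of the difference concrete without the Aspose library, here is a rough plain-Java approximation: it takes a getText()-style string, drops each field code (the text between the field start and field separator characters) together with the marker characters, and keeps the field result. The character values match Word's field markers (the same values as ControlChar.FIELD_START_CHAR, FIELD_SEPARATOR_CHAR and FIELD_END_CHAR), but the stripFieldCodes helper, its logic and the sample input are assumptions for illustration, not Aspose's actual implementation of toString(SaveFormat.TEXT).

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FieldCodeStripper {
    // Word field marker characters (same values as Aspose's ControlChar field constants).
    static final char FIELD_START = '\u0013';
    static final char FIELD_SEPARATOR = '\u0014';
    static final char FIELD_END = '\u0015';

    // Hypothetical helper: keeps field results, drops field codes and marker characters.
    // Other characters, such as '\r' paragraph marks, are left untouched.
    static String stripFieldCodes(String raw) {
        StringBuilder out = new StringBuilder();
        // One entry per open (possibly nested) field: true while still inside its code part.
        Deque<Boolean> openFields = new ArrayDeque<>();
        for (char c : raw.toCharArray()) {
            if (c == FIELD_START) {
                openFields.push(true);
            } else if (c == FIELD_SEPARATOR) {
                // The innermost field switches from its code part to its result part.
                if (!openFields.isEmpty()) {
                    openFields.pop();
                    openFields.push(false);
                }
            } else if (c == FIELD_END) {
                if (!openFields.isEmpty()) {
                    openFields.pop();
                }
            } else if (!openFields.contains(true)) {
                // Emit only text that is not inside any field code.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical getText()-style paragraph containing a merge field whose result is "Alice".
        String raw = "Dear \u0013 MERGEFIELD Name \u0014Alice\u0015,\r";
        System.out.println(stripFieldCodes(raw).trim());
    }
}
```

Tracking a stack of open fields rather than a single flag keeps nested fields (for example, a MERGEFIELD inside an IF field's code) hidden as long as any enclosing field is still in its code part.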

I hope this helps.

Best regards,

Great, thank you for the clarification!