Read the text of a document using Java

evaboy · February 13, 2020, 9:02pm

I want to read the text of a Word document, but the sample code I found only reads the first line of a page. It doesn't read all of the contents.

Document doc = new Document("C:\\Path\\to\\Documents\\NewDoc.docx");
for (int i = 0; i < doc.getPageCount(); i++) {
    Paragraph paragraph = (Paragraph) doc.getChild(NodeType.PARAGRAPH, i, true);
    NodeCollection<Node> children = paragraph.getChildNodes();
    for (Node child : (Iterable<Node>) children) {
        // Paragraph may contain children of various types such as runs, shapes and so on.
        if (child.getNodeType() == NodeType.RUN) {
            // Say we found the node that we want, do something useful.
            Run run = (Run) child;
            System.out.println(run.getText());
        }
    }
}

tahir.manzoor · February 14, 2020, 6:16am

@evaboy

Please use the Node.toString method, as shown below, to get the text of the whole document. Hope this helps.

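// MyDir is a placeholder for the folder that contains input.docx.
Document doc = new Document(MyDir + "input.docx");
// Convert the whole document to plain text and print it.
System.out.println(doc.toString(SaveFormat.TEXT));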
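
A likely reason the loop in the question stops early is that it is bounded by doc.getPageCount() while the index i is used to pick a paragraph, so a one-page document yields only its first paragraph. The sketch below illustrates that mismatch with plain Java lists. It is a stand-in model, not the Aspose.Words API: the class LoopBoundSketch, its methods, and the sample text are all hypothetical. With the real library, the equivalent fix would presumably be to iterate over doc.getChildNodes(NodeType.PARAGRAPH, true) rather than counting pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for a document; this is NOT the Aspose.Words API.
// Each inner list is one paragraph, and each string is one run of text.
class LoopBoundSketch {

    // Hypothetical sample: four paragraphs that all fit on a single page.
    static List<List<String>> sampleParagraphs() {
        List<List<String>> paragraphs = new ArrayList<>();
        paragraphs.add(Arrays.asList("First ", "line."));
        paragraphs.add(Arrays.asList("Second ", "line."));
        paragraphs.add(Arrays.asList("Third ", "line."));
        paragraphs.add(Arrays.asList("Fourth ", "line."));
        return paragraphs;
    }

    // Mirrors the loop in the question: the bound is the page count,
    // but the index selects a paragraph. The extra size check only keeps
    // this sketch from running past the end of the list.
    static List<String> readBoundedByPageCount(List<List<String>> paragraphs, int pageCount) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < pageCount && i < paragraphs.size(); i++) {
            out.add(String.join("", paragraphs.get(i)));
        }
        return out;
    }

    // Visits every paragraph and joins its runs into one string.
    static List<String> readAllParagraphs(List<List<String>> paragraphs) {
        List<String> out = new ArrayList<>();
        for (List<String> runs : paragraphs) {
            out.add(String.join("", runs));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> doc = sampleParagraphs();
        // With a page count of 1, only the first paragraph is read.
        System.out.println(readBoundedByPageCount(doc, 1)); // prints [First line.]
        // Iterating over the paragraphs themselves reads everything.
        System.out.println(readAllParagraphs(doc));
    }
}
```

If only the plain text is needed, the single doc.toString(SaveFormat.TEXT) call from the reply above avoids the loop entirely; walking paragraphs is mainly useful when each paragraph or run has to be processed separately.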