How to get node text only

In Aspose.Words for Java, is there a method to get the text of a node without any of the control characters? I just want the actual text that the reader of the document sees. Node.getText() seems to return all of the control characters around the actual text as well.

Please check the following documentation article on this topic:

https://docs.aspose.com/words/net/work-with-text-document/

It lacks a Java code example, however. We will prepare and post sample code for Java shortly.

Best regards,

Hi, dngan,

SaveFormat.Text is not public in Java at the moment, so you are better off implementing a DocumentVisitor. Here is a code snippet showing how to do this:

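import com.aspose.words.*;
import org.junit.Test;

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class TestTxtWriter
{
    @Test
    public void TxtWriterTest() throws Exception
    {
        TxtWriter txtWriter = new TxtWriter();
        Document doc = new Document("X:\\Aspose\\forum\\WordCount_Testing.doc");

        // Save to a text file; try-with-resources closes the stream even if saving fails.
        try (OutputStream stream1 = new FileOutputStream("X:\\Aspose\\forum\\out1.txt"))
        {
            txtWriter.save(doc, stream1);
        }

        // Extract the plain text as a string.
        String text = txtWriter.getPlainText(doc);

        // Write the string to a file with an explicit encoding instead of the platform default.
        try (OutputStream stream2 = new FileOutputStream("X:\\Aspose\\forum\\out2.txt"))
        {
            stream2.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }
}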

/**
 * Responsible for saving a document in plain text format.
 */
class TxtWriter extends DocumentVisitor
{
    TxtWriter()
    {
    }

    /**
     * Saves the document in plain text format (UTF-8 encoded).
     */
    public void save(Document document, OutputStream stream) throws Exception
    {
        String text = getPlainText(document);
        stream.write(text.getBytes("UTF-8"));

        // Not closing the stream here, as that is the caller's responsibility.
        stream.flush();
    }

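    /**
     * Gets the plain text of the node.
     */
    public String getPlainText(Node node) throws Exception
    {
        mFieldStack.clear();
        mBuilder = new StringBuilder();

        // Extract text from the node.
        node.accept(this);

        // Remove the remaining control characters.
        String text = mBuilder.toString();
        text = text.replace(ControlChar.LINE_BREAK, ControlChar.CR_LF);
        text = text.replace(ControlChar.ANNOTATION_REF, "");
        text = text.replace(ControlChar.FOOTNOTE_REF, "");
        text = text.replace(ControlChar.DRAWN_OBJECT, "");

        return text;
    }

    public int visitRun(Run run)
    {
        appendText(run.getText());
        return VisitorAction.CONTINUE;
    }

    public int visitFieldStart(FieldStart fieldStart)
    {
        // Entering a field: its code (e.g. " PAGE ") is not visible text.
        mFieldStack.push(Boolean.TRUE);
        return VisitorAction.CONTINUE;
    }

    public int visitFieldSeparator(FieldSeparator fieldSeparator)
    {
        // The separator ends the code of the innermost field; its result is visible text.
        if (!mFieldStack.isEmpty())
        {
            mFieldStack.pop();
            mFieldStack.push(Boolean.FALSE);
        }
        return VisitorAction.CONTINUE;
    }

    public int visitFieldEnd(FieldEnd fieldEnd)
    {
        // Leave the innermost field (fields without a separator also end here).
        if (!mFieldStack.isEmpty())
            mFieldStack.pop();
        return VisitorAction.CONTINUE;
    }

    public int visitParagraphEnd(Paragraph paragraph)
    {
        appendText(ControlChar.CR_LF);
        return VisitorAction.CONTINUE;
    }

    private void appendText(String text)
    {
        // Skip text while any enclosing field is still in its code part,
        // so nested fields inside a field code do not leak into the output.
        if (!mFieldStack.contains(Boolean.TRUE))
            mBuilder.append(text);
    }

    private StringBuilder mBuilder;
    // One entry per open field: TRUE while inside its code, FALSE once inside its result.
    private final java.util.Deque<Boolean> mFieldStack = new java.util.ArrayDeque<Boolean>();
}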
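If you already have the string returned by Node.getText() and only need to clean it up, roughly the same field-skipping idea can be applied to the raw characters with plain Java. This is a minimal sketch under stated assumptions: it assumes the string contains Word's usual marker characters (0x13 field start, 0x14 field separator, 0x15 field end, 0x0B manual line break), and `PlainTextSketch` / `toPlainText` are hypothetical names, not part of the Aspose API. It keeps a stack of open fields because a single on/off flag cannot tell which field a separator belongs to when one field is nested inside another field's code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not part of Aspose.Words): turns a raw string such as the one
 * returned by Node.getText() into reader-visible text. Assumes Word's marker characters:
 * 0x13 field start, 0x14 field separator, 0x15 field end, 0x0B manual line break.
 */
final class PlainTextSketch
{
    private static final char FIELD_START = (char) 0x13;
    private static final char FIELD_SEPARATOR = (char) 0x14;
    private static final char FIELD_END = (char) 0x15;
    private static final char LINE_BREAK = (char) 0x0B;

    private PlainTextSketch()
    {
    }

    static String toPlainText(String raw)
    {
        StringBuilder out = new StringBuilder(raw.length());
        // One entry per open field: TRUE while inside its code, FALSE once inside its result.
        Deque<Boolean> openFields = new ArrayDeque<>();
        // Number of open fields that are still in their code part.
        int fieldsInCode = 0;

        for (int i = 0; i < raw.length(); i++)
        {
            char c = raw.charAt(i);
            if (c == FIELD_START)
            {
                openFields.push(Boolean.TRUE);
                fieldsInCode++;
            }
            else if (c == FIELD_SEPARATOR)
            {
                // The separator switches the innermost field from code to result.
                if (!openFields.isEmpty() && openFields.peek())
                {
                    openFields.pop();
                    openFields.push(Boolean.FALSE);
                    fieldsInCode--;
                }
            }
            else if (c == FIELD_END)
            {
                // Close the innermost field; it may have had no separator at all.
                if (!openFields.isEmpty() && openFields.pop())
                    fieldsInCode--;
            }
            else if (fieldsInCode > 0)
            {
                continue; // inside some field's code (e.g. " PAGE "): not visible
            }
            else if (c == '\r' || c == LINE_BREAK)
            {
                out.append("\r\n"); // paragraph end or manual line break
            }
            else if (c == '\n')
            {
                // A lone LF becomes CR LF; an LF right after a CR was already emitted as CR LF.
                if (i == 0 || raw.charAt(i - 1) != '\r')
                    out.append("\r\n");
            }
            else if (c >= 0x20 || c == '\t')
            {
                out.append(c); // ordinary visible character
            }
            // Any other character below 0x20 (cell end, page break, object anchors) is dropped.
        }
        return out.toString();
    }
}
```

For example, for the raw string `"Page " + (char) 0x13 + " PAGE " + (char) 0x14 + "3" + (char) 0x15 + " of 5\r"` the method returns `"Page 3 of 5\r\n"`. Table cell markers and other low control characters are simply dropped in this sketch, so adjacent cell contents run together; map them to a tab instead if you need column separation.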

Best regards,