How to remove text from the end of Paragraph using Java

Hi,

I have to find and replace a string in a document (for example “Paragraph”).
My Word document contains the following text:

Paragraph 1
This is my text
Paragraph 2
This is my second text

I also need to capture the rest of each matching line and put it in a list (for example), because I don’t know how many ‘Paragraph’ lines the document contains.
So I use the ‘MyReplaceEvaluator’ provided in the example in your docs.

Here is my code:

// Static and typed so that the static methods and the nested class below can use it
private static List<String> listOfParaph;

public void replaceString() {
    try {
        listOfParaph = new ArrayList<String>();
        // Insert First doc
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);
        Document wordDoc = new Document(new File("D:\\Text.docx").getAbsolutePath());
        removeString(wordDoc);
        wordDoc.save("D:\\output.docx");

    } catch (Exception e) {
        // TODO: handle exception properly; at least do not swallow it silently
        e.printStackTrace();
    }
}

public static void removeString(Document doc) throws Exception {
    doc.getRange().replace(Pattern.compile("Paragraph.*"),
            new MyReplaceEvaluator(), true);

    for (String paragraphNumber : listOfParaph) {
        System.out.println(paragraphNumber);
    }
}

private static class MyReplaceEvaluator implements IReplacingCallback
{
    /**
     * This is called during a replace operation each time a match is found.
     *
     * This method stores the text that follows "Paragraph" in listOfParaph
     * and replaces the match with an empty string.
     */
    public int replacing(ReplacingArgs e) throws Exception
    {
        // This is a Run node that contains either the beginning or the
        // complete match.
        Node currentNode = e.getMatchNode();

        String replaceString = currentNode.getText();
        if (replaceString.startsWith("Paragraph")) {
            String replace2 = replaceString.replaceAll("Paragraph", "");
            e.setReplacement("");
            listOfParaph.add(replace2);
        }
        return ReplaceAction.REPLACE;
    }
}
}

All of the ‘Paragraph’ text is removed, but a blank line is left where each one was.

Is it possible to remove these blank lines?

I tried changing the pattern "Paragraph.*" to "Paragraph.*" + ControlChar.LINE_BREAK, but then the string is not found.

Attached are ‘Text.docx’ and ‘output.docx’ (the result).

Thanks in advance

Hi Roseline,

Thanks for your query. Please use the following code snippet after calling the removeString() method. Please let us know if you have any more queries.

Node[] nodes = wordDoc.getChildNodes(NodeType.PARAGRAPH, true).toArray();

for (int i = 0; i < nodes.length; i++)
{
    Paragraph para = (Paragraph) nodes[i];
    // Remove every paragraph that contains no visible text
    if (para.toTxt().trim().equals(""))
        para.remove();
}

Hi Tahir,

Thanks for your answer.

But suppose the document contains:

Paragraph 1
My text
Paragraph 2
My text before a blank line

My text after a blank line

The blank line between ‘My text before…’ and ‘My text after…’ will be removed too.
Is there another way to remove only the paragraphs where the text was replaced?

I modified the “replacing” method to get the parent paragraph and tried your solution there, but at that point the string has not been replaced yet, so the paragraph is never empty:

public int replacing(ReplacingArgs e) throws Exception
{
    // This is a Run node that contains either the beginning or the
    // complete match.
    Node currentNode = e.getMatchNode();

    String replaceString = currentNode.getText();
    if (replaceString.startsWith("Paragraph")) {
        String replace2 = replaceString.replaceAll("Paragraph", "");
        e.setReplacement("");
        listOfParaph.add(replace2);
    }
    if (currentNode instanceof Run) {
        Paragraph para = ((Run) currentNode).getParentParagraph();
        // Never true here: the replacement has not been applied yet
        if (para.toTxt().trim().equals(""))
            para.remove();
    }
    return ReplaceAction.REPLACE;
}


Thanks in advance.

Hi Tahir,

I finally found it!
I added a list to the ‘MyReplaceEvaluator’ class, and I add the parent Paragraph of each match node to that list.

Then, once the replace is done, I get the list back and apply the test you gave me (para.toTxt().trim().equals("")) to the paragraphs in it.
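
For anyone reading this later, the idea can be sketched without the Aspose API as follows. This is only a simplified model and not the code actually used in this thread: each String stands in for one Word paragraph, and the class name MatchedParagraphCleaner and the method clean are hypothetical. It shows the key point, which is to remember which paragraphs had a match and to remove only those that are blank after the replacement, so that blank paragraphs that were already in the document are kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Library-free model: each String stands in for one Word paragraph.
// Class and method names are hypothetical, not part of Aspose.Words.
final class MatchedParagraphCleaner {

    // Group 1 captures whatever follows the keyword on the same line.
    private static final Pattern HEADING = Pattern.compile("Paragraph(.*)");

    /**
     * Strips every "Paragraph..." match, stores the text that followed the
     * keyword in captured, and drops only the paragraphs that had a match
     * and are blank afterwards. Paragraphs that were already blank are kept.
     */
    static List<String> clean(List<String> paragraphs, List<String> captured) {
        List<String> result = new ArrayList<>();
        for (String text : paragraphs) {
            Matcher m = HEADING.matcher(text);
            boolean matched = false;
            while (m.find()) {
                matched = true;
                captured.add(m.group(1).trim());
            }
            // replaceAll resets the matcher before replacing, so it is safe after find()
            String replaced = matched ? m.replaceAll("") : text;
            boolean emptiedByReplace = matched && replaced.trim().isEmpty();
            if (!emptiedByReplace) {
                result.add(replaced);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> captured = new ArrayList<>();
        List<String> cleaned = clean(Arrays.asList(
                "Paragraph 1", "My text",
                "Paragraph 2", "My text before a blank line",
                "", "My text after a blank line"), captured);
        System.out.println(captured); // prints [1, 2]
        for (String line : cleaned) {
            System.out.println("[" + line + "]"); // the original blank line is still present
        }
    }
}
```

In the real document the tracked items would be the parent Paragraph nodes of the match nodes, as described above, and the blank check would run only after the replace call has returned.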

Thanks a lot.

Hi Roseline,

It is nice to hear from you that your problem has been solved. Please let us know if you have any more queries.