Find Text

tmdolphin · June 11, 2014, 1:42am

Good Morning,
I have a question about finding text within a document for further processing. I looked at the examples in the docs, where the main purpose is to replace text based on a pattern.
What I need is to search the document for a pattern and, at the exact point where the pattern occurs, do further processing, for example insert a picture in place of a predefined placeholder. My first implementation simply loops over the whole document.

Is there a more convenient way to do this through the API?

Best regards and thanks,
Rob

tmdolphin · June 11, 2014, 2:26am

I just noticed that a custom ReplacingCallback is meant for exactly the problem I described. Somehow I overlooked this possibility at first glance.

awais.hafeez · June 11, 2014, 9:35pm

Hi Robert,

Thanks for your inquiry. Yes, please implement the IReplacingCallback interface to achieve this. If you have further inquiries or need any help, please let us know.

Best regards,

tmdolphin · June 18, 2014, 5:19am

Thanks for your reply, Awais.
I tried IReplacingCallback and played around with it a little, and I now have a question about it.
I search using this regular expression:

Pattern.compile("<(NAME|AGE)>([.^]*.)");

For example, I search for My Name. in a document. Sometimes I do not get the match, apparently because the text is split across multiple runs in the document. Is there an option to search across multiple nodes, the way the Find/Replace function in Word itself does?

Best regards and thanks,
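A side note on the pattern itself, shown as a plain java.util.regex sketch with no Aspose classes involved: inside a character class, . and ^ are literal characters, so [.^]* matches only dots and carets rather than arbitrary text. The sample text and the class and method names below are hypothetical and serve only as an illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderRegexDemo {
    // Returns the full text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        String text = "Hello <NAME>John Doe"; // hypothetical sample text

        // [.^]* matches zero dots/carets here, then the final . takes one character.
        System.out.println(firstMatch("<(NAME|AGE)>([.^]*.)", text)); // prints "<NAME>J"

        // Matching only the placeholder tag itself.
        System.out.println(firstMatch("<(NAME|AGE)>", text)); // prints "<NAME>"
    }
}
```

If the intent is to capture everything after the tag, a group such as (.*) would be the usual form, but that depends on what the placeholder should cover.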
Rob

awais.hafeez · June 19, 2014, 4:19am

Hi Rob,

Thanks for your inquiry. Please try executing the following code:

// Required imports (place at the top of the source file):
// import com.aspose.words.*;
// import java.util.ArrayList;

// Both members below are assumed to be declared inside the same outer class,
// so the nested callback class can call the private static helper.
static class ReplaceEvaluatorFindAndReplace implements IReplacingCallback {
    public int replacing(ReplacingArgs e) throws Exception {
        // The node in which the match starts.
        Node currentNode = e.getMatchNode();

        // The first (and possibly only) run can contain text before the match;
        // in that case the run has to be split.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());

        // Collects all runs that make up the match so they can be removed later.
        ArrayList<Run> runs = new ArrayList<Run>();
        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group(0).length();
        while (
                (remainingLength > 0) &&
                        (currentNode != null) &&
                        (currentNode.getText().length() <= remainingLength)) {
            runs.add((Run) currentNode);
            remainingLength = remainingLength - currentNode.getText().length();
            // Select the next Run node.
            // A loop is needed because other nodes, such as BookmarkStart, can sit in between.
            do {
                currentNode = currentNode.getNextSibling();
            }
            while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if any matched text is left.
        if ((currentNode != null) && (remainingLength > 0)) {
            splitRun((Run) currentNode, remainingLength);
            runs.add((Run) currentNode);
        }

        // Write the new content at the position of the last matched run.
        // This is the place to do other processing, e.g. inserting a picture.
        DocumentBuilder builder = new DocumentBuilder((Document) e.getMatchNode().getDocument());
        builder.moveTo(runs.get(runs.size() - 1));
        builder.write("new value");

        // Now remove all runs that held the matched text.
        for (Run run : runs)
            run.remove();

        // Tell the engine to skip its own replacement; the work is already done.
        return ReplaceAction.SKIP;
    }
}

// Splits the run at the given character position.
// The original run keeps the text before the position; the returned run holds the rest.
private static Run splitRun(Run run, int position) throws Exception {
    Run afterRun = (Run) run.deepClone(true);
    afterRun.setText(run.getText().substring(position));
    run.setText(run.getText().substring(0, position));
    run.getParentNode().insertAfter(afterRun, run);
    return afterRun;
}
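To make the splitting logic easier to follow, here is a minimal plain-Java model of the same idea in which each run is just a String in a list. It is only a sketch: the class RunSplitModel and the method replaceAcrossRuns are hypothetical names, no Aspose classes are used, and details such as formatting or non-run nodes between runs are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunSplitModel {
    // Replaces matchLength characters that start at 'offset' inside runs.get(startRun),
    // even when the matched text continues into the following runs.
    static List<String> replaceAcrossRuns(List<String> runs, int startRun, int offset,
                                          int matchLength, String replacement) {
        List<String> out = new ArrayList<>(runs.subList(0, startRun));
        int i = startRun;
        String current = runs.get(i);

        // Keep the text in front of the match as its own run (the first split).
        if (offset > 0) {
            out.add(current.substring(0, offset));
            current = current.substring(offset);
        }

        // Drop every run that is completely covered by the match.
        int remaining = matchLength;
        while (remaining > 0 && current != null && current.length() <= remaining) {
            remaining -= current.length();
            i++;
            current = i < runs.size() ? runs.get(i) : null;
        }

        out.add(replacement);

        // Keep the tail of the last, partly matched run (the second split) and the rest.
        if (current != null) {
            out.add(current.substring(remaining));
            out.addAll(runs.subList(i + 1, runs.size()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical document text where "<NAME>" is spread over three runs.
        List<String> runs = Arrays.asList("Dear <NA", "ME", ">, welcome");
        System.out.println(replaceAcrossRuns(runs, 0, 5, 6, "new value"));
        // prints [Dear , new value, , welcome]
    }
}
```

The model mirrors the three steps of the callback: split off the text before the match, collect the fully matched runs, and split the last run so that only the matched part is replaced.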

I hope this helps.

Best regards,