Free Support Forum - aspose.com

Extract text by regex to Array

I have Word files that contain text such as #tag1#, #abc#, #xyz#, etc.

I need to collect every piece of text that starts and ends with a # sign into an array of String, like ["#tag1#", "#abc#", "#xyz#"].

How can I achieve it?

@kunal.gupta.1983

In your case, we suggest the following solution.

  1. Implement the IReplacingCallback interface.
  2. Find the text using the Range.Replace method.
  3. In IReplacingCallback.Replacing, move the cursor to the matched node and insert a bookmark at the position of the # tag.
  4. Once the bookmarks are inserted into the document, extract the content between the bookmarks.

We suggest you read the following articles:
Find and Replace
Moving the Cursor
Inserting a Bookmark
Extract Selected Content Between Nodes in a Document
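
As a rough illustration of the matching step only, the sketch below collects every #...# token from a plain String into an array using java.util.regex. It assumes the document text has already been read into a String (how that is obtained from the Word file is outside this sketch), and the names `HashTagExtractor` and `extract` are hypothetical. The pattern is the one used in the code later in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
class HashTagExtractor {
    // '#', one non-whitespace character, then the shortest run of characters up to the next '#'.
    private static final Pattern TAG = Pattern.compile("(\\#\\S.*?\\#)");

    // Returns every match in the order it appears in the text.
    static String[] extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = TAG.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Assumption: this string stands in for text already read from the document.
        String text = "Dear #tag1#, order #abc# ships to #xyz#.";
        System.out.println(Arrays.toString(extract(text)));
        // prints [#tag1#, #abc#, #xyz#]
    }
}
```

Inside an IReplacingCallback the same strings are available one match at a time through e.getMatch().group(), which the code later in this thread already calls, so a callback could add each of them to a list instead of scanning a String.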

Hi Tahir,

I am able to find the text and highlight it within the document using the code below.

However, I do not understand how to apply a bookmark to the text and extract it.

Please help.

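import com.aspose.words.*;

import java.awt.Color;
import java.util.ArrayList;
import java.util.regex.Pattern;

public class App
{
    public static void main( String[] args )
    {
        Document doc;
        try {
            doc = new Document("d:\\agree.doc");
            FindReplaceOptions options = new FindReplaceOptions();
            // The callback is invoked for each match and highlights the matched runs.
            options.setReplacingCallback(new ReplaceEvaluatorFindAndHighlight());

            // Matches text that starts and ends with '#', such as #tag1#.
            Pattern regex = Pattern.compile("(\\#\\S.*?\\#)", Pattern.CASE_INSENSITIVE);

            doc.getRange().replace(regex, "", options);

            doc.save("d:\\TestFile_out.doc");
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

}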

class ReplaceEvaluatorFindAndHighlight implements IReplacingCallback {
    public int replacing(ReplacingArgs e) throws Exception {
        // Start at the node in which the match begins.
        Node currentNode = e.getMatchNode();
        // If the match starts inside the run, split the run so that the match begins a new run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());

        // Collect every run that lies entirely within the match.
        ArrayList<Run> runs = new ArrayList<Run>();
        int remainingLength = e.getMatch().group().length();
        while ((remainingLength > 0) && (currentNode != null) && (currentNode.getText().length() <= remainingLength)) {
            runs.add((Run) currentNode);
            remainingLength = remainingLength - currentNode.getText().length();

            // Advance to the next sibling that is a Run, skipping other node types.
            do {
                currentNode = currentNode.getNextSibling();
            } while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // If the match ends inside a run, split that run and keep only its leading part.
        if ((currentNode != null) && (remainingLength > 0)) {
            splitRun((Run) currentNode, remainingLength);
            runs.add((Run) currentNode);
        }

        // Highlight all runs that make up the match.
        for (Run run : runs) {
            run.getFont().setHighlightColor(Color.RED);
        }

        // Skip the replacement so the document text stays unchanged.
        return ReplaceAction.SKIP;
    }

    // Splits the run at the given position and returns the new run that holds the trailing text.
    private static Run splitRun(Run run, int position) throws Exception {
        Run afterRun = (Run) run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }

}
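
The run-splitting logic in the callback can be hard to follow, so here is a small model of it that uses plain strings in place of Run nodes. It is only a sketch of the same idea: `RunSplitSketch` and `isolate` are hypothetical names, each list element stands in for the text of one run, and nothing here calls Aspose.Words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: each String plays the role of one Run's text.
class RunSplitSketch {
    // Splits the "runs" so that the match, which starts at `offset` inside
    // runs.get(runIndex) and spans `length` characters, is covered by whole runs.
    // Returns the runs that make up the match; `runs` is modified in place.
    static List<String> isolate(List<String> runs, int runIndex, int offset, int length) {
        int i = runIndex;
        // The match starts mid-run: split so that the match begins a new run.
        if (offset > 0) {
            String run = runs.get(i);
            runs.set(i, run.substring(0, offset));
            runs.add(i + 1, run.substring(offset));
            i++;
        }
        // Take every run that fits entirely inside the remaining match length.
        List<String> matched = new ArrayList<>();
        int remaining = length;
        while (remaining > 0 && i < runs.size() && runs.get(i).length() <= remaining) {
            matched.add(runs.get(i));
            remaining -= runs.get(i).length();
            i++;
        }
        // The match ends mid-run: split once more and keep only the leading part.
        if (remaining > 0 && i < runs.size()) {
            String run = runs.get(i);
            runs.set(i, run.substring(0, remaining));
            runs.add(i + 1, run.substring(remaining));
            matched.add(runs.get(i));
        }
        return matched;
    }

    public static void main(String[] args) {
        // "#tag1#" is spread over three runs and starts at offset 6 of the first one.
        List<String> runs = new ArrayList<>(Arrays.asList("Hello #ta", "g1", "# world"));
        System.out.println(isolate(runs, 0, 6, 6));
        // prints [#ta, g1, #]
    }
}
```

After the call, the list holds "Hello ", "#ta", "g1", "#" and " world", which mirrors how the callback leaves the matched text in runs of its own so that only those runs are highlighted.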

@kunal.gupta.1983

Could you please share your input and expected output documents here for our reference? We will then provide a code example that meets your requirement.