Find and replace text with image using regex

I am trying to use a regex to find specific pieces of text and replace each one with an image, but I am not getting the expected result. I have attached the test document (Test.docx) and the resulting document (TestResult.docx) after replacement. The regex appears to match everything between the first <#inf and the final #> as a single match, instead of finding 5 separate instances of the text between <#inf and #> (the highlighted lines in the test document) and replacing them with 5 images. I have also attached TestExpectation.docx, which shows the result I am expecting.

Here’s the C# code I am using:

Document doc = new Document(@"C:\Temp\Test.docx");
FindReplaceOptions options = new FindReplaceOptions();
options.MatchCase = false;
options.ReplacingCallback = new FindAndInsertImage(options);

var regex = new Regex(@"<#inf\s[a-zA-Z]+.*#>", RegexOptions.IgnoreCase);
doc.Range.Replace(regex, String.Empty, options);

doc.Save(@"C:\Temp\TestResult.docx");

private class FindAndInsertImage : IReplacingCallback
{
	internal FindAndInsertImage(FindReplaceOptions options)
	{
		mOptions = options;
	}

	// This simplistic method will only work well when the match starts at the beginning of a run.
	ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
	{
		DocumentBuilder builder = new DocumentBuilder((Document)args.MatchNode.Document);
		builder.MoveTo(args.MatchNode);

		var shape = builder.InsertImage(File.ReadAllBytes(@"c:\temp\barcode1.png"));
		return ReplaceAction.Replace;
	}

	private readonly FindReplaceOptions mOptions;
}

Testing the same regex on https://regex101.com/ shows 5 separate matches, as I expected.

I have also tried the following regexes:

  • <#inf\sBarcode\s[a-zA-Z]+.*#>
  • <#inf .*#>

TestResult.docx (61.5 KB)
Test.docx (12.6 KB)
TestExpectation.docx (67.3 KB)

@imranmp,
You should add a ‘?’ after the ‘.*’ in your regular expression. This makes the quantifier lazy (non-greedy), so each match stops at the first #> that follows it instead of extending to the last #> in the text. In your case, the following regular expression gives the expected result:

var regex = new Regex(@"<#inf\s[a-zA-Z]+.*?#>", RegexOptions.IgnoreCase);
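
The difference between the greedy and the lazy pattern can be checked without a Word document. The following is a minimal sketch that uses only System.Text.RegularExpressions. The sample string is made up for illustration (it is not the content of Test.docx) and puts all tags on a single line, which is assumed here to approximate how the greedy pattern ended up spanning from the first tag to the last one. It uses top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample text (not taken from Test.docx): three tags on one line.
string text = "<#inf Barcode A#> some text <#inf Barcode B#> more text <#inf Barcode C#>";

// Greedy: ".*" consumes as much as possible, so the match runs to the LAST "#>".
var greedy = new Regex(@"<#inf\s[a-zA-Z]+.*#>", RegexOptions.IgnoreCase);

// Lazy: ".*?" consumes as little as possible, so each match stops at the FIRST "#>".
var lazy = new Regex(@"<#inf\s[a-zA-Z]+.*?#>", RegexOptions.IgnoreCase);

Console.WriteLine($"Greedy matches: {greedy.Matches(text).Count}"); // 1
Console.WriteLine($"Lazy matches:   {lazy.Matches(text).Count}");   // 3

// Print each lazy match to confirm that every tag is found separately.
foreach (Match m in lazy.Matches(text))
{
    Console.WriteLine(m.Value);
}
```

By default, `.` does not match a newline, so a greedy `.*` cannot run past the end of a line. If each tag was entered on its own line on regex101, that would be consistent with the original pattern showing 5 separate matches there.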