Hello Aspose team!
We are trying to use Aspose.Words to edit the content of a Word document. As one of the tests, we are using the Document.Range.Replace function with a callback to implement a custom operation on the found text.
```csharp
using System;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

public class ReplacingCallback : IReplacingCallback
{
    private readonly DocumentBuilder _builder;

    public ReplacingCallback(DocumentBuilder builder)
    {
        _builder = builder;
    }

    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
    {
        //test logging
        Console.WriteLine(args.MatchNode.Range.Text);
        _builder.MoveTo(args.MatchNode);
        //just an example, real logic will be more complex and will insert multi line markdown
        _builder.Write("FirstLine1\r\nSecondLine2\r\nThirdLine3");
        args.MatchNode.Remove();
        return ReplaceAction.Skip;
    }
}

var document = new Document("input.docx");
var builder = new DocumentBuilder(document);
var replacingCallback = new ReplacingCallback(builder);
var regex = new Regex("FirstLine[&p|&b|&l|\\s]*?SecondLine[&p|&b|&l|\\s]*?ThirdLine");
var replaceResult = document.Range.Replace(regex, "<empty, replacement logic is in callback>", new FindReplaceOptions
{
    MatchCase = true,
    ReplacingCallback = replacingCallback,
    SmartParagraphBreakReplacement = true
});
document.Save("output.docx");
```
We expected args.MatchNode in the replacing callback to contain the 3-line text that matches the provided regular expression. Unfortunately, we get only the first line of the bullet list in the test console output, and the code replaces only the first line of the document.
You can find our test solution in the attachment:
MultilineReplacementTest.zip (12.8 KB)
Could you please take a look at our code and help us resolve the issue? We need a way to work with the whole text matched by the Document.Range.Replace function, not only the first line.
Thanks in advance!
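The regex itself can be checked in isolation. The minimal plain .NET sketch below (no Aspose.Words involved) assumes that paragraph breaks appear as '\r' in the text the pattern is matched against, and shows that a single Match covers all three lines. Inside the callback, the same information is available as args.Match (Match.Value, Match.Length), whereas args.MatchNode only identifies the node that contains the beginning of the match. RegexSpanDemo and FindInParagraphs are hypothetical names used for illustration.

```csharp
using System.Text.RegularExpressions;

public static class RegexSpanDemo
{
    // The pattern from the question, unchanged.
    public static readonly Regex Pattern =
        new Regex("FirstLine[&p|&b|&l|\\s]*?SecondLine[&p|&b|&l|\\s]*?ThirdLine");

    // Joins the paragraphs with '\r' (an assumed paragraph-break
    // representation) and returns the first match of the pattern.
    public static Match FindInParagraphs(params string[] paragraphs)
    {
        string text = string.Join("\r", paragraphs);
        return Pattern.Match(text);
    }
}
```

For FindInParagraphs("Intro", "FirstLine", "SecondLine", "ThirdLine"), the match starts at index 6, is 30 characters long, and Match.Value is "FirstLine\rSecondLine\rThirdLine", i.e. all three lines.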
@vladislavpedanepam
It seems that you are encountering an issue where the Document.Range.Replace method processes only the first line of a matched bullet list when using a custom replacing callback. This behavior can occur due to how the MatchNode is structured in the document, especially when dealing with multi-line text.
To address this, you can modify your callback implementation to ensure that it processes all lines of the matched text. Here’s a revised approach:

- Iterate through the runs: instead of relying solely on args.MatchNode, traverse the runs that make up the matched text. This way, you can gather all the text that matches your regex pattern.
- Construct the replacement text: after collecting all the relevant runs, construct the replacement text and insert it using the DocumentBuilder.

Here’s an example of how you might implement this:
```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Replacing;

public class ReplacingCallback : IReplacingCallback
{
    private readonly DocumentBuilder _builder;

    public ReplacingCallback(DocumentBuilder builder)
    {
        _builder = builder;
    }

    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
    {
        // Collect all runs that are part of the match
        List<Run> runs = new List<Run>();
        Node currentNode = args.MatchNode;

        // Traverse through the runs to gather all text
        while (currentNode != null)
        {
            if (currentNode.NodeType == NodeType.Run)
            {
                runs.Add((Run)currentNode);
            }
            currentNode = currentNode.NextSibling;
        }

        // Construct the replacement text
        string replacementText = string.Join("\r\n", new[] { "FirstLine1", "SecondLine2", "ThirdLine3" });

        // Move to the first run of the match and insert the new text
        _builder.MoveTo(runs[0]);
        _builder.Write(replacementText);

        // Remove the original matched nodes
        foreach (var run in runs)
        {
            run.Remove();
        }
        return ReplaceAction.Skip;
    }
}
```
In this implementation, the callback collects all runs that are part of the matched text and constructs a multi-line replacement. This should help you replace the entire matched text instead of just the first line.
Make sure to test this approach with your document to ensure it meets your requirements. If you continue to experience issues, consider checking the structure of your document to ensure that the regex pattern is correctly matching the intended text.
If you have any further questions or need additional assistance, feel free to ask!
Hello @Professionalize.Discourse, thanks for your answer. Unfortunately, the provided code behaves the same as ours: it replaces only the first line, and even on the first iteration currentNode.NextSibling returns null.
Could you please review and test it on your side with the provided document too?
Regards,
Vladislav
@vladislavpedanepam As I can see, your code modifies the document while replacing. This might affect the replacing process, so I would suggest searching in the backward direction. Please try modifying your code like this:
```csharp
var replaceResult = document.Range.Replace(regex, "<empty, replacement logic is in callback>", new FindReplaceOptions
{
    MatchCase = true,
    ReplacingCallback = replacingCallback,
    SmartParagraphBreakReplacement = true,
    Direction = FindReplaceDirection.Backward
});
```
Hi @alexey.noskov, I’ve tried both directions: neither FindReplaceDirection.Backward nor FindReplaceDirection.Forward works in this case, and the result is the same.
Even if I change the code to avoid editing the text, args.MatchNode gives me only the first line instead of the whole matched text:
```csharp
ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
{
    var currentNode = args.MatchNode;
    //Expected - 3 lines of text, Actual - only first line
    Console.WriteLine(currentNode.Range.Text);
    return ReplaceAction.Replace;
}
```
At the same time, I am sure the regexp matches the whole text: if I run the code above, it replaces all 3 lines. It looks like args.MatchNode contains an incorrect value in this case, or I am just unable to use it correctly to retrieve the whole list of nodes. Could you please help me with this part?
Regards, Vladislav
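One way to reach every node of a multi-paragraph match from inside the callback is to start at args.MatchNode and walk forward in document order with Node.NextPreOrder until args.MatchOffset + args.Match.Length characters have been consumed. This is a sketch under stated assumptions, not a documented Aspose.Words recipe: it assumes the match contains only Run text and paragraph breaks, that each paragraph break counts as one character of the matched text, and it ignores fields, tables, shapes and other special nodes. CollectMatchRunsCallback is a hypothetical name. Following the advice above, it does not modify the document inside the callback.

```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Replacing;

// Hypothetical callback: records the Runs overlapping each match and leaves
// the document untouched. Assumption: a match consists only of Run text and
// paragraph breaks (one character each); other node types are not handled.
public class CollectMatchRunsCallback : IReplacingCallback
{
    // Full matched text per match, as seen by the regex.
    public List<string> MatchedTexts { get; } = new List<string>();

    // Runs overlapping each match; the first and last run may be
    // only partially covered by the match.
    public List<List<Run>> MatchedRuns { get; } = new List<List<Run>>();

    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
    {
        MatchedTexts.Add(args.Match.Value);

        var runs = new List<Run>();
        // MatchNode contains the start of the match and MatchOffset is where
        // the match begins inside it, so this many characters must be
        // consumed from the start of MatchNode onwards.
        int remaining = args.MatchOffset + args.Match.Length;
        Node root = args.MatchNode.Document;

        for (Node node = args.MatchNode; node != null && remaining > 0; node = node.NextPreOrder(root))
        {
            if (node.NodeType == NodeType.Run)
            {
                runs.Add((Run)node);
                remaining -= ((Run)node).Text.Length;
            }
            else if (node.NodeType == NodeType.Paragraph)
            {
                // Entering a new paragraph: the previous paragraph break
                // (assumed to be one character) was part of the match.
                remaining -= 1;
            }
        }

        MatchedRuns.Add(runs);
        // Do not change the text here; edit the collected runs after Replace returns.
        return ReplaceAction.Skip;
    }
}
```

After document.Range.Replace(regex, "", new FindReplaceOptions { ReplacingCallback = callback }) returns, callback.MatchedRuns holds the runs of each match. Because the first and last runs may extend beyond the match, they would need to be trimmed rather than removed outright.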
@vladislavpedanepam In your code you are removing only ReplacingArgs.MatchNode, which represents the first Run of the matched content, but in your case the whole match spans 3 paragraphs. The easiest way to resolve this is to replace the whole match with a temporary placeholder and then replace the placeholder with the actual content:
```csharp
using System.Linq;
using System.Text.RegularExpressions;
using Aspose.Words;

Regex regex = new Regex("FirstLine[&p|&b|&l|\\s]*?SecondLine[&p|&b|&l|\\s]*?ThirdLine");
string tmpPlaceholder = "[[REPLACE_ME]]";

Document doc = new Document(@"C:\Temp\in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

// Replace multi-paragraph match with temporary placeholder.
doc.Range.Replace(regex, tmpPlaceholder);

// Replace temporary placeholder with actual content.
Run placeholderRun = doc.GetChildNodes(NodeType.Run, true).Cast<Run>()
    .First(r => r.Text == tmpPlaceholder);
builder.MoveTo(placeholderRun);
builder.Write("FirstLine1\r\nSecondLine2\r\nThirdLine3");
placeholderRun.Text = "";

doc.Save(@"C:\Temp\out.docx");
```
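The snippet above handles only the first placeholder (First()). If the pattern can occur several times in a document, the same placeholder technique can be applied to every occurrence. This is a sketch that assumes, as the snippet above does, that each placeholder ends up as the whole text of a single Run; PlaceholderReplacer and ReplaceAll are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using Aspose.Words;

public static class PlaceholderReplacer
{
    // Hypothetical helper: replaces every match of `pattern` with `newContent`
    // via a temporary placeholder. Assumes each placeholder lands in one Run.
    public static int ReplaceAll(Document doc, Regex pattern, string newContent,
                                 string placeholder = "[[REPLACE_ME]]")
    {
        // Step 1: collapse each (possibly multi-paragraph) match into the placeholder.
        doc.Range.Replace(pattern, placeholder);

        // Step 2: snapshot the placeholder runs first, since writing content
        // changes the node collection being enumerated.
        var placeholderRuns = doc.GetChildNodes(NodeType.Run, true)
            .Cast<Run>()
            .Where(r => r.Text == placeholder)
            .ToList();

        var builder = new DocumentBuilder(doc);
        foreach (Run run in placeholderRuns)
        {
            // Insert the new content at the placeholder, then clear the placeholder text.
            builder.MoveTo(run);
            builder.Write(newContent);
            run.Text = "";
        }
        return placeholderRuns.Count;
    }
}
```

Collecting the runs with ToList() before writing keeps the loop independent of the nodes the builder inserts.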