How to perform Regexp replacement?

Hi, Support:

What is the syntax for Regex Replacement?
For example:

Dim MyStr as string ="aaabbbcc@@@@dddeee"
Doc.Range.Text=MyStr
Doc.Range.Replace(New Regex("([a-zA-Z]{1,3})[\@]+"), "$1", FindOptions)
Dim MyNewStr as string=Doc.Range.Text

The expected value of MyNewStr is “aaabbbccdddeee”, whereas the actual output is “aaabb$1dddeee”.
Could you help me get the correct result, “aaabbbccdddeee”?

Thanks!

@ducaisoft

In your case, you want to replace ‘@@@@’ with ‘$1’. You can simply use the Range.Replace(String, String) method: https://reference.aspose.com/words/net/aspose.words/range/replace/

If you still face problems, please share your input and expected output Word documents here for our reference. We will then provide a code example that meets your requirement.

Thanks for your response.
Maybe you misunderstood my issue. With System.Text.RegularExpressions, if the pattern is "([0-9]+)abc", the replacement string is "$1", and testStr = "123456abc", then after Result = Regex.Replace(testStr, "([0-9]+)abc", "$1")
the value of Result is “123456”. Here “$1” stands for the text captured by the group ([0-9]+), that is, “123456”.

@ducaisoft

To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input Word document.
  • Please attach the output Word file that shows the undesired behavior.
  • Please attach the expected output Word file that shows the desired behavior.
  • Please create a standalone console application ( source code without compilation errors ) that helps us to reproduce your problem on our end and attach it here for testing.

As soon as you get these pieces of information ready, we will start investigation into your issue and provide you more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

My question is: what is the syntax of Doc.Range.Replace(RegExpPattern, ReplacementPattern) in Aspose.Words.dll?

For example:
In the common regex syntax used by VB 6.0, VB.NET, JavaScript, or PHP, \r represents a hard return, equal to Chr(13) or vbCr, whereas in Aspose.Words.dll &p represents a hard return. That is to say:
in VB.NET, Result = Regex.Replace(SourceStr, "\r", "hard return") must be written as Doc.Range.Replace(New Regex("&p"), "hard return", ReplaceOptions) in Aspose.Words; they produce the same result.

Therefore, in VB.NET, with code like Result = Regex.Replace(SourceStr, "(abc)[0-9]+\r", "$1" & vbCr), if SourceStr = "abc20201108" & vbCr, the value of Result is "abc" & vbCr.
My question is how to get the same result in Aspose.Words.dll. You only need to tell me the syntax for the replacement string pattern in Aspose.Words.dll. In regex syntax, when the find pattern contains a capture group such as “(.+)”, the replacement string refers to the captured text as “$1”. What is the equivalent of “$1” in Aspose.Words.dll?

please refer to this demo http://www.ycbat.com.cn/downs/demo.rar

@ducaisoft

Thanks for sharing the details. We suggest you read about the overloads of the Range.Replace method.

You are using the Range.Replace(Regex, String, FindReplaceOptions) method.

The first parameter is a regular expression pattern of type System.Text.RegularExpressions.Regex, used to find matches.

The second parameter is a string of type System.String that replaces all occurrences of the pattern.

Please note that the syntax of the Range.Replace method is different from that of Regex.Replace.

You are passing $1 as the second parameter of Range.Replace, and that parameter is the replacement string, so the matched content is replaced with the literal text $1.
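
To make the difference concrete, here is a minimal sketch that uses only System.Text.RegularExpressions and no Aspose.Words calls. The class and method names (SubstitutionDemo, WithSubstitution, WithLiteralReplacement) are made up for illustration. The first method lets .NET expand $1; the second emulates a replacement string that is taken literally, which is the behavior described in this thread for Range.Replace.

```csharp
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Pattern from the original question: up to three letters followed by one or more '@'.
    private static readonly Regex Pattern = new Regex("([a-zA-Z]{1,3})[@]+");

    // Regex.Replace treats "$1" as a substitution for the first capture group.
    public static string WithSubstitution(string input)
    {
        return Pattern.Replace(input, "$1");
    }

    // A MatchEvaluator result is inserted verbatim, so "$1" stays as literal text.
    public static string WithLiteralReplacement(string input)
    {
        return Pattern.Replace(input, match => "$1");
    }
}
```

With the input "aaabbbcc@@@@dddeee", WithSubstitution returns "aaabbbccdddeee" and WithLiteralReplacement returns "aaabb$1dddeee", the same two strings reported in the first post.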

Regarding your query about the following meta-characters:

  • &p - paragraph break
  • &b - section break
  • &m - page break
  • &l - manual line break
  • && - & character

Please build the Regex (first parameter) according to your requirement and use these meta-characters in the second parameter, the replacement string.

If you want to use these meta-characters in the pattern (first parameter), you need the Range.Replace(String, String, FindReplaceOptions) overload: https://reference.aspose.com/words/net/aspose.words/range/replace/. This overload does not use a Regex pattern.

Moreover, please read the following article for more detail:
Find and Replace

Thanks for your reply, but my issue is still pending.
Could you tell me how to get my desired output, or confirm whether Aspose.Words.dll can do this at all?
Using System.Text.RegularExpressions:
if SrcTestStr = "abcdefd12345???" and Result = Regex.Replace(SrcTestStr, "(.+?)[?]+", "$1"), then the value of Result is “abcdefd12345”, not “$1”.
My question is: if Doc.Range.Text = "abcdefd12345???" and I call Doc.Range.Replace(New Regex("(.+?)[?]+"), "$1", ReplaceOptions), why is Doc.Range.Text “$1” and not “abcdefd12345”? Please tell me how to make Doc.Range.Text equal “abcdefd12345” after the replacement.

I read the documentation for the Range.Replace method and could not find a solution to my issue.

@ducaisoft

As shared in my previous post, Range.Replace and Regex.Replace are different methods and work differently. Please check the parameters of both methods.

The Range.Replace(Regex, String, FindReplaceOptions) method replaces all occurrences of a character pattern specified by a regular expression with another string, which here is $1.

The Aspose.Words API finds the text in the document using the regex '(.+?)[?]+' and replaces it with the replacement text, which is $1.

The replacement string ‘$1’ works differently in the Regex.Replace method. In Range.Replace, you need to choose the first parameter (string or regex) according to your requirement. Please read about the Range.Replace overloads as suggested in my previous post.

Please use the following line of code to get the desired output.

Doc.Range.Replace(New Regex("(\?)"), "", ReplaceOption)

Thank you for your patient help!
But your suggestion, “Doc.Range.Replace(New Regex("(\?)"), "", ReplaceOption)”, does not actually produce my desired output and does not meet my requirement.

So my issue is still pending, and I strongly recommend including this feature in a future version.

Your reply does not solve my requirement, so I have had to get the desired output with the following code:

For Each Parap As Global.Aspose.Words.Paragraph In doc.GetChildNodes(NodeType.Paragraph, True)
    Index = Index + 1
    Do
        Ln = Parap.Range.Text.Length
        Text = Parap.Range.Text
        If (Text.Length = 1 Or Text.Trim.Length = 1 Or Text.Trim = Chr(13)) And Text <> "?" And Text <> "?" Then GoTo nextp
        Runs = Parap.GetChildNodes(NodeType.Run, True)
        Pos = Runs.Count
        For i As Integer = 0 To Pos - 1
            Run = Runs(i)
            Text = Run.Text
            If i = Pos - 1 Then Text = Text & Chr(13)
            Pattern = "([\?\?\?\?\?])[\?\?\?\?\?]+"
            IsMatch = Regex.IsMatch(Text, Pattern)
            If IsMatch Then Text = Regex.Replace(Text, Pattern, "$1")
            Pattern = "[\?\?\?\?\?]+"
            IsMatch = Regex.IsMatch(Text, Pattern)
            If IsMatch Then
                Dim Matches = Regex.Matches(Text, Pattern)
                Pos = Matches.Count
                If Pos > 1 Then Text = Regex.Replace(Text, Pattern, "")
            End If
            Pattern = "^[\?\?\?\?\?]+"
            IsMatch = Regex.IsMatch(Text, Pattern)
            If IsMatch Then Text = Regex.Replace(Text, Pattern, "")
            Pattern = "(\p{P})[\?\?\?\?\?]+\r"
            IsMatch = Regex.IsMatch(Text, Pattern)
            If IsMatch Then Text = Regex.Replace(Text, Pattern, "$1" & Chr(13))
            Pattern = "([ ,。?:;‘’!""—……、]+|(-{2,})|([()]+)|([【】]+)|([\{\}]+)|([《》]+))[\?\?\?\?\?]+\r"
            IsMatch = Regex.IsMatch(Text, Pattern)
            If IsMatch Then Text = Regex.Replace(Text, Pattern, "$1" & Chr(13))
            Pattern = "(\p{P})[\?\?\?\?\?]+"
            IsMatch = Regex.IsMatch(Text, Pattern)
            If IsMatch Then Text = Regex.Replace(Text, Pattern, "$1" & Chr(13))
            Pattern = "[\?\?\?\?\?]+(\p{P})"
            IsMatch = Regex.IsMatch(Text, Pattern)
            If IsMatch Then Text = Regex.Replace(Text, Pattern, "$1" & Chr(13))
            Pattern = "[\?\?\?\?\?]+([ ,。?:;‘’!""—……、]+|(-{2,})|([()]+)|([【】]+)|([\{\}]+)|([《》]+))"
            IsMatch = Regex.IsMatch(Text, Pattern)
            If IsMatch Then Text = Regex.Replace(Text, Pattern, "$1" & Chr(13))
            Pattern = "([ ,。?:;‘’!""—……、]+|(-{2,})|([()]+)|([【】]+)|([\{\}]+)|([《》]+))[\?\?\?\?\?]+"
            IsMatch = Regex.IsMatch(Text, Pattern)
            If IsMatch Then Text = Regex.Replace(Text, Pattern, "$1" & Chr(13))
            If i = Pos - 1 Then Text = Replace(Text, Chr(13), "")
            On Error Resume Next
            Run.Text = Text
            If Err.Number Then
                Run.Text = ""
            End If
        Next
        My.Application.DoEvents()
    Loop While Ln <> Parap.Range.Text.Length
    My.Application.DoEvents()

nextp:
Next

@ducaisoft

We have logged this feature request as WORDSNET-21381 in our issue tracking system. You will be notified via this forum thread once it is available. We apologize for your inconvenience.

Thanks for your attention to my requirement. I hope the regex find-and-replace syntax will be made consistent with standard regular expression syntax.

@ducaisoft You can easily achieve what you need by using IReplacingCallback in your code. Here is a simple code example that demonstrates the technique.

Document doc = new Document(@"C:\Temp\in.docx");
Regex regex = new Regex("(.+?)[?]+");
string replacement = "$1";
FindReplaceOptions opt = new FindReplaceOptions();
opt.ReplacingCallback = new MyReplacingCallback(regex, replacement);
doc.Range.Replace(regex, replacement, opt);
doc.Save(@"C:\Temp\out.docx");

private class MyReplacingCallback : IReplacingCallback
{
    public MyReplacingCallback(Regex regex, string replacement)
    {
        mRegex = regex;
        mReplacement = replacement;
    }

    public ReplaceAction Replacing(ReplacingArgs args)
    {
        args.Replacement = mRegex.Replace(args.Match.Value, mReplacement);
        return ReplaceAction.Replace;
    }

    private Regex mRegex;
    private string mReplacement;
}

Hope this helps.

@ducaisoft

You can use the FindReplaceOptions.UseSubstitutions property, as shown below, to get the desired output. This boolean property gets or sets whether substitutions within replacement patterns are recognized and used.

// Create document with desired text.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Write("aaabbbcc@@@@dddeee");

// Enable recognition of substitutions within the replacement string.
FindReplaceOptions options = new FindReplaceOptions();
options.UseSubstitutions = true;

// It is important to use a Regex pattern instead of a string pattern to perform substitutions.
// Otherwise, meta-characters such as [ and { will be escaped and no matches will be found at all.
doc.Range.Replace(new Regex("([a-zA-Z]{1,3})[@]+"), "$1", options);

// The output is: aaabbbccdddeee\u000c
Console.Write(doc.Range.Text);

Thanks for your demo.

Which version supports this feature?
Suppose StrText = "123qqqq456ppppp" and options.UseSubstitutions = true:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Write(StrText);
doc.Range.Replace(new Regex(@"(\d+)[a-zA-Z]@(\d+)"), "$1 MyDemo $2", options);

My question is: what is the output?
My desired output is “123 MyDemo 456”.

@ducaisoft

The regex for the specified example is incorrect. Please use the following code example to achieve your requirement.

String StrText = "123qqqq456ppppp";
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Write(StrText);

FindReplaceOptions options = new FindReplaceOptions();
options.UseSubstitutions = true;

doc.Range.Replace(new Regex(@"(\d+)[a-zA-Z]*(\d+).*"), "$1 MyDemo $2", options);
Console.WriteLine(doc.Range.Text);
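
For readers wondering why the first pattern fails: in .NET regular expressions, @ is not a quantifier. It matches a literal at-sign, so (\d+)[a-zA-Z]@(\d+) finds nothing in this text. The sketch below checks both patterns with plain System.Text.RegularExpressions rather than Aspose.Words; PatternCheck and its members are illustrative names, not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class PatternCheck
{
    public const string Input = "123qqqq456ppppp";

    // '@' is a literal character here, and the input contains no '@'.
    public static bool OriginalPatternMatches()
    {
        return Regex.IsMatch(Input, @"(\d+)[a-zA-Z]@(\d+)");
    }

    // '*' means "zero or more letters"; the trailing ".*" consumes the rest of the text.
    public static string ApplyCorrectedPattern()
    {
        return Regex.Replace(Input, @"(\d+)[a-zA-Z]*(\d+).*", "$1 MyDemo $2");
    }
}
```

OriginalPatternMatches() returns false, and ApplyCorrectedPattern() returns "123 MyDemo 456".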

Thank you very much! This is what I expected, and it works perfectly.
I have another question:

Could you help me get the original size of a picture, and tell me how to convert pixels to centimeters?

For example:
A picture with width = 3000 pixels, height = 2000 pixels, dpi = 300, w = 68 cm, h = 45 cm was inserted into a Word document. In MS Word, when I right-click the picture and open the size properties dialog, it shows the original size of the picture as w = 68 cm, h = 45 cm, although its size in the document may be w = 18, h = 14.
So how can I get the original size of the picture with Aspose.Words.dll?
I tried the DLL, but it can only get the size in pixels or points, as well as the dpi. For example, w may be 1296 px and h may be 868 px. Could you tell me how to convert the 1296 px to 68 cm, or how to get the 68 cm value with Aspose.Words.dll?

@ducaisoft

The picture is imported as a Shape node into the Aspose.Words DOM. You can check whether a shape contains an image using the Shape.HasImage property. The Shape.ImageData property provides access to the image of the shape. You can get information about the image size and resolution using the ImageData.ImageSize property.

The following code example shows how to get the size of an image and convert it to centimeters. Hope this helps.

Document document = new Document(MyDir + "document.docx");
foreach (Shape shape in document.GetChildNodes(NodeType.Shape, true))
{
    if (shape.HasImage)
    {
        Console.WriteLine(shape.ImageData.ImageSize.WidthPoints);
        Console.WriteLine(shape.ImageData.ImageSize.HeightPoints);
        // Convert points to centimeters (1 inch = 2.54 cm).
        Console.WriteLine(ConvertUtil.PointToInch(shape.ImageData.ImageSize.WidthPoints) * 2.54);
        Console.WriteLine(ConvertUtil.PointToInch(shape.ImageData.ImageSize.HeightPoints) * 2.54);
    }
}
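
The arithmetic behind these conversions is simple enough to write out. This is a sketch with hypothetical helper names (SizeMath, PixelsToCentimeters, PointsToCentimeters); it assumes you already have a pixel count and a DPI value, for example from the image size and resolution information mentioned above. A point is 1/72 inch and an inch is 2.54 cm.

```csharp
public static class SizeMath
{
    private const double CentimetersPerInch = 2.54;
    private const double PointsPerInch = 72.0;

    // pixels / dpi gives inches; inches * 2.54 gives centimeters.
    public static double PixelsToCentimeters(double pixels, double dpi)
    {
        return pixels / dpi * CentimetersPerInch;
    }

    // points / 72 gives inches; inches * 2.54 gives centimeters.
    public static double PointsToCentimeters(double points)
    {
        return points / PointsPerInch * CentimetersPerInch;
    }
}
```

For instance, PixelsToCentimeters(600, 300) is 5.08 (2 inches), and PointsToCentimeters(72) is 2.54.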