Find text in the header and body of a document using C#

Hello,

I have been searching the forum for a good approach to replacing text in a Word document. Could you please point me to a good resource on how to replace text based on finding the string “x” in a run? I need to do this both in the header and on the first page of the document. I am using Aspose.Words for .NET 10.6.0.

For example, I’ve attached the first page of a typical document I will have to modify. The “desired text to be changed” (A & B) will not have quotes around it. The header may have one, two, three, or more lines. The text to change in the page content should always be on the second line. To make sure this is correct, I will need to verify that the line before it and the line after it contain their respective keywords (2 & 3).

Any help you can provide would be greatly appreciated.

Thank you.

Hi Rob,

Thanks for your inquiry.

You can refer to the following documentation to learn how to use find and replace in Aspose.Words: https://docs.aspose.com/words/net/find-and-replace/

If we can help with anything else, please feel free to ask.

Thanks,

Hi Adam,

Thank you for the info. How can I restrict the find and replace tool to certain parts of the document? I only want to change text in the header and on the first page of the document. The third example, using a custom evaluator, looks like it will be my best choice. Any suggestions on how to make sure I get the whole line of text where my keyword is located in the header? What about the word that has to be replaced in the main text of the first page? The text to be replaced there is the line between the line containing keyword x and the line containing keyword y. Can a regex be used here, or not, because the lines are in different run nodes? The file I attached with my initial post gives the best examples.
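On getting the whole line: here is a minimal plain .NET sketch (no Aspose.Words calls; the class and method names are hypothetical). It assumes the header text has already been extracted into one string whose paragraphs are separated by "\r", the paragraph-break character Aspose.Words uses (ControlChar.ParagraphBreak). The pattern [^\r]* cannot cross a paragraph break, so the match is exactly the line that contains the keyword, and the replacement goes through a MatchEvaluator, the plain .NET counterpart of a custom replace callback.

```csharp
using System.Text.RegularExpressions;

// Plain .NET sketch; WholeLineMatcher and its methods are hypothetical names.
// Assumption: paragraphs in `text` are separated by "\r".
public static class WholeLineMatcher
{
    // [^\r]* cannot cross a paragraph break, so the match is the whole line
    // that contains the keyword. Regex.Escape keeps keyword characters literal.
    private static Regex LinePattern(string keyword) =>
        new Regex(@"[^\r]*" + Regex.Escape(keyword) + @"[^\r]*");

    // Returns the full line containing the keyword, or null if there is none.
    public static string FindLineContaining(string text, string keyword)
    {
        Match m = LinePattern(keyword).Match(text);
        return m.Success ? m.Value : null;
    }

    // Replaces the first line containing the keyword. The MatchEvaluator
    // lambda plays the role of a custom replace callback.
    public static string ReplaceLineContaining(string text, string keyword, string newLine)
    {
        return LinePattern(keyword).Replace(text, m => newLine, 1);
    }
}
```

For example, FindLineContaining("General Text\rDesired Text A with KEYWORD 1\r", "KEYWORD 1") returns "Desired Text A with KEYWORD 1".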

Thanks.

Any suggestions, please?

Thank you.

Hi Rob,

Please accept my apologies for the late response. Please use the following code snippet for your requirement, and see the following documentation for HeaderFooterType:

https://reference.aspose.com/words/net/aspose.words/headerfootertype

// Requires: using Aspose.Words; using System.Text.RegularExpressions;
private class MyReplaceEvaluator : IReplacingCallback
{
    /// <summary>
    /// This is called during a replace operation each time a match is found.
    /// It replaces every match with a fixed string; the commented-out line shows
    /// the variant that appends the match number to the match string instead.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        //e.Replacement = e.Match.ToString() + mMatchNumber.ToString();
        e.Replacement = "Replaced Text";
        mMatchNumber++;
        return ReplaceAction.Replace;
    }

    // Number of matches handled so far.
    private int mMatchNumber;
}

Document doc = new Document(MyDir + "Example+Doc+for+Text+Replacement.doc");
// Collect all headers/footers, then run the replace only in the primary headers.
Node[] headerfooters = doc.GetChildNodes(NodeType.HeaderFooter, true).ToArray();
foreach (HeaderFooter headerfooter in headerfooters)
{
    if (HeaderFooterType.HeaderPrimary == headerfooter.HeaderFooterType)
    {
        // "[A]" is only an example pattern (it matches the letter A); the last
        // argument (isForward = true) searches from the start of the range.
        headerfooter.Range.Replace(new Regex("[A]"), new MyReplaceEvaluator(), true);
    }
}

You can also limit the replacement to the first section, as shown in the following line of code. Hope this helps. Let me know if you have any more queries.

doc.FirstSection.Range.Replace(...)

Thank you.

According to the documentation, Replace will change all instances that match the regex. How can I limit it to only one replacement instead of all matches? In particular, the match I want will be the first one when searching from the beginning of the document.

Can my regex match across lines? If I read the documentation correctly, I can only match text within one line (Post 305597). However, I need to match text over several lines. If a match over several lines is possible, do I need any special regex syntax? Could you give an example for the following scenario?

General text ending with Keyword1
Text to Match
Keyword 2 followed by general text.

Specifically, it would look like this:

Bah bah bah keyword1
Match text on this line.
Keyword2 bah bah

My task is to first find the desired text, extract it, and then display it to the end user. The end user then decides whether to keep this text or replace it with something else. If they want to replace it, I will go back into the document and swap out the ‘matched’ text. So, for the first part, I don’t even have to replace text, just match it. However, I figure I can use the same function for both: when only matching, the replacement text would be the matched text itself.
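To illustrate the scenario above: once the paragraphs are in a single string, a .NET regex can span lines by matching the paragraph-break character explicitly. This is a minimal sketch, assuming the paragraphs are joined by "\r" (as in text obtained with GetText() in Aspose.Words) and using hypothetical names. It works on extracted text only, so writing the new line back into the document remains a separate step. Matching and replacing share one pattern, which fits the extract, display, then replace flow described above.

```csharp
using System.Text.RegularExpressions;

// Plain .NET sketch; BetweenKeywords and its methods are hypothetical names.
// Assumption: paragraphs are joined by "\r" (paragraph break).
public static class BetweenKeywords
{
    // keyword1, a paragraph break, the target line (captured), another break,
    // then Keyword2. IgnoreCase because the example mixes "keyword1"/"Keyword2".
    private static readonly Regex Pattern =
        new Regex(@"keyword1\r([^\r]*)\rkeyword2", RegexOptions.IgnoreCase);

    // Step 1: find the text only, so it can be shown to the end user.
    public static string Extract(string text)
    {
        Match m = Pattern.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Step 2: replace only the captured line; both keyword lines stay as they were.
    public static string ReplaceMiddle(string text, string newLine)
    {
        Match m = Pattern.Match(text);
        if (!m.Success) return text;
        Group g = m.Groups[1];
        return text.Substring(0, g.Index) + newLine + text.Substring(g.Index + g.Length);
    }
}
```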

Thank you.

Hi Rob,

Thanks for sharing more information. I am working on this issue and will update you soon.

Thank you.

Hi Rob,

Please accept my apologies for the late response. You can use the same code I shared before; I have modified the MyReplaceEvaluator class so that it no longer replaces every instance that matches the regex, only the first match. Please let us know if you have any more queries.

private class MyReplaceEvaluator : IReplacingCallback
{
    /// <summary>
    /// This is called during a replace operation each time a match is found.
    /// Only the first match is replaced; later matches are left unchanged.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        if (mMatchNumber == 0)
        {
            // First match: replace it.
            e.Replacement = "Replaced Text";
            mMatchNumber++;
            return ReplaceAction.Replace;
        }

        // Later matches: skip them so the document text stays as it is.
        // Your code: add your own handling here if needed.
        mMatchNumber++;
        return ReplaceAction.Skip;
    }

    // Number of matches seen so far.
    private int mMatchNumber;
}
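For comparison, plain .NET offers the same "first match only" behaviour through the count argument of Regex.Replace. A minimal sketch (hypothetical names, no Aspose.Words calls). Inside an IReplacingCallback, later matches can likewise be left alone by returning ReplaceAction.Skip, or the whole operation can be ended with ReplaceAction.Stop.

```csharp
using System.Text.RegularExpressions;

// Plain .NET sketch; FirstMatchOnly and its methods are hypothetical names.
public static class FirstMatchOnly
{
    // The count argument (1) stops Regex.Replace after the first substitution.
    // A MatchEvaluator is used so the replacement text is taken literally
    // ("$" is not treated as a substitution token).
    public static string ReplaceFirst(string input, string pattern, string replacement)
    {
        return new Regex(pattern).Replace(input, m => replacement, 1);
    }

    // Same result with a match counter, mirroring the mMatchNumber check in the
    // callback: the first match is replaced, later ones are returned unchanged.
    public static string ReplaceFirstWithCounter(string input, string pattern, string replacement)
    {
        int matchNumber = 0;
        return Regex.Replace(input, pattern, m => matchNumber++ == 0 ? replacement : m.Value);
    }
}
```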

Not a problem at all. Just to make sure: will this cover multiple runs, and will it allow special characters such as paragraph breaks and line breaks? If so, paragraph breaks are represented in Aspose code as /p, correct? So my example above would be converted to:

run.text="Bah bah bah keyword1/p"
run.text="Match text on this line./p"
run.text="Keyword2 bah bah/p"

If that is the case, I can make my regex look like this: (keyword1//p(.*?)//pkeyword2)
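A hedged note on that pattern: "/p" has no special meaning in a .NET regex, so keyword1//p would only match the literal characters "//p". In Aspose.Words a paragraph break is "\r" (ControlChar.ParagraphBreak) and a manual line break is "\v" (ControlChar.LineBreak). Assuming the paragraph break belongs to the paragraph rather than to a run, it appears in Paragraph.GetText() but not in Run.Text. The sketch below (hypothetical names, plain .NET) builds the intended pattern from two keywords, escapes them with Regex.Escape, and accepts either kind of break.

```csharp
using System.Text.RegularExpressions;

// Plain .NET sketch; KeywordPattern.Build is a hypothetical helper.
public static class KeywordPattern
{
    // Builds keyword1 <break> (captured line) <break> keyword2, where <break> is
    // a paragraph break "\r" or, assuming Aspose.Words' convention, a manual line
    // break "\v" (char 11). "/p" is not a break escape in .NET regex.
    public static Regex Build(string keyword1, string keyword2)
    {
        const string anyBreak = @"[\r\v]";
        string pattern = Regex.Escape(keyword1) + anyBreak + @"([^\r\v]*)" + anyBreak + Regex.Escape(keyword2);
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }
}
```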

Thanks!

Hi Rob,

I have tried to understand your problem statement, but unfortunately I have not completely understood your query. Please share some more detail about your question, along with a sample Word document.

Thanks.

Hi,

Here is a walkthrough of what I want to do. First, the header:

  1. Move the cursor to the header
  2. Go through the header node by node
  3. When the cursor gets to the run containing keyWordForHeader1, start down the alternate flow
    1. Note: This run will end with a paragraph break
    2. Note: This run is the second paragraph / line of the header example
  4. Move the cursor along to the run containing the next paragraph
    1. Note: This run will end with a paragraph break
    2. Note: This is the third paragraph / line of the header example
  5. If the current run contains keyWordForHeader2 and the previous paragraph contained keyWordForHeader1, then select the text of the third paragraph.
    1. The selected text is the desiredTextToChangeInHeader
  6. Replace the selected text.
  7. If the keywords are not found, move on to the problem below.
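Steps 3 to 5 amount to comparing each header paragraph with the one before it. A minimal sketch, assuming the header's paragraphs are already available as plain strings (for example one Paragraph.GetText() per paragraph, with the trailing break trimmed); the class and method names are hypothetical. Working on whole paragraphs rather than single runs also avoids missing a keyword that Word may have split across several runs.

```csharp
using System.Collections.Generic;

// Plain .NET sketch; HeaderScan.FindTarget is a hypothetical helper.
// Assumption: each string is the text of one header paragraph.
public static class HeaderScan
{
    // Returns the index of the first paragraph that contains keyword2 and
    // directly follows a paragraph containing keyword1 (steps 3 to 5), or -1.
    public static int FindTarget(IList<string> paragraphs, string keyword1, string keyword2)
    {
        for (int i = 1; i < paragraphs.Count; i++)
        {
            if (paragraphs[i - 1].Contains(keyword1) && paragraphs[i].Contains(keyword2))
            {
                return i;
            }
        }
        return -1;
    }
}
```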

Second, search the first section for the desired text:

  1. Move the cursor to the first section of the document.
  2. Walk the nodes until we find the first run with keyWordTitle1
    1. Note: This run will end with a paragraph break
    2. Note: This run is the first paragraph / line of the first section example
  3. If keyWordTitle1 is found, then move down the alternate path
  4. Move the cursor through the nodes until we have moved two paragraphs.
    1. Note: This run will end with a paragraph break
    2. Note: This run is the third paragraph / line of the first section example
  5. If this third paragraph contains keyWordTitle2, move the cursor back to the node containing the second paragraph.
  6. Select all the text in this second paragraph and replace it.
    1. Note: This run is the second paragraph / line of the first section example
  7. If keyWordTitle1 and keyWordTitle2 are not found, move on.
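The first-section steps follow the same idea with a gap of two paragraphs: the line to replace is the one between the keyWordTitle1 paragraph and a following paragraph that contains keyWordTitle2 (the keyword step 7 refers to). A minimal sketch under the same assumption (one string per paragraph), with hypothetical names:

```csharp
using System.Collections.Generic;

// Plain .NET sketch; TitleScan.FindTarget is a hypothetical helper.
// Assumption: each string is the text of one paragraph of the first section.
public static class TitleScan
{
    // Returns the index of the paragraph sandwiched between a paragraph that
    // contains keyword1 and, two paragraphs later, one that contains keyword2
    // (steps 2 to 6); -1 if the pattern is not found.
    public static int FindTarget(IList<string> paragraphs, string keyword1, string keyword2)
    {
        for (int i = 0; i + 2 < paragraphs.Count; i++)
        {
            if (paragraphs[i].Contains(keyword1) && paragraphs[i + 2].Contains(keyword2))
            {
                return i + 1;
            }
        }
        return -1;
    }
}
```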

I have attached two documents: a before and an after version. I have highlighted the desired changes in the second document for reference only; I do not intend to highlight the results in my actual program.

Thank you.

Hi Rob,

Thanks for sharing the detailed information. The document (Example+Doc+for+Text+Replacement.doc) has the following text. For the first scenario, please explain point #4. It seems that you are replacing the text between two keywords.

Optional General Text

General Text

“Desired Text to Change A” with KEYWORD 1

  1. Move cursor to header
  2. Go through header node by node
  3. When cursor gets to the run containing keyWordForHeader1 start down alternate flow

There is no text “Keyword1”. Please explain.

  1. Note: This run will end with a paragraph break
  2. Note: This run is the second paragraph / line of the header example
  3. Move the cursor along to the run containing the next paragraph
    1. Note: This run will end with a paragraph break
    2. Note: This is the third paragraph / line of the header example
  4. If the current run contains keyWordForHeader2 and the previous paragraph contained keyWordForHeader1, then select the text of the third paragraph.
    1. The selected text is the desiredTextToChangeInHeader
  5. Replace the selected text.
  6. If the keywords are not found, move on to the problem below.

The wrong file was uploaded. The new file I have uploaded matches the correct terminology.

Yes, the text is being replaced between two keywords. I cannot use the native Aspose replace functionality, since it will not accept the ‘end of paragraph’ special character. Therefore, the only way I know to get around this restriction is to go through the document line by line. Once the condition in #3 is met, the program will look for the condition in #4. With both of these conditions met, the program will then replace the text of the run containing keyWordForHeader2.

Thank you.

Hi Rob,

Thanks for sharing the document. I have created a sample program which does the following:

  1. Finds keyword1 and keyword2
  2. Deletes all the text between these two keywords
  3. Inserts the desired text between these two keywords
  4. Removes the keywords
// Requires: using System; using Aspose.Words;
Document doc = new Document(MyDir + "Example+Doc+for+Text+Replacement.doc");
string desiredtextchanged = "This is changed Text";
//Get run of keyword1
Run keyword1 = GetKeywordPosition(doc, "keyWordForHeader1");
//Get run of keyword2
Run keyword2 = GetKeywordPosition(doc, "keyWordForHeader2");
bool blnDelete = false;
if (keyword1 != null && keyword2 != null)
{
    Node[] Runs = doc.GetChildNodes(NodeType.Run, true).ToArray();
    foreach (Run run in Runs)
    {
        if (run == keyword2)
        {
            // End of the range: remove keyword2 and stop deleting.
            blnDelete = false;
            run.Remove();
            continue;
        }
        else if (blnDelete)
        {
            // Between the keywords: delete the old text.
            run.Remove();
        }
        else if (run == keyword1)
        {
            // Start of the range: insert the new text after keyword1, then remove keyword1.
            run.ParentNode.InsertAfter(new Run(doc, desiredtextchanged), run);
            blnDelete = true;
            run.Remove();
            continue;
        }
    }
}
doc.Save(MyDir + @"output.docx");

private Run GetKeywordPosition(Document doc, string keyword)
{
    // Return the first run whose text contains the keyword (case-insensitive),
    // or null if no run contains it.
    Node[] Runs = doc.GetChildNodes(NodeType.Run, true).ToArray();
    foreach (Run run in Runs)
    {
        if (run.Text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
        {
            return run;
        }
    }
    return null;
}

Hope this helps. Let me know if you have any more queries.

Great, thank you. To improve the runtime efficiency of the code, how can I limit it to run only on certain parts of the document? I will run this code once on the header/footer and then on the first section of the document.

Thank you.

I incorporated the code into my project. It worked with some modifications:

private void FindReplace(Document doc, string rootPath)
{
    string desiredtextchanged = "This is changed Text";

    // Get the corresponding run for each keyword
    Run keywordHeader1 = GetKeywordPosition(doc, "keywordHeader1");
    Run keywordHeader2 = GetKeywordPosition(doc, "keywordHeader2");
    Run keywordTitle1 = GetKeywordPosition(doc, "keywordTitle1");
    Run keywordTitle2 = GetKeywordPosition(doc, "keywordTitle2");

    Node[] Runs = doc.GetChildNodes(NodeType.Run, true).ToArray();

    // Replace in header: overwrite the text of the run containing keywordHeader2
    if (keywordHeader1 != null && keywordHeader2 != null)
    {
        keywordHeader2.Text = "Replacement header Text";
    }

    // Replace title in first section: the runs between keywordTitle1 and keywordTitle2
    bool blnDelete = false;
    bool textWritten = false;   // write the new text only once
    if (keywordTitle1 != null && keywordTitle2 != null)
    {
        foreach (Run run in Runs)
        {
            if (run == keywordTitle2)
            {
                blnDelete = false;
            }
            else if (blnDelete)
            {
                // Put the new text in the first run between the keywords and clear
                // the others; otherwise the text is repeated once for every run.
                run.Text = textWritten ? string.Empty : desiredtextchanged;
                textWritten = true;
            }
            else if (run == keywordTitle1)
            {
                blnDelete = true;
            }
        }
    }

    doc.Save(rootPath + @"output.docx");
}

Is there any way to extract run nodes from a collection of HeaderFooter nodes? So basically, call:

NodeCollection headerfooterNodes = doc.GetChildNodes(NodeType.HeaderFooter, true);

Is this basically the same idea as

HeaderFooter headerFooter = (HeaderFooter)doc.GetChild(NodeType.HeaderFooter, 0 , true);
NodeCollection headerFooterRuns = headerFooter.ChildNodes;

Then extract just the run nodes from headerfooterNodes? I think this might be a start toward the ‘speed-up’ I want when running the code. (This is basically the answer to my previous post.)

What would you suggest, please?

Thank you.

Hi Rob,

You can use the following code snippet for your scenario. Hope this helps.

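// GetChildNodes(NodeType.Run, true) searches the whole subtree. ChildNodes only returns
// the immediate children of the HeaderFooter (paragraphs and tables), so it never yields runs.
HeaderFooter headerFooter = (HeaderFooter)doc.GetChild(NodeType.HeaderFooter, 0, true);
NodeCollection headerFooterRuns = headerFooter.GetChildNodes(NodeType.Run, true);
foreach (Run run in headerFooterRuns)
{
    //Your Code
}

// The same call limits the search to the main text of the first section.
NodeCollection firstSectionRuns = doc.FirstSection.Body.GetChildNodes(NodeType.Run, true);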