Insert bookmarks in place of tags in a Word document

I have a Word document with tags inserted throughout the content, e.g. @@tag_01, @@tag_02 and @@tag_03.

I want to replace these tags with bookmarks, e.g. bookmark_01, bookmark_02 and bookmark_03.

I’m trying to do this from code, but it doesn’t do what I want.
Instead, each bookmark ends up enclosing the entire text of the line where the tag was located before the code removed it.

PS: The version of Aspose.Words for .NET is 16.8.0.0.


string template = "path-to-template";
string final = "path-to-modified-template";

string[] tags = new string[] { "@@tag_01", "@@tag_02", "@@tag_03" };
string[] bookmarks = new string[] { "bookmark_01", "bookmark_02", "bookmark_03" };

License license = new License();
license.SetLicense("Aspose.Words.lic");

Document doc = new Document(template);

foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
{
    // Check every tag against the current run.
    for (int i = 0; i < tags.Length; i++)
    {
        if (!run.Text.Contains(tags[i]))
            continue;

        // Wraps the whole run in the bookmark, then strips the tag text from the run.
        BookmarkStart bookmarkStart = new BookmarkStart(doc, bookmarks[i]);
        run.ParentNode.InsertBefore(bookmarkStart, run);

        run.Text = run.Text.Replace(tags[i], "");

        BookmarkEnd bookmarkEnd = new BookmarkEnd(doc, bookmarks[i]);
        run.ParentNode.InsertAfter(bookmarkEnd, run);
    }
}

doc.Save(final);

I can’t attach the template document and the final document or I would have.

@pl1 You can achieve this using Find and Replace functionality and IReplacingCallback. For example see the following code:

using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Replacing;

Document doc = new Document(@"C:\Temp\in.docx");
FindReplaceOptions opt = new FindReplaceOptions(FindReplaceDirection.Backward);
opt.ReplacingCallback = new ReplaceEvaluatorWrapWithBookmark();
doc.Range.Replace("test", "bookmark_name", opt);
doc.Save(@"C:\Temp\out.docx");

internal class ReplaceEvaluatorWrapWithBookmark : IReplacingCallback
{
    /// <summary>
    /// This method is called by the Aspose.Words find and replace engine for each match.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        Document doc = (Document)e.MatchNode.Document;

        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and may be the only) run can contain text before the match, 
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        // This list stores all runs of the match so they can be wrapped (or deleted) later.
        List<Run> runs = new List<Run>();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            remainingLength > 0 &&
            currentNode != null &&
            currentNode.GetText().Length <= remainingLength)
        {
            runs.Add((Run)currentNode);
            remainingLength -= currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            } while (currentNode != null && currentNode.NodeType != NodeType.Run);
        }

        // Split the last run that contains the match if there is any text left.
        if (currentNode != null && remainingLength > 0)
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add((Run)currentNode);
        }

        // Generate a unique bookmark name. Another approach can be used.
        string bookmarkName = e.Replacement;
        while (doc.Range.Bookmarks[bookmarkName] != null)
            bookmarkName += "_" + Guid.NewGuid().ToString();

        // Insert a bookmark around the matched text.
        BookmarkStart start = new BookmarkStart(doc, bookmarkName);
        BookmarkEnd end = new BookmarkEnd(doc, bookmarkName);

        runs[0].ParentNode.InsertBefore(start, runs[0]);
        runs[runs.Count-1].ParentNode.InsertAfter(end, runs[runs.Count-1]);

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        run.ParentNode.InsertAfter(afterRun, run);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        return afterRun;
    }
}

The code wraps each occurrence of the word "test" in a bookmark.
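
For anyone who finds the callback hard to follow, the run-splitting logic can be modeled on plain strings. This is only a sketch and not Aspose.Words code: `RunSplitSketch` and `IsolateMatch` are hypothetical names, and each string stands in for the text of one Run. It mirrors the steps of the callback: split off any text before the match, collect the runs that are fully covered by the match, then split the last run if the match ends inside it.

```csharp
using System;
using System.Collections.Generic;

static class RunSplitSketch
{
    // Models the callback on plain strings: each list element plays the role of one Run.
    // Returns the new pieces plus the indexes of the first and last piece that make up the match.
    public static (List<string> Pieces, int First, int Last) IsolateMatch(
        List<string> runs, int runIndex, int offset, int length)
    {
        var pieces = new List<string>(runs); // work on a copy
        int current = runIndex;

        // Text before the match stays in its own piece (the SplitRun call for MatchOffset > 0).
        if (offset > 0)
        {
            string text = pieces[current];
            pieces[current] = text.Substring(0, offset);
            pieces.Insert(current + 1, text.Substring(offset));
            current++;
        }

        int first = current;
        int remaining = length;

        // Walk over pieces that lie completely inside the match.
        while (remaining > 0 && current < pieces.Count && pieces[current].Length <= remaining)
        {
            remaining -= pieces[current].Length;
            current++;
        }

        // The match ends inside this piece, so split it at the end of the match.
        if (remaining > 0 && current < pieces.Count)
        {
            string text = pieces[current];
            pieces[current] = text.Substring(0, remaining);
            pieces.Insert(current + 1, text.Substring(remaining));
            current++;
        }

        return (pieces, first, current - 1);
    }

    static void Main()
    {
        // The tag "@@tag_01" (8 characters) starts at offset 4 of the first run
        // and continues into the second run.
        var result = IsolateMatch(new List<string> { "See @@t", "ag_01 here" }, 0, 4, 8);
        Console.WriteLine(string.Join("|", result.Pieces));
        Console.WriteLine(result.First + ".." + result.Last);
    }
}
```

In this example the two runs become four pieces, and pieces 1 and 2 together form the tag. In the real callback the BookmarkStart goes before the first matched run and the BookmarkEnd after the last one, which is why only the tag ends up inside the bookmark rather than the whole line.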

Thank you Alexey for your swift response.

I can’t say that I understand everything that is going on, due to my somewhat limited knowledge of Aspose.Words, plus the fact that it has been several years since I last used it.

If I understand it correctly in regard to the scenario I described, I should be able to achieve the goal this way?

Document doc = new Document(template);

FindReplaceOptions opt = new FindReplaceOptions(FindReplaceDirection.Backward);
opt.ReplacingCallback = new ReplaceEvaluatorWrapWithBookmark();

doc.Range.Replace("@@tag_01", "bookmark_01", opt);
doc.Range.Replace("@@tag_02", "bookmark_02", opt);
doc.Range.Replace("@@tag_03", "bookmark_03", opt);

doc.Save(final);

If this is correct, would it also be possible to insert HTML content, instead of creating a bookmark, in place of the tags?

@pl1 Yes, your code is correct and the specified tags will be wrapped in bookmarks. You can also modify the code and use a regular expression to wrap all tags in bookmarks in a single Range.Replace operation.
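
A minimal sketch of the regular expression part, using only System.Text.RegularExpressions. The names `TagToBookmarkSketch`, `TagPattern` and `BookmarkNameFor` are hypothetical, and the pattern assumes every tag looks like the examples in this thread ("@@tag_" followed by digits). Inside the callback, `e.Match` is a regular .NET `Match`, so the bookmark name could presumably be derived from it instead of being taken from `e.Replacement`, with the pattern passed to the `Range.Replace` overload that accepts a `Regex`.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagToBookmarkSketch
{
    // Assumed tag shape, based on the examples in this thread: "@@tag_" followed by digits.
    public static readonly Regex TagPattern = new Regex(@"@@tag_(\d+)");

    // Derives the bookmark name from a match, e.g. "@@tag_01" becomes "bookmark_01".
    public static string BookmarkNameFor(Match match)
    {
        return "bookmark_" + match.Groups[1].Value;
    }

    static void Main()
    {
        // Plain-text stand-in for the document content.
        string text = "Intro @@tag_01 middle @@tag_02 end @@tag_03";
        foreach (Match m in TagPattern.Matches(text))
            Console.WriteLine(m.Value + " -> " + BookmarkNameFor(m));
    }
}
```

Deriving the name from the captured digits keeps tag and bookmark numbers in step, so no separate lookup table is needed as long as the tags follow this naming scheme.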

Yes, you can use IReplacingCallback to insert HTML content. Please see the following implementation:

internal class ReplaceEvaluatorFindAndReplaceWithHtml : IReplacingCallback
{
    /// <summary>
    /// This method is called by the Aspose.Words find and replace engine for each match.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        Document doc = (Document)e.MatchNode.Document;

        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and may be the only) run can contain text before the match, 
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        // This list stores all runs of the match so they can be deleted later.
        List<Run> runs = new List<Run>();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            remainingLength > 0 &&
            currentNode != null &&
            currentNode.GetText().Length <= remainingLength)
        {
            runs.Add((Run)currentNode);
            remainingLength -= currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            } while (currentNode != null && currentNode.NodeType != NodeType.Run);
        }

        // Split the last run that contains the match if there is any text left.
        if (currentNode != null && remainingLength > 0)
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add((Run)currentNode);
        }

        // Create DocumentBuilder to insert HTML.
        DocumentBuilder builder = new DocumentBuilder(doc);
        // Move builder to the first run.
        builder.MoveTo(runs[0]);
        // Insert HTML.
        builder.InsertHtml(e.Replacement, HtmlInsertOptions.UseBuilderFormatting);

        // Delete matched runs
        foreach (Run run in runs)
            run.Remove();

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        run.ParentNode.InsertAfter(afterRun, run);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        return afterRun;
    }
}

I believe the reason for inserting bookmarks into the document is to be able to replace them with HTML content later, by building documents from the HTML and then inserting these documents at the bookmarks.
This is most likely not the optimal way of doing things, but it is what the current project code was doing with documents that already had bookmarks in them.

Now the documents no longer have bookmarks in them, only tags (e.g. @@tag_0X).

However, if it is possible to insert HTML directly into the document, I don’t think there is any need for creating bookmarks.

I tried your previous code to insert bookmarks and it works - however, the tags are still in the document e.g. @@tag_01 => [@@_tag_01].

The idea was to replace the tag with a named bookmark to be addressed later - the tag should no longer be in the document.
But again, if tags can be replaced with HTML content directly, the bookmarks might not be needed at all. The tag should still not be in the document once replaced.

@pl1 Yes, ReplaceEvaluatorWrapWithBookmark does not remove the matched text, it only wraps the matched text in a bookmark. You can remove all nodes in the runs list after inserting the bookmark; in this case the matched text will be removed:

// Delete matched runs
foreach (Run run in runs)
    run.Remove();

And yes, if your goal is to replace the tag with HTML, it is not necessary to insert a bookmark first, you can insert HTML at the matched text as shown in the last code example.

This code doesn’t build for me, probably because of the version of the library, which is 16.8.0.0.

The compiler doesn’t know “HtmlInsertOptions” in this line.

builder.InsertHtml(e.Replacement, HtmlInsertOptions.UseBuilderFormatting);

I’m not sure which version I can upgrade to with the license I have - it was purchased January 14th 2017.

@pl1 Yes, you are right, there was no such overload in version 16.8. You can use the following code instead:

builder.InsertHtml(e.Replacement);

Every Aspose license provides a 1-year subscription for free upgrades to any new Aspose.Words version that comes out.

You can check the license expiration date by opening the license file in Notepad (but take care not to modify and save the license file or it will no longer work) and checking the SubscriptionExpiry field.

<SubscriptionExpiry>20220218</SubscriptionExpiry>

This means that you can upgrade for free to any version of Aspose.Words published before 02/18/2022.

Thank you for that code, it seems to work like a charm, at least in the small test I just did.

Am I right to assume that I can upgrade to version 18.6.0, which was published January 6th 2018, according to NuGet?

@pl1 Version 18.6 was released on the 1st of June 2018. The major number in an Aspose.Words version is the year and the minor number is the month. 18.6 means the June 2018 release. The January 2018 version is 18.1.
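
The version scheme and the SubscriptionExpiry field can be compared with a few lines of plain .NET code. This is only a rough sketch: `UpgradeCheckSketch` and its methods are hypothetical names, it assumes the yyyyMMdd format shown above and two-digit years counted from 2000, and it only knows the release month, not the exact release day, so the real release date should still be checked against the expiry date.

```csharp
using System;
using System.Globalization;

static class UpgradeCheckSketch
{
    // Parses the yyyyMMdd value found in the SubscriptionExpiry field.
    public static DateTime ParseExpiry(string value)
    {
        return DateTime.ParseExact(value, "yyyyMMdd", CultureInfo.InvariantCulture);
    }

    // Maps a version such as "18.6" to the first day of its release month (June 2018).
    // Assumption: the major number is the two-digit year, counted from 2000.
    public static DateTime ReleaseMonth(string version)
    {
        string[] parts = version.Split('.');
        int year = 2000 + int.Parse(parts[0], CultureInfo.InvariantCulture);
        int month = int.Parse(parts[1], CultureInfo.InvariantCulture);
        return new DateTime(year, month, 1);
    }

    // Rough check only: true when the release month starts on or before the expiry date.
    public static bool MayBeCovered(string version, string expiry)
    {
        return ReleaseMonth(version) <= ParseExpiry(expiry);
    }

    static void Main()
    {
        // Uses the example expiry value from this thread.
        Console.WriteLine(MayBeCovered("18.6", "20220218"));
        Console.WriteLine(MayBeCovered("22.3", "20220218"));
    }
}
```

With the example expiry value 20220218, version 18.6 passes this check and version 22.3 does not. A version released in the same month as the expiry date is a borderline case that this sketch cannot decide.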