Replace tags in a string with a checkbox

Warrick · March 12, 2024, 5:48am

Hi, I’m evaluating Aspose.Words and have hit a stumbling block.
We replace tags of the form %[*] in a Word document with data that may itself contain tags that subsequently need to be replaced. For example, we might replace "%[CBS]" with "A line of checkboxes eg, !CB! and another !CB! and one more for good measure !CB!"
So we first replace %[*] with the string above and then search again for !CB!.

I have the code working and the docx displays the checkboxes, but the issue is that all 3 checkboxes appear at the start of the text.
E.g.:
"XXX A line of checkboxes eg, and another and one more for good measure. " where X is a checkbox object.
I assume this is occurring because the checkbox object has to exist outside the run, so is there an easy way to break the run up and ensure the checkboxes get inserted in the correct position within the string?

Code I’m using is below.

Many thanks!

DocumentBuilder builder = new DocumentBuilder((Document)args.MatchNode.Document);
builder.MoveTo(args.MatchNode);
builder.InsertCheckBox("", true, 0);

alexey.noskov · March 12, 2024, 5:53am

@Warrick Could you please attach your input and expected output documents here for our reference? We will check your documents and provide you with more information.

Warrick · March 19, 2024, 7:07am

Hi Alexey,
I’ve figured it out now and the doc is being created identically to our old Word versions now.
I split the runs and ensured all variables for replacement were in their own run.
Thanks,
Warrick
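
For later readers, here is a minimal sketch of what the run-splitting fix might look like. It is not Warrick's actual code, which was not posted. DocumentBuilder.MoveTo places the cursor before the target run, so when one run holds several !CB! tags, every checkbox lands in front of that run. Isolating each tag in its own run first avoids this. The class name CheckBoxTagReplacer is hypothetical, the SplitRun helper follows the pattern used in the Aspose.Words find-and-highlight sample, and the sketch assumes each tag sits entirely inside one Run.

```csharp
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

// Hypothetical callback: replaces each matched tag with a checkbox form field.
public class CheckBoxTagReplacer : IReplacingCallback
{
    public ReplaceAction Replacing(ReplacingArgs args)
    {
        // Assumption: the whole tag lies inside a single Run.
        Run tagRun = (Run)args.MatchNode;

        // Split off any text before the tag so the run starts with the tag.
        if (args.MatchOffset > 0)
            tagRun = SplitRun(tagRun, args.MatchOffset);

        // Split off any text after the tag so the run holds only the tag.
        int tagLength = args.Match.Value.Length;
        if (tagRun.Text.Length > tagLength)
            SplitRun(tagRun, tagLength);

        // The cursor is placed before tagRun, so the checkbox is inserted
        // exactly where the tag was once the tag run is removed.
        DocumentBuilder builder = new DocumentBuilder((Document)tagRun.Document);
        builder.MoveTo(tagRun);
        builder.InsertCheckBox("", true, 0);
        tagRun.Remove();

        // The tree has already been edited by hand, so skip the built-in replacement.
        return ReplaceAction.Skip;
    }

    // Splits a run at the given character position and returns the new trailing run.
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}

// Possible usage (searching backward so that splitting a run does not
// disturb matches that have not been processed yet):
//   FindReplaceOptions options = new FindReplaceOptions();
//   options.Direction = FindReplaceDirection.Backward;
//   options.ReplacingCallback = new CheckBoxTagReplacer();
//   doc.Range.Replace(new Regex("!CB!"), "", options);
```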

alexey.noskov · March 19, 2024, 7:30am

@Warrick It is great that you managed to resolve the problem. Please feel free to ask in case of any issues; we will be glad to help you.

Warrick · March 21, 2024, 4:13am

Hi Alexey,

I have run into a different issue.
I have a document with 2 sets of different content, each enclosed between "{{" and "}}" strings. See the attached file:
duplicate_range.docx (12.8 KB)

We need to be able to duplicate the content between each {{ }} range multiple times but for now I’m struggling just to duplicate once.

When I run the attached code, it errors out with:
System.ArgumentException: ‘Cannot insert a node of this type at this location.’

Can you point me in the right direction? I’m a newcomer to .NET and C#, so I trust I have the syntax correct.

Thanks, Warrick

using System;
using System.Collections;
using System.Collections.Generic; // needed for List<Node>
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;
using DocsExamples.Programming_with_Documents.Contents_Management;

namespace ProcessWordDocs
{
    class Program
    {
        static ArrayList startNodes = new ArrayList();
        static ArrayList endNodes = new ArrayList();

        static void Main(string[] args)
        {

            Document doc = new Document("duplicate_range.docx");
            FindReplaceOptions opt = new FindReplaceOptions();

            opt.ReplacingCallback = new findStartDoubleBrace();

            // Store the nodes found for each occurrence of {{
            doc.Range.Replace(new Regex("{{"), "", opt);

            // Store the nodes found for each occurrence of }}
            opt.ReplacingCallback = new findEndDoubleBrace();
            doc.Range.Replace(new Regex("}}"), "", opt);

            // Work through each start node found
            int count = 0;
            foreach (Node startNode in startNodes)
            {
                // Get the end node matching the start node
                Node endNode = (Node)endNodes[count];

                // Extract all nodes between the start and end nodes
                List<Node> extractedNodes = ExtractContentHelper.ExtractContent(startNode, endNode, true);

                // Reverse the nodes so we can insert after, starting with the last node in the extracted list
                extractedNodes.Reverse();

                // Duplicate the extracted nodes
                foreach (Node extractedNode in extractedNodes)
                {
                    // This code fails
                    endNode.ParentNode.InsertAfter(extractedNode, endNode);
                }
                count++;
            }
            doc.Save("duplicate_range_done.docx");
        }

        public class findStartDoubleBrace : IReplacingCallback
        {
            ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
            {
                startNodes.Add(args.MatchNode);
                args.Replacement = "XX";
                return ReplaceAction.Skip;
            }
        }
        public class findEndDoubleBrace : IReplacingCallback
        {
            ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
            {
                endNodes.Add(args.MatchNode);
                args.Replacement = "YY";
                return ReplaceAction.Skip;
            }
        }
    }
}

alexey.noskov · March 21, 2024, 6:51am

@Warrick You should create a document from the extracted nodes and insert it using the DocumentBuilder.InsertDocument method. For example, see the following code:

Document doc = new Document(@"C:\Temp\in.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

// Replace the start and end tags with themselves so that each tag ends up in its own Run node.
FindReplaceOptions opt = new FindReplaceOptions();
opt.UseSubstitutions = true;
doc.Range.Replace(new Regex(@"(\{\{)|(\}\})"), "$0", opt);

// Now select start and end tags.
List<Run> runs = doc.GetChildNodes(NodeType.Run, true).Cast<Run>().ToList();
List<Run> startTags = runs.Where(r => r.Text == "{{").ToList();
List<Run> endTags = runs.Where(r => r.Text == "}}").ToList();

for (int i = 0; i < startTags.Count; i++)
{
    Run start = startTags[i];
    Run end = endTags[i];

    // Extract content between the start and end tags.
    List<Node> contentNodes = ExtractContentHelper.ExtractContent(start, end, true);
    // Create document from extracted nodes.
    Document extractedContent = ExtractContentHelper.GenerateDocument(doc, contentNodes);

    // Insert extracted content after the end tag several times.
    builder.MoveTo(end.NextSibling != null ? end.NextSibling : end.ParentNode);
    for (int j = 0; j < 5; j++)
        builder.InsertDocument(extractedContent, ImportFormatMode.UseDestinationStyles);
}

doc.Save(@"C:\Temp\out.docx");
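
As background on the original exception, here is a hedged sketch rather than a replacement for the InsertDocument approach above. The failing line tried to insert the extracted nodes as children of endNode.ParentNode, which is a paragraph. Assuming ExtractContent returns block-level nodes (paragraphs and tables), as the documentation sample helper does, those nodes cannot be children of a paragraph, which would explain "Cannot insert a node of this type at this location." One possible alternative is to insert copies as siblings of the paragraph that contains the end tag. The names RangeDuplicator and InsertBlocksAfter are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper, not part of Aspose.Words.
public static class RangeDuplicator
{
    // Inserts copies of block-level nodes after the paragraph that contains endTag.
    public static void InsertBlocksAfter(Node endTag, IList<Node> blocks)
    {
        // Climb from the inline tag to its paragraph; block-level nodes
        // have to be inserted as siblings of that paragraph.
        Node anchor = endTag.GetAncestor(NodeType.Paragraph);
        if (anchor == null)
            throw new ArgumentException("endTag is not inside a paragraph.", nameof(endTag));

        CompositeNode container = anchor.ParentNode;

        // Walk the list backwards: each copy is inserted directly after the
        // anchor, so the copies end up in their original order.
        for (int i = blocks.Count - 1; i >= 0; i--)
            container.InsertAfter(blocks[i].Clone(true), anchor);
    }
}
```

Note that this places the copies after the whole paragraph holding the end tag, not immediately after the tag itself, so any text following }} in the same paragraph stays above the duplicated content.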

Warrick · March 21, 2024, 10:51am

Thanks so much Alexey!
You make it look easy and have obviously done this many times.
It works like a charm.
One other question: what if I want to delete all nodes between start and end for one of the {{…}} groups?
I tried the following, but it only deleted the first paragraph and left the table.

Node currentNode = start;
while (currentNode != null)
{
    Node nextNode = currentNode.NextPreOrder(doc);
    if (currentNode == end) { currentNode.Remove(); break; }
    currentNode.Remove();
    currentNode = nextNode;
}

Thanks
Warrick

alexey.noskov · March 21, 2024, 11:03am

@Warrick Please try using the following code:

Node currentNode = start;
while (currentNode != null && currentNode != end)
{
    Node nextNode = currentNode.NextPreOrder(doc);
    currentNode.Remove();
    currentNode = nextNode;
}
end.Remove();

Warrick · March 21, 2024, 11:34am

Hi Alexey,
I really appreciate the super fast response. It’s 10:30pm here, so time to stop for the night.

The code is better, with the end }} being removed, but try it with the attached docx. There are tab characters in the line " Subtest Name Score Types and Scores" that I suspect are causing the issue. Is currentNode perhaps being set to null when it reads these?

duplicate_range.docx (13.0 KB)

Thanks,
Warrick

alexey.noskov · March 21, 2024, 1:29pm

@Warrick There is an easier way to remove the range. The following code works fine with the attached document:

// Remove the range from the source document.
// Wrap the range in a bookmark and then remove the bookmark content by setting its text.
string tempBookmarkName = $"tmp_bookmark_{i}";
start.ParentNode.InsertBefore(new BookmarkStart(doc, tempBookmarkName), start);
end.ParentNode.InsertAfter(new BookmarkEnd(doc, tempBookmarkName), end);
doc.Range.Bookmarks[tempBookmarkName].Text = "";
doc.Range.Bookmarks[tempBookmarkName].Remove();

It wraps the range in a temporary bookmark, removes the content by setting the bookmark's text to an empty string, and finally removes the temporary bookmark.

Warrick · March 21, 2024, 11:52pm

alexey.noskov:

string tempBookmarkName = $"tmp_bookmark_{i}";
start.ParentNode.InsertBefore(new BookmarkStart(doc, tempBookmarkName), start);
end.ParentNode.InsertAfter(new BookmarkEnd(doc, tempBookmarkName), end);
doc.Range.Bookmarks[tempBookmarkName].Text = "";
doc.Range.Bookmarks[tempBookmarkName].Remove();

Thanks Alexey. This has been a great learning curve.