How to remove the content between two custom tags using .NET

sheyin · September 23, 2020, 2:23pm

Hello Team,

We have purchased a license for the Aspose.Words product family, and we have a new requirement: removing content (multiple paragraphs) from a Word document that is enclosed between placeholders like <<FP_start>> and <<FP_end>>. We found a few samples on the internet, but they are in VB and do not work for us. We are looking for a working code snippet from you. Please note that we need a full function that reads the file from a file path and removes the content between the placeholders. The link you provided on the internet is not working for us.

sheyin · September 23, 2020, 3:18pm

Capture.png (38.1 KB)

Please refer to the attachment.

sheyin · September 24, 2020, 3:42am

Team, any update?

tahir.manzoor · September 24, 2020, 5:09am

@sheyin

In your case, we suggest that you find the start and end tags, enclose them in a bookmark, and remove the bookmarked content by setting the Bookmark.Text property to an empty string.

The following code example shows how to find the tags and remove the content between them, including the tags themselves.

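using System;
using System.Collections;
using Aspose.Words;
using Aspose.Words.Replacing;

// MyDir is the path of the folder that contains the input document.
Document doc = new Document(MyDir + "in.docx");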
FindAndInsertBookmark start = new FindAndInsertBookmark("bookmark", true, 1);
FindAndInsertBookmark end = new FindAndInsertBookmark("bookmark", false, 1);

FindReplaceOptions findReplaceOptions = new FindReplaceOptions();
findReplaceOptions.ReplacingCallback = start;
doc.Range.Replace("Your start tag...", "", findReplaceOptions);

findReplaceOptions.ReplacingCallback = end;
doc.Range.Replace("Your end tag...", "", findReplaceOptions);

doc.Range.Bookmarks["bookmark1"].Text = "";

doc.Save(MyDir + "out.docx");

public class FindAndInsertBookmark : IReplacingCallback
{
    string bmname;
    public int i = 1;
    Boolean isStart;
    DocumentBuilder builder;
    public FindAndInsertBookmark(string bmname, Boolean isStart, int i)
    {
        this.bmname = bmname;
        this.isStart = isStart;
        this.i = i;
    }
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        if (builder == null)
            builder = new DocumentBuilder((Document)currentNode.Document);

        // The first (and possibly the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            (remainingLength > 0) &&
            (currentNode != null) &&
            (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node. 
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }

        if (isStart)
        {
            Run run = (Run)runs[0];
            run.ParentNode.InsertBefore(new BookmarkStart(run.Document, bmname + i), run);
            i++;
        }
        else
        {
            Run run = (Run)runs[runs.Count - 1];
            run.ParentNode.InsertAfter(new BookmarkEnd(run.Document, bmname + i), run);
            i++;
        }

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    /// <summary>
    /// Splits text of the specified run into two runs.
    /// Inserts the new run just after the specified run.
    /// </summary>
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}

sheyin · September 24, 2020, 6:20am

Hi Tahir,
Capture.PNG (118.2 KB)

I am getting an index out of range exception in the SplitRun function:

System.ArgumentOutOfRangeException: ‘startIndex cannot be larger than length of string.
Parameter name: startIndex’
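
The message above is the one String.Substring raises when the start index is larger than the string's length, which in SplitRun would mean that `position` exceeds the length of `run.Text`. The following is a minimal sketch of that failure mode using plain strings only. It does not use Aspose.Words, the names SplitDemo and SplitText are hypothetical, and it does not explain why the offset ends up out of range for a particular document or library version.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper that mirrors what SplitRun does with Run.Text:
    // it cuts the text in two at 'position', but validates the offset first.
    public static (string Before, string After) SplitText(string text, int position)
    {
        if (text == null)
            throw new ArgumentNullException(nameof(text));

        if (position < 0 || position > text.Length)
            throw new ArgumentOutOfRangeException(
                nameof(position),
                $"Position {position} is outside the text (length {text.Length}).");

        return (text.Substring(0, position), text.Substring(position));
    }

    public static void Main()
    {
        // A valid split: the tag starts at offset 6.
        var (before, after) = SplitText("Intro <<FP_start>>", 6);
        Console.WriteLine($"[{before}] [{after}]"); // prints "[Intro ] [<<FP_start>>]"

        // An offset beyond the end of the string makes Substring throw.
        try
        {
            "Intro ".Substring(10);
        }
        catch (ArgumentOutOfRangeException ex)
        {
            // Prints the name of the offending parameter as reported by the runtime.
            Console.WriteLine(ex.ParamName);
        }
    }
}
```

Logging `e.MatchOffset` and the length of the matched run's text just before SplitRun is called would show whether this is what happens in the failing case.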

sheyin · September 24, 2020, 6:40am

Tahir, could you please help me with this?

sheyin · September 24, 2020, 4:04pm

Team,

Any update on this?

tahir.manzoor · September 24, 2020, 5:30pm

@sheyin

Please ZIP and attach your input and expected output Word documents for testing. We will investigate the issue and provide you more information on it.

sheyin · September 24, 2020, 5:54pm

SampleFile.zip (32.2 KB)

Hi Tahir,

Here is the sample file attached to this thread for your reference. You can see that a few paragraphs and lines are enclosed between the <<FP_start>> and <<FP_end>> tags (on the second page of the Word document). All the content enclosed between these tags should be removed, including the tags. It would be great if you could provide a working solution in C# for us. We are eagerly waiting for your response. Please note that the content between the tags can be anything: it can be multiple paragraphs with line breaks, multiple lines with spaces, etc.

Thanks in advance
sheyin

SampleOutputFile.zip (31.2 KB)

tahir.manzoor · September 24, 2020, 6:42pm

@sheyin

We have tested the scenario with your document using the latest version of Aspose.Words for .NET (20.9) and did not encounter any issue or exception. So, please use Aspose.Words for .NET 20.9. We have attached the output DOCX to this post for your reference.

20.9-output.zip (20.5 KB)

sheyin · September 25, 2020, 7:31am

Hi Tahir,

We are using version 18.6.0.0. Could you please check with this version? I am still getting the exception in the SplitRun function. Does this logic depend on the library version? Is there any other way to achieve this with version 18.6.0.0?

Regards
Sheyin CP

Capture.PNG (10.9 KB)

tahir.manzoor · September 25, 2020, 3:24pm

@sheyin

Please note that we do not provide support for older released versions of Aspose.Words. Moreover, we do not provide any fixes or patches for old versions of Aspose products either. All fixes and new features are always added into new versions of our products.

We always encourage our customers to use the latest version of Aspose.Words as it contains newly introduced features, enhancements and fixes to the issues that were reported earlier.