Free Support Forum - aspose.com

Deleting Content between two Nodes

In our document generation, which I am refactoring to use Aspose rather than Word directly, conditional sections are wrapped in {{{ }}} and need to be removed before the document is presented to the user (they represent IF blocks that have been excluded).

Currently I’m using FindTextRange with a generic text finder, i.e. doc.Range.Replace(new Regex(Regex.Escape(startText) + "(.*?)" + Regex.Escape(endText), RegexOptions.IgnoreCase), textFinder, false);
which works fine if the tags are all on a single line and don’t contain embedded tags.

However if you have a structure like this:

{{{ Some Content {{{ More Content }}}
Final Content }}}

The ‘Final Content’ is left behind.
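
To see why the non-greedy pattern stops short, here is a minimal sketch that applies the same pattern to a plain string using only .NET's Regex class. It does not call Aspose at all, and the names LazyRegexDemo and StripWithLazyRegex are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyRegexDemo
{
    // Builds the same pattern as in the post: start tag, lazy "(.*?)", end tag.
    // The lazy quantifier stops at the FIRST end tag it can reach, so a nested
    // block closes the match early and the outer block's tail is not consumed.
    public static string StripWithLazyRegex(string text, string startText, string endText)
    {
        var pattern = new Regex(
            Regex.Escape(startText) + "(.*?)" + Regex.Escape(endText),
            RegexOptions.IgnoreCase);
        return pattern.Replace(text, string.Empty);
    }
}
```

Called with the two-line structure above and the tags "{{{" and "}}}", the match ends at the first "}}}" on line one, so the second line ("Final Content }}}") survives the replace unchanged.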

I’ve refactored the finding of the “{{{” and “}}}” to iterate through the paragraphs to find specific matching pairs of tags correctly, but I get problems when trying to remove the content.
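
For reference, one way to pair nested tags is to track the nesting depth while scanning, so that only the outermost {{{ ... }}} pair delimits a block. The sketch below works on a plain string rather than on Aspose nodes, and the names TagMatcher, FindOutermostBlocks and RemoveBlocks are hypothetical. It is an illustration of the idea, not the code used in the post.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TagMatcher
{
    // Returns one (Start, End) pair per outermost block. Start is the index of
    // the opening tag; End is the index just past the matching closing tag.
    public static List<(int Start, int End)> FindOutermostBlocks(
        string text, string open = "{{{", string close = "}}}")
    {
        var blocks = new List<(int Start, int End)>();
        int depth = 0, blockStart = -1, i = 0;
        while (i < text.Length)
        {
            if (string.CompareOrdinal(text, i, open, 0, open.Length) == 0)
            {
                if (depth == 0) blockStart = i;   // remember the outermost opener
                depth++;
                i += open.Length;
            }
            else if (string.CompareOrdinal(text, i, close, 0, close.Length) == 0)
            {
                // Unmatched closing tags (depth 0) are skipped.
                if (depth > 0 && --depth == 0)
                    blocks.Add((blockStart, i + close.Length));
                i += close.Length;
            }
            else
            {
                i++;
            }
        }
        return blocks;
    }

    // Copies everything that lies outside the outermost blocks.
    public static string RemoveBlocks(string text)
    {
        var sb = new StringBuilder();
        int pos = 0;
        foreach (var (start, end) in FindOutermostBlocks(text))
        {
            sb.Append(text, pos, start - pos);
            pos = end;
        }
        sb.Append(text, pos, text.Length - pos);
        return sb.ToString();
    }
}
```

With the nested example above, the inner "}}}" only lowers the depth from 2 to 1, so the block is not closed until the final "}}}" and "Final Content" is removed with the rest.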

Basically, when I try to remove the nodes I get an error saying that there is no parent node.

Is there a good way to remove all the content between the two tags? The TextFinder approach would seem ideal, but it only seems to let you change the text, so it doesn’t help when there is, for example, a table or image within the tags.

So I have two paragraphs and need to remove both of them and everything in between.

Any assistance or ideas would be appreciated.
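
For the case described above (two tag paragraphs plus everything between them, including tables), a minimal sketch is to walk the siblings from the first paragraph to the second and remove each one. It uses the Aspose.Words members Node.ParentNode, Node.NextSibling and Node.Remove; the names NodeRangeRemover and RemoveInclusive are hypothetical. It assumes both paragraphs are direct children of the same parent (for example the same Body) and that the start paragraph comes first. Tags sitting in different sections or table cells would need more work.

```csharp
using System;
using Aspose.Words;

public static class NodeRangeRemover
{
    // Removes 'start', 'end' and every sibling node between them.
    // Tables at the same level are siblings of the paragraphs, so they go too.
    // Assumption: both nodes share one parent and 'start' precedes 'end'.
    public static void RemoveInclusive(Node start, Node end)
    {
        if (start.ParentNode == null || start.ParentNode != end.ParentNode)
            throw new ArgumentException("Both nodes must be attached children of the same parent.");

        Node current = start;
        while (current != null)
        {
            // Read the next sibling BEFORE removing: a removed node is detached
            // from the tree and can no longer be used to continue the walk.
            Node next = current.NextSibling;
            bool isLast = current == end;
            current.Remove();
            if (isLast) break;
            current = next;
        }
    }
}
```

Fetching the next sibling before calling Remove is the important part. Working from a node that has already been detached is one plausible source of a "no parent node" error.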

@Etaardvark,

Thanks for your inquiry. You can replace tags that span multiple lines. Use the following meta-characters when the text you are finding or replacing contains breaks:

  • &p - paragraph break
  • &b - section break
  • &m - page break
  • &l - manual line break

Please refer to the following article: Replace Text using Meta-Characters

You can also achieve this with the following approach:

  1. Find the start tag and replace it with a BookmarkStart node.
  2. Find the end tag and replace it with a BookmarkEnd node.
  3. Set the bookmark’s text to an empty string using the Bookmark.Text property.

I’ve changed the Regex slightly so that it now matches correctly across line breaks.
However, I still get a set of Run objects, which I can remove quite happily using the code below:

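        // Remove each matched Run, then drop its paragraph if that leaves it empty.
        foreach (Node run in foundText)
        {
            // Capture the parent first; skip runs that are not attached to anything.
            Node para = run.ParentNode;
            if (para == null)
                continue;

            run.Remove();

            // A paragraph emptied by an earlier run may already have been removed,
            // so only remove it while it is still attached to the document.
            if (para.ParentNode != null && para.ToString(SaveFormat.Text).Trim() == string.Empty)
                para.Remove();
        }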

However, any table formatting is left behind; only the text is being replaced.

Given that the replace returns Run objects, which I have to remove from their containing paragraphs, how do I remove any other formatting within the range?

@Etaardvark,

Thanks for your inquiry. To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input Word document.
  • The output Word file that shows the undesired behavior.
  • The expected output Word file that shows the desired behavior.
  • A standalone console application (source code without compilation errors) that reproduces the problem on our end.

As soon as you have these ready, we’ll start investigating your issue and provide more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.