Extract/remove document part

I would like to use RegEx to find a specific part of a document (e.g. everything between "@Start@" and "@End@"), remove that part from the document, insert it into a new document, and then save both documents (the original and the new one).

@Niebelschuetz, you can use the approach described in the following article to extract content between nodes:
https://docs.aspose.com/words/net/how-to-extract-selected-content-between-nodes-in-a-document/

But first, the tags need to be isolated into separate nodes. You can achieve this using the find/replace functionality. For example, see the following code:

using System.Linq;
using Aspose.Words;
using Aspose.Words.Replacing;

string startTag = "@Start@";
string endTag = "@End@";

Document doc = new Document(@"C:\Temp\in.docx");
// Replace each tag with itself ("$0" is the whole match) so that the tag ends up in its own Run node.
FindReplaceOptions opt = new FindReplaceOptions();
opt.UseSubstitutions = true;
doc.Range.Replace(startTag, "$0", opt);
doc.Range.Replace(endTag, "$0", opt);
// Get the runs that represent the start and end tags.
NodeCollection runs = doc.GetChildNodes(NodeType.Run, true);
Run startRun = runs.Cast<Run>().First(r => r.Text == startTag);
Run endRun = runs.Cast<Run>().First(r => r.Text == endTag);
// Extract content between runs.
// .............
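
For the simplest case, the remaining step (moving the content into a new document and removing it from the original) could be sketched roughly as follows. This is only an illustration, not the helper from the linked article: `TaggedContentMover` and `MoveContentBetween` are hypothetical names, and the sketch assumes that each tag sits alone in its own paragraph and that both tag paragraphs are siblings in the same body (not inside tables or different sections). For anything more general, the approach from the article is the better starting point.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper, not part of Aspose.Words.
public static class TaggedContentMover
{
    // Moves the block-level nodes between the two tag paragraphs into a new document
    // and removes them (and the tag paragraphs) from the original document.
    // Assumption: each tag is alone in its paragraph and both paragraphs share one parent.
    public static Document MoveContentBetween(Run startRun, Run endRun)
    {
        Paragraph startPara = startRun.ParentParagraph;
        Paragraph endPara = endRun.ParentParagraph;
        if (startPara.ParentNode != endPara.ParentNode)
            throw new InvalidOperationException("Tag paragraphs must have the same parent.");

        // Collect the nodes strictly between the start and end paragraphs.
        List<Node> between = new List<Node>();
        Node current = startPara.NextSibling;
        while (current != null && current != endPara)
        {
            between.Add(current);
            current = current.NextSibling;
        }
        if (current == null)
            throw new InvalidOperationException("The end tag paragraph must follow the start tag paragraph.");

        // A new Document starts with one section and one empty paragraph; clear the body first.
        Document target = new Document();
        target.FirstSection.Body.RemoveAllChildren();

        // Nodes belong to their document, so they are imported before being appended.
        NodeImporter importer = new NodeImporter(startPara.Document, target, ImportFormatMode.KeepSourceFormatting);
        foreach (Node node in between)
        {
            target.FirstSection.Body.AppendChild(importer.ImportNode(node, true));
            node.Remove();
        }

        // Drop the tag paragraphs from the original document.
        startPara.Remove();
        endPara.Remove();

        // Make sure the new document is still valid if nothing was found between the tags.
        target.EnsureMinimum();
        return target;
    }
}
```

With the `startRun` and `endRun` found above, the usage would be along the lines of `Document extracted = TaggedContentMover.MoveContentBetween(startRun, endRun);`, followed by `doc.Save(...)` and `extracted.Save(...)` with whatever output paths you choose.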

Thank you for the quick solution
