How can I get a range between block LINQ tags (including the tags)?

Hello Team, I have some issues you may be able to help me with.

1. How can I get a range between block LINQ tags (including the tags)?
For example,
if the tags are <<foreach ....>> .... <</foreach>> or <<if >>..... <</if>>, I want to get the range between the foreach or if tags (including the tags).

2. How can I handle nested tags? I just want to get the range between the outermost tags.
For example,
if the tags are <<foreach ....>> .. <<foreach ....>> ..<<if >>.....<</if>>.. <</foreach>>.. <</foreach>>, I just want to get the range between the outermost foreach tags (including the tags).

3. If the tags are in a table, I just want to get the table.

@Doraemon I am afraid there is no easy way to achieve this, since tags are represented as simple text in the document. As a first step, you need to find all tags and make each of them a single Run node. Then you can loop through the runs and get the first and the last run with tags. This can be done using code like this:

Document doc = new Document("C:\\Temp\\in.docx");

Pattern tagPattern = Pattern.compile("<<[^>]+>>");
// Replace tag with itself to make it to be represented as a single run node.
FindReplaceOptions opt = new FindReplaceOptions();
opt.setUseSubstitutions(true);
doc.getRange().replace(tagPattern, "$0", opt);

// Loop through the Run nodes in the document
// and get the first and last run with tags.
Run start = null;
Run end = null;
for (Run r : (Iterable<Run>)doc.getChildNodes(NodeType.RUN, true))
{
    if (tagPattern.matcher(r.getText()).find())
    {
        if (start == null)
            start = r;
        end = r;
    }
}

// Here you can extract content between the runs.
// ......................

Once you get start and end run nodes, you can extract content between them as described here:
https://docs.aspose.com/words/java/extract-selected-content-between-nodes/
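
The loop above takes the first and the last tag in the whole document. For the nested case from the second question, one possible approach is to keep a depth counter while walking the tags in document order: an opening tag at depth 0 starts an outermost block, and the closing tag that brings the depth back to 0 ends it. The following is only a minimal sketch in plain Java that works on tag text and makes no Aspose.Words calls; the class and method names (OutermostTagFinder, findTags, outermostBlocks) are hypothetical, and it assumes only foreach and if are block tags. With Aspose.Words, the returned indices would correspond to positions in the list of Run nodes that contain tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: illustrates depth counting, not part of Aspose.Words.
class OutermostTagFinder {
    // Lazy pattern: a tag starts at "<<" and ends at the first ">>" that follows.
    private static final Pattern TAG = Pattern.compile("<<.+?>>");

    // Collects all tags from the text in the order they appear.
    static List<String> findTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    // Returns {startIndex, endIndex} pairs (indices into the tag list)
    // for every outermost foreach/if block.
    static List<int[]> outermostBlocks(List<String> tags) {
        List<int[]> blocks = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < tags.size(); i++) {
            String t = tags.get(i);
            // Assumption: only foreach and if open or close a block.
            boolean opening = t.startsWith("<<foreach") || t.startsWith("<<if");
            boolean closing = t.startsWith("<</foreach") || t.startsWith("<</if");
            if (opening) {
                if (depth == 0) {
                    start = i; // an outermost block begins here
                }
                depth++;
            } else if (closing && depth > 0) {
                depth--;
                if (depth == 0) {
                    blocks.add(new int[] {start, i}); // the outermost block ends here
                }
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        String text = "<<foreach [a in as]>> x <<foreach [b in bs]>> y "
                + "<<if [c]>> z <</if>> <</foreach>> <</foreach>>";
        List<String> tags = findTags(text);
        for (int[] block : outermostBlocks(tags)) {
            System.out.println(tags.get(block[0]) + " ... " + tags.get(block[1]));
        }
    }
}
```

The runs at the start and end index of each pair can then be used as the start and end nodes for the extraction described in the linked article.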

If the tags are inside a table (your third question), you can check whether the start or end run node has a Table node as an ancestor. You can achieve this using code like this:

Table table = (Table)start.getAncestor(NodeType.TABLE);
if (table != null)
{
    // Do something with the table.
}

@alexey.noskov
Thank you for your careful explanation. I will try it.


@alexey.noskov
Hi Alexey,
What does "$0" mean in the replacement?

doc.getRange().replace(tagPattern, "$0", opt);

I tried to use (<<[^>]+>>)|(<<(\\[.*\\])[^>]+>>)|(<<\\[.*\\]>>) to match tags such as <<["<b>Bold</b> and <i>italic</i> text"] -html>>, but something went wrong. The tags do not seem to be split into runs correctly.

Here is the input file. After the runs are split, I cannot get the parent table for the tags that are inside the table.
template.docx (17.1 KB)

@Doraemon When the UseSubstitutions option is enabled, "$0" means the whole matched text, i.e.

doc.getRange().replace(tagPattern, "$0", opt);

Replaces the match with itself.

In your case, please try using the following regular expression:

Pattern tagPattern = Pattern.compile("<<.+?>>");
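
To illustrate why the lazy pattern helps, here is a small sketch that uses plain java.util.regex rather than the Aspose.Words replace call. It relies on the fact that "$0" also refers to the whole match in Matcher.replaceAll. The class name TagPatternDemo and the method wrapMatches are hypothetical, and the sample text is shortened from the tag in the question. The pattern <<[^>]+>> cannot match a tag that contains a single ">" inside it (for example from <b>), while <<.+?>> runs to the first ">>" that follows.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares the two tag patterns from this thread.
class TagPatternDemo {
    // Original pattern: no ">" is allowed between "<<" and ">>".
    static final Pattern NEGATED = Pattern.compile("<<[^>]+>>");
    // Suggested pattern: lazily matches up to the first ">>".
    static final Pattern LAZY = Pattern.compile("<<.+?>>");

    // Wraps every match in square brackets; "$0" stands for the whole match.
    static String wrapMatches(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("[$0]");
    }

    public static void main(String[] args) {
        String text = "A <<[\"<b>Bold</b> text\"] -html>> B <<[x]>> C";
        // Only <<[x]>> is wrapped; the html tag is skipped.
        System.out.println(wrapMatches(NEGATED, text));
        // Both tags are wrapped.
        System.out.println(wrapMatches(LAZY, text));
    }
}
```

Note that the lazy pattern stops at the first ">>", so it would cut a tag short if the expression inside the tag itself contained ">>".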

@alexey.noskov
Thank you for your help. It works fine.
