How can I get the range between LINQ block tags (including the tags)?

Doraemon · February 22, 2024, 11:16am

Hello team, I have some issues you may be able to help me with.

1. How can I get the range between LINQ block tags (including the tags)?
For example,
if the tags are <<foreach ...>> ... <</foreach>> or <<if ...>> ... <</if>>, I want to get the range between the foreach or if tags (including the tags).

2. How do I handle nested tags? I only want the range between the outermost tags.
For example,
if the tags are <<foreach ...>> ... <<foreach ...>> ... <<if ...>> ... <</if>> ... <</foreach>> ... <</foreach>>, I only want the range between the outermost foreach tags (including the tags).

3. If the tags are in a table, I just want to get the table.

alexey.noskov · February 22, 2024, 2:05pm

@Doraemon I am afraid there is no easy way to achieve this, since tags are represented as plain text in the document and a single tag may be split across several runs. So the first step is to find all tags and make each of them be represented as a single Run node. Then you can loop through the runs and get the first and the last run containing a tag. This can be done using code like this:

import com.aspose.words.*;
import java.util.regex.Pattern;

Document doc = new Document("C:\\Temp\\in.docx");

Pattern tagPattern = Pattern.compile("<<[^>]+>>");
// Replace tag with itself to make it to be represented as a single run node.
FindReplaceOptions opt = new FindReplaceOptions();
opt.setUseSubstitutions(true);
doc.getRange().replace(tagPattern, "$0", opt);

// Loop through the Run nodes in the document
// and get the first and the last run with tags.
Run start = null;
Run end = null;
for (Run r : (Iterable<Run>)doc.getChildNodes(NodeType.RUN, true))
{
    if (tagPattern.matcher(r.getText()).find())
    {
        if (start == null)
            start = r;
        end = r;
    }
}

// Here you can extract content between the runs.
// ......................

Once you have the start and end Run nodes, you can extract the content between them as described here:
https://docs.aspose.com/words/java/extract-selected-content-between-nodes/
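The loop above takes the first and last tag run in the whole document, which covers a single block. For question 2 (nested blocks, possibly several top-level blocks), one common approach is a depth counter over the tag texts in document order. Below is a minimal plain-Java sketch, not an Aspose API: the class and method names (`OutermostBlocks`, `findOutermostBlocks`) are hypothetical, it only recognizes `foreach` and `if` blocks, and it assumes a well-formed template. In practice the input list would be the texts of the tag Run nodes collected by the loop above, with the returned indices mapped back to those runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class OutermostBlocks {
    // Opening tags of block statements, e.g. <<foreach [...]>> or <<if [...]>>.
    private static final Pattern OPEN = Pattern.compile("^<<(foreach|if)\\b");
    // Closing tags: <</foreach>> or <</if>>.
    private static final Pattern CLOSE = Pattern.compile("^<</(foreach|if)>>");

    /**
     * Returns {startIndex, endIndex} pairs (inclusive) of the outermost blocks.
     * Tags that neither open nor close a block (<<[x]>>, <<else>>, ...) are skipped.
     * Only nesting depth is tracked; tag names of open/close pairs are not cross-checked.
     */
    public static List<int[]> findOutermostBlocks(List<String> tags) {
        List<int[]> blocks = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i).trim();
            if (OPEN.matcher(tag).find()) {
                if (depth == 0) {
                    start = i; // first tag of a new outermost block
                }
                depth++;
            } else if (CLOSE.matcher(tag).find()) {
                depth--;
                if (depth == 0) {
                    blocks.add(new int[] {start, i}); // outermost block closed
                } else if (depth < 0) {
                    throw new IllegalStateException("Unbalanced closing tag at index " + i);
                }
            }
        }
        if (depth != 0) {
            throw new IllegalStateException("Unclosed block starting at index " + start);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> tags = List.of(
            "<<foreach [a in A]>>", "<<foreach [b in a.B]>>", "<<if [b.ok]>>",
            "<</if>>", "<</foreach>>", "<</foreach>>",
            "<<if [c]>>", "<<[c.Name]>>", "<</if>>");
        for (int[] block : findOutermostBlocks(tags)) {
            System.out.println(block[0] + ".." + block[1]); // prints 0..5, then 6..8
        }
    }
}
```

The inner `foreach` and `if` pairs only move the depth up and down, so only the pairs that return the depth to zero are reported.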

As for tags inside a table (question 3), you can check whether the start or end Run node has a Table node as an ancestor. You can achieve this using code like this:

Table table = (Table)start.getAncestor(NodeType.TABLE);
if(table!=null)
{
    // Do something with the table.
}

Doraemon · February 23, 2024, 7:17am

@alexey.noskov
Thank you for your careful explanation. I will try it.

Doraemon · February 24, 2024, 3:03pm

@alexey.noskov
Hi Alexey,
What does “$0” mean in the replacement call?

doc.getRange().replace(tagPattern, "$0", opt);

I tried to use (<<[^>]+>>)|(<<(\\[.*\\])[^>]+>>)|(<<\\[.*\\]>>) to match tags such as <<["<b>Bold</b> and <i>italic</i> text"] -html>>, but something went wrong: the tags seem to be split incorrectly.

Here is the input file. After the runs are split, the tags in the table cannot find their parent table.
template.docx (17.1 KB)

alexey.noskov · February 24, 2024, 8:16pm

@Doraemon When the UseSubstitutions option is enabled, “$0” means the whole matched text, i.e.

doc.getRange().replace(tagPattern, "$0", opt);

replaces each match with itself.
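Standard `java.util.regex` uses the same convention in `Matcher.replaceAll`: `$0` refers to the whole match. A plain-Java analogue on a `String` (the class and helper names below are hypothetical, and this is not the Aspose API) shows that replacing with `"$0"` leaves the text unchanged, while wrapping it makes each match visible. Note that in Aspose the point of this no-op replacement is its side effect on the Run structure, which a string-only sketch cannot show.

```java
import java.util.regex.Pattern;

public class WholeMatchSubstitution {
    // Same tag pattern as in the original answer.
    static final Pattern TAG_PATTERN = Pattern.compile("<<[^>]+>>");

    // Replaces every tag match using a replacement string that may contain $0.
    static String substitute(String text, String replacement) {
        return TAG_PATTERN.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "Items: <<foreach [item in items]>><<[item]>><</foreach>>";

        // "$0" is the whole match, so every tag is replaced with itself.
        System.out.println(substitute(text, "$0").equals(text)); // true

        // Wrapping "$0" in brackets shows which spans were matched.
        System.out.println(substitute(text, "[$0]"));
        // Items: [<<foreach [item in items]>>][<<[item]>>][<</foreach>>]
    }
}
```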

In your case, please try using the following regular expression:

Pattern tagPattern = Pattern.compile("<<.+?>>");
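A possible reason the suggested pattern helps: `<<[^>]+>>` cannot cross the `>` characters inside an HTML string such as `<b>`, so it never matches the `-html` tag, while the reluctant `<<.+?>>` keeps extending until the first `>>`. The sketch below (plain `java.util.regex`, hypothetical class name) compares both patterns on the tag from the question plus a simple expression tag. Two assumptions worth keeping in mind: `.` does not match line breaks by default, and an expression that itself contains `>>` would be cut short by the reluctant pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagPatternComparison {
    // Returns every substring of text matched by the given pattern, in order.
    static List<String> findTags(Pattern pattern, String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String text = "<<[\"<b>Bold</b> and <i>italic</i> text\"] -html>> and <<[x]>>";
        Pattern strict = Pattern.compile("<<[^>]+>>"); // cannot pass any single '>'
        Pattern lazy = Pattern.compile("<<.+?>>");     // stops at the first ">>"

        System.out.println(findTags(strict, text)); // only the <<[x]>> tag
        System.out.println(findTags(lazy, text));   // both tags
    }
}
```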

Doraemon · February 25, 2024, 10:17am

@alexey.noskov
Thank you for your help. It works fine.