Order of child nodes in the Word document with footer

sagaofsilence.dev · April 14, 2025, 4:22pm

Hello,

I am using Aspose.Words for Java version 24.6.

I extract all the tags defined in a Word template and then check that the expressions used in them are valid according to the model rules.
For this, I need the tags in the order in which they are used.
Some tags define variables, and a variable must be defined before it is used.
So, to resolve a variable's value, I depend on the order in which the tags appear in the Word document.
The order is preserved as long as the variables are defined and used in the body. But when a variable is used inside the footer, I get the tags in a different order on Windows (local machine) and on Linux (dev server).
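A minimal, stdlib-only sketch of why the order matters: a declaration-before-use check over the collected tag list. TagOrderCheck and firstUseBeforeDeclaration are hypothetical names, and the check assumes, for simplicity, that an output tag whose whole expression is a bare identifier (such as <<[footerVar]>>) refers to a template variable, while dotted paths such as root.Name refer to the model.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagOrderCheck {
  // <<var [name = expression]>>: group 1 is the declared variable name.
  private static final Pattern VAR_DECL = Pattern.compile("^<<var \\[\\s*(\\w+)\\s*=");
  // <<[expression]>>: group 1 is the expression without surrounding spaces.
  private static final Pattern OUTPUT_TAG = Pattern.compile("^<<\\[\\s*(.*?)\\s*\\]\\s*>>$");

  /**
   * Returns the first output tag that uses a variable before its declaration, if any.
   * Simplification (assumption): only a bare identifier counts as a variable reference.
   */
  public static Optional<String> firstUseBeforeDeclaration(List<String> tags) {
    Set<String> declared = new HashSet<>();
    for (String tag : tags) {
      Matcher decl = VAR_DECL.matcher(tag);
      if (decl.find()) {
        declared.add(decl.group(1));
        continue;
      }
      Matcher out = OUTPUT_TAG.matcher(tag);
      if (out.matches() && out.group(1).matches("\\w+") && !declared.contains(out.group(1))) {
        return Optional.of(tag);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<String> windowsOrder = List.of(
        "<<var [footerVar = name + code]>>", "<<[root.Name]>>", "<<[footerVar]>>");
    List<String> linuxOrder = List.of(
        "<<[footerVar]>>", "<<var [footerVar = name + code]>>", "<<[root.Name]>>");
    System.out.println(firstUseBeforeDeclaration(windowsOrder)); // Optional.empty
    System.out.println(firstUseBeforeDeclaration(linuxOrder)); // Optional[<<[footerVar]>>]
  }
}
```

With the Windows order the check passes; with the Linux order it reports <<[footerVar]>>, which is exactly the failure described below.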

On a Windows machine, the output is:
{PARAGRAPH=[<<[root.Name]>>, <<var [name = user.Name]>>, <<var [code = root.Code]>>, <<var [footerVar = name + “ ” + code]>>, <<[root.Name] >>, <<[ root.Code]>>, <<[root.Description]>>, <<[footerVar]>>]}

Whereas on the Linux machine, the output is:
{PARAGRAPH=[<<[footerVar]>>, <<[root.Name]>>, <<var [name = user.Name]>>, <<var [code = root.Code]>>, <<var [footerVar = name + “ ” + code]>>, <<[root.Name] >>, <<[ root.Code]>>, <<[root.Description]>>]}

On the Linux machine, the tag that uses the variable (<<[footerVar]>>) is retrieved before the tag that declares it.

I understand that there are different types of header/footer (first, primary, even/odd).
Users may use any type of header and/or footer, define variables there, and then use those variables. The order of the tags is important for me to perform model verification.
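One way to make the header/footer types explicit is to read each section's body first and then fetch its headers and footers by type, in an order you choose, instead of relying on the order of nodes in the tree. This is only a sketch: StoryOrder and paragraphTexts are hypothetical names, the order in HEADER_FOOTER_ORDER is an assumption to adapt to your model rules, and it needs the Aspose.Words for Java jar on the classpath.

```java
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.HeaderFooter;
import com.aspose.words.HeaderFooterType;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import com.aspose.words.Section;
import java.util.ArrayList;
import java.util.List;

public class StoryOrder {
  // Assumed reading order of header/footer types; adjust to your own rules.
  private static final int[] HEADER_FOOTER_ORDER = {
      HeaderFooterType.HEADER_FIRST, HeaderFooterType.HEADER_PRIMARY, HeaderFooterType.HEADER_EVEN,
      HeaderFooterType.FOOTER_FIRST, HeaderFooterType.FOOTER_PRIMARY, HeaderFooterType.FOOTER_EVEN
  };

  /** Non-empty paragraph texts: per section, the body first, then headers/footers by type. */
  public static List<String> paragraphTexts(Document doc) {
    List<String> texts = new ArrayList<>();
    for (int s = 0; s < doc.getSections().getCount(); s++) {
      Section section = doc.getSections().get(s);
      addParagraphTexts(section.getBody().getChildNodes(NodeType.PARAGRAPH, true), texts);
      for (int type : HEADER_FOOTER_ORDER) {
        // Null when the section has no header/footer of this type.
        HeaderFooter headerFooter = section.getHeadersFooters().getByHeaderFooterType(type);
        if (headerFooter != null) {
          addParagraphTexts(headerFooter.getChildNodes(NodeType.PARAGRAPH, true), texts);
        }
      }
    }
    return texts;
  }

  private static void addParagraphTexts(NodeCollection<?> paragraphs, List<String> texts) {
    for (int i = 0; i < paragraphs.getCount(); i++) {
      // trim() also drops the trailing paragraph/section/cell break control character.
      String text = paragraphs.get(i).getText().trim();
      if (!text.isEmpty()) {
        texts.add(text);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);
    builder.write("<<var [footerVar = name + code]>>");
    builder.moveToHeaderFooter(HeaderFooterType.FOOTER_PRIMARY);
    builder.write("<<[footerVar]>>");
    System.out.println(paragraphTexts(doc));
  }
}
```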

Please guide me on how to handle this.
Please find the template attached.
Var_footer.docx (31 KB)

Logic to extract the tags:

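import com.aspose.words.Document;
import com.aspose.words.Node;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils;

  // A tag is "<<" followed by the shortest run of characters up to the next ">>".
  // No DOTALL flag is needed: after normalizeSpace the text contains no line breaks.
  private static final Pattern TAG_PATTERN = Pattern.compile("<<.+?>>");

  // Assumes an SLF4J logger field named `log` (for example, from Lombok's @Slf4j).
  private void testVariables() {
    final File templateFile =
        new File("path\\to\\input\\word\\templates", "Var_footer.docx");
    try (final InputStream inputStream = new FileInputStream(templateFile)) {
      final Document doc = new Document(inputStream);
      // All paragraphs in the document, including those inside headers and footers.
      final NodeCollection<?> childNodes = doc.getChildNodes(NodeType.PARAGRAPH, true);
      final Map<String, List<String>> result = new LinkedHashMap<>();
      final List<String> tagsCollected = new ArrayList<>();
      result.put(NodeType.getName(NodeType.PARAGRAPH), tagsCollected);
      for (int i = 0; i < childNodes.getCount(); i++) {
        final Node node = childNodes.get(i);
        final String nodeText = StringUtils.trimToEmpty(node.getText());
        if (nodeText.isEmpty()) {
          continue;
        }
        log.debug("{} inside {}", nodeText, NodeType.getName(node.getParentNode().getNodeType()));
        // Collapse runs of whitespace (including line breaks) into single spaces.
        final String inputWithSingleWhitespace = StringUtils.normalizeSpace(nodeText);
        // Append the tags in the order they occur within this paragraph (toList() needs Java 16+).
        tagsCollected.addAll(
            TAG_PATTERN.matcher(inputWithSingleWhitespace).results().map(MatchResult::group).toList());
      }
      log.debug("Node text: {}", result);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }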
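The per-paragraph extraction step can be checked on its own, without Aspose.Words. This is a sketch: TagExtraction and extractTags are hypothetical names, and normalizeSpace here is a stdlib stand-in for StringUtils.normalizeSpace that is assumed to be close enough for tag text (unlike the Commons version, it does not convert non-breaking spaces). It also shows that the original "(?-s)" prefix cancels Pattern.DOTALL, so a plain pattern behaves the same.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class TagExtraction {
  // Same matching as "(?-s)<<.+?>>" with Pattern.DOTALL: the inline (?-s)
  // switches DOTALL off again, so "." never matches a line break here.
  private static final Pattern TAG_PATTERN = Pattern.compile("<<.+?>>");

  /** Stand-in for StringUtils.normalizeSpace: trim, then collapse whitespace runs to one space. */
  static String normalizeSpace(String text) {
    return text.trim().replaceAll("\\s+", " ");
  }

  /** Tags in the order they occur in one paragraph's text (toList() needs Java 16+). */
  public static List<String> extractTags(String paragraphText) {
    return TAG_PATTERN.matcher(normalizeSpace(paragraphText))
        .results()
        .map(MatchResult::group)
        .toList();
  }

  public static void main(String[] args) {
    // Prints [<<var [name = user.Name]>>, <<[root.Name] >>]
    System.out.println(extractTags("<<var [name = user.Name]>>\n<<[root.Name]   >>\r"));
  }
}
```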

Professionalize.Discourse · April 14, 2025, 4:23pm

@sagaofsilence.dev

It seems that you are getting the tags from the document's footer in a different order when running your code on Windows and on Linux. This is most likely related to how Aspose.Words structures headers and footers in the document object model.

Understanding Header and Footer Behavior

HeaderFooter class: In Aspose.Words, headers and footers are not separate sections. Each Section node has a Body child that holds the main text and HeaderFooter children (first-page, primary, and even-page headers and footers), each with its own paragraphs and tables. A variable declared in the body and used in a footer therefore spans two different branches of the tree.
Node collection: Document.getChildNodes(NodeType.PARAGRAPH, true) returns paragraphs in document-tree order, not in page-reading order. Header and footer paragraphs sit in their own HeaderFooter branch next to the Body, so their position relative to body paragraphs in that flat list is not something the extraction logic should rely on. This is consistent with the footer tag appearing last on one machine and first on the other.
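To see where each paragraph actually lives, a small diagnostic can label every paragraph returned by getChildNodes with its section and story (Body or HeaderFooter). This is a sketch: ParagraphLocation and describeAll are hypothetical names, header/footer types are printed as the raw HeaderFooterType int values, and it needs the Aspose.Words for Java jar on the classpath.

```java
import com.aspose.words.CompositeNode;
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.HeaderFooter;
import com.aspose.words.HeaderFooterType;
import com.aspose.words.Node;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import com.aspose.words.Section;
import java.util.ArrayList;
import java.util.List;

public class ParagraphLocation {
  /** One line per non-empty paragraph, in getChildNodes order, with the story it belongs to. */
  public static List<String> describeAll(Document doc) {
    List<String> lines = new ArrayList<>();
    NodeCollection<?> paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (int i = 0; i < paragraphs.getCount(); i++) {
      Node paragraph = paragraphs.get(i);
      String text = paragraph.getText().trim();
      if (text.isEmpty()) {
        continue;
      }
      Section section = (Section) paragraph.getAncestor(NodeType.SECTION);
      CompositeNode headerFooter = paragraph.getAncestor(NodeType.HEADER_FOOTER);
      // HeaderFooterType values are int constants, e.g. HeaderFooterType.FOOTER_PRIMARY.
      String story = headerFooter == null
          ? "Body"
          : "HeaderFooter(type=" + ((HeaderFooter) headerFooter).getHeaderFooterType() + ")";
      lines.add(i + ": section " + doc.getSections().indexOf(section) + ", " + story + ": " + text);
    }
    return lines;
  }

  public static void main(String[] args) throws Exception {
    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);
    builder.write("<<var [footerVar = name + code]>>");
    builder.moveToHeaderFooter(HeaderFooterType.FOOTER_PRIMARY);
    builder.write("<<[footerVar]>>");
    describeAll(doc).forEach(System.out::println);
  }
}
```

Running it on the attached template on both machines would show whether the footer paragraphs come before or after the body paragraphs in each environment.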

Recommendations to Ensure Consistent Tag Order

To handle the discrepancies in tag order, consider the following approaches:

Explicitly Handle Footer Nodes: Instead of relying on a single deep getChildNodes call over the whole document, walk each section's Body and its HeaderFooter nodes separately, in an order you choose. This way, you control the order in which they are processed.
Maintain a Tag Declaration Order: Create a mapping or a list that keeps track of the order in which tags are declared. This can help you resolve variable values correctly, regardless of the order in which they are processed.
Debugging and Logging: Add detailed logging to your extraction logic to capture the order of nodes as they are processed. This can help you identify where the discrepancies occur and adjust your logic accordingly.
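The "tag declaration order" idea can be sketched as a lookup table built from the collected tags: variable name to expression, filled in a first pass before any tag is resolved. DeclarationIndex and declarations are hypothetical names, and keeping the first declaration of a name is an assumption of this sketch, not a rule of the template engine.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeclarationIndex {
  // <<var [name = expression]>>: group 1 is the name, group 2 the expression.
  private static final Pattern VAR_DECL =
      Pattern.compile("^<<var \\[\\s*(\\w+)\\s*=\\s*(.*?)\\s*\\]\\s*>>$");

  /** Variable name to expression, keeping the first declaration seen for each name. */
  public static Map<String, String> declarations(List<String> tags) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String tag : tags) {
      Matcher m = VAR_DECL.matcher(tag);
      if (m.matches()) {
        result.putIfAbsent(m.group(1), m.group(2));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> linuxOrder = List.of(
        "<<[footerVar]>>", "<<var [name = user.Name]>>",
        "<<var [code = root.Code]>>", "<<var [footerVar = name + code]>>");
    System.out.println(declarations(linuxOrder).get("footerVar")); // name + code
  }
}
```

A table like this gives the same lookups whichever order the tags were collected in, but it can no longer detect a variable that is used before its declaration. If the template rules require declaration before use, the collection order still has to be fixed.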

Example Code Adjustment

Here’s a modified version of your extraction logic that walks each section explicitly, processing its body first and then its headers and footers:

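// Same imports as in the question, plus com.aspose.words.HeaderFooter,
// com.aspose.words.HeaderFooterCollection, com.aspose.words.Section and java.util.regex.Matcher.
private static final Pattern TAG_PATTERN = Pattern.compile("<<.+?>>");

private void testVariables() {
    final File templateFile = new File("path\\to\\input\\word\\templates", "Var_footer.docx");
    try (final InputStream inputStream = new FileInputStream(templateFile)) {
        final Document doc = new Document(inputStream);
        final List<String> tagsCollected = new ArrayList<>();
        for (int s = 0; s < doc.getSections().getCount(); s++) {
            final Section section = doc.getSections().get(s);
            // Body paragraphs of this section first (deep, so paragraphs in tables are included)...
            processNodes(section.getBody().getChildNodes(NodeType.PARAGRAPH, true), tagsCollected);
            // ...then the paragraphs of every header and footer attached to the same section.
            final HeaderFooterCollection headersFooters = section.getHeadersFooters();
            for (int h = 0; h < headersFooters.getCount(); h++) {
                final HeaderFooter headerFooter = headersFooters.get(h);
                processNodes(headerFooter.getChildNodes(NodeType.PARAGRAPH, true), tagsCollected);
            }
        }
        log.debug("Tags in processing order: {}", tagsCollected);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private void processNodes(NodeCollection<?> nodes, List<String> tagsCollected) {
    for (int i = 0; i < nodes.getCount(); i++) {
        // normalizeSpace also trims, which drops the trailing paragraph break.
        final String nodeText = StringUtils.normalizeSpace(nodes.get(i).getText());
        if (!nodeText.isEmpty()) {
            // Extract tags in the order they occur within this paragraph.
            final Matcher matcher = TAG_PATTERN.matcher(nodeText);
            while (matcher.find()) {
                tagsCollected.add(matcher.group());
            }
        }
    }
}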

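Because each section's body is processed before its headers and footers, tags declared in the body are collected before the footer tags that use them, whatever position the HeaderFooter nodes have in the document tree. No paragraph is processed twice, since a section's Body and its HeaderFooter nodes do not overlap.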

If you continue to face issues, consider reaching out to Aspose support for further assistance, as they may provide insights specific to the version you are using or any known issues with cross-platform behavior.

Conclusion

By explicitly managing how you access and process nodes in both the body and footer of your document, you should be able to achieve consistent results across different operating systems.

sagaofsilence.dev · April 14, 2025, 4:28pm

I will try this and let you know. That is an awesome AI-assisted response.

alexey.noskov · April 15, 2025, 2:47am

@sagaofsilence.dev I would suggest using DocumentVisitor to iterate over the nodes in the document instead of Document.getChildNodes. That way, you can easily tell where each visited node is located in the document object model from the visitXXX method that is called (for example, visitHeaderFooterStart and visitHeaderFooterEnd surround header and footer content).
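A minimal sketch of that suggestion: a DocumentVisitor that tracks whether it is inside a header/footer and, at the end of each section, emits that section's body tags before its header/footer tags. TagCollector and getOrderedTags are hypothetical names; "body first, then headers and footers, per section" is an assumed rule for the model verification, not something Aspose.Words prescribes. It needs the Aspose.Words for Java jar on the classpath.

```java
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.DocumentVisitor;
import com.aspose.words.HeaderFooter;
import com.aspose.words.HeaderFooterType;
import com.aspose.words.Paragraph;
import com.aspose.words.Section;
import com.aspose.words.VisitorAction;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Collects tags per section: body tags first, then header/footer tags. */
public class TagCollector extends DocumentVisitor {
  private static final Pattern TAG_PATTERN = Pattern.compile("<<.+?>>");

  private final List<String> orderedTags = new ArrayList<>();
  private final List<String> bodyTags = new ArrayList<>();
  private final List<String> headerFooterTags = new ArrayList<>();
  private boolean insideHeaderFooter;

  @Override
  public int visitHeaderFooterStart(HeaderFooter headerFooter) {
    insideHeaderFooter = true;
    return VisitorAction.CONTINUE;
  }

  @Override
  public int visitHeaderFooterEnd(HeaderFooter headerFooter) {
    insideHeaderFooter = false;
    return VisitorAction.CONTINUE;
  }

  @Override
  public int visitParagraphStart(Paragraph paragraph) {
    String text = paragraph.getText().trim().replaceAll("\\s+", " ");
    Matcher matcher = TAG_PATTERN.matcher(text);
    List<String> target = insideHeaderFooter ? headerFooterTags : bodyTags;
    while (matcher.find()) {
      target.add(matcher.group());
    }
    return VisitorAction.CONTINUE;
  }

  @Override
  public int visitSectionEnd(Section section) {
    // Whatever order Body and HeaderFooter nodes have in the tree, emit body tags first.
    orderedTags.addAll(bodyTags);
    orderedTags.addAll(headerFooterTags);
    bodyTags.clear();
    headerFooterTags.clear();
    return VisitorAction.CONTINUE;
  }

  public List<String> getOrderedTags() {
    return orderedTags;
  }

  public static void main(String[] args) throws Exception {
    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);
    builder.moveToHeaderFooter(HeaderFooterType.FOOTER_PRIMARY);
    builder.write("<<[footerVar]>>");
    builder.moveToDocumentEnd();
    builder.write("<<var [footerVar = name + code]>>");
    TagCollector collector = new TagCollector();
    doc.accept(collector);
    System.out.println(collector.getOrderedTags());
  }
}
```

Flushing per section (rather than once for the whole document) keeps a footer's tags next to the body of the section it belongs to; if variables may be declared in one section and used in another section's footer, you may prefer to flush only once at the end instead.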

sagaofsilence.dev · April 16, 2025, 5:25am

Sure. Thank you for the suggestion.