GetChildNodes returns different nodes when upgrading from 21.1.0 to 22.12.0

Program.zip (1.5 KB)
Hello,

We have recently started upgrading our code base from version 21.1.0 to 22.12.0 and have noticed that the GetChildNodes method returns a different number of items. Additionally, the Document object's child node hierarchy differs from one version to the next.

I have created a test application currently running on 22.12.0, please see attached .cs file. Note, please run the same file on version 21.1.0 to replicate the behavior.

The gist of the program is the following:

  • Create contentDoc1, a Document with a table containing nested RepeatingSection SDTs. This document is treated as the document with source content.
  • Create assembledDoc, a Document containing 2 RichText SDTs within the body of the Document.
  • Insert contentDoc1 into the RichText SDT within the assembledDoc Document.

Here is the behavior we are seeing when making a call to get the nodes of type StructuredDocumentTag like so:
var nodes = assembledTemplate.GetChildNodes(Aspose.Words.NodeType.StructuredDocumentTag, true);

On version 21.1.0

  • Yields a count of 2 nodes.

On version 22.12.0

  • Yields a count of 3 nodes.

Additionally, when inspecting the Document object, the hierarchy of the final assembled document is also different.

On version 21.1.0

  • Aspose.Words.Document
    • Aspose.Words.Section
      • Aspose.Words.Body
        • Aspose.Words.Tables.Table
          • Aspose.Words.Markup.StructuredDocumentTag (NodeType: 'RepeatingSection', Title: 'Section2')
            • Aspose.Words.Tables.Row

On version 22.12.0

  • Aspose.Words.Document
    • Aspose.Words.Section
      • Aspose.Words.Body
        • Aspose.Words.Tables.Table
          • Aspose.Words.Markup.StructuredDocumentTag (NodeType: 'RepeatingSection', Title: 'Section1')
            • Aspose.Words.Markup.StructuredDocumentTag (NodeType: 'RepeatingSection', Title: 'Section2')
              • Aspose.Words.Tables.Row

Please help me understand what is happening here and why the GetChildNodes call yields different results when upgrading versions.

Thanks!

@amjustin Looks like a bug in the old version of Aspose.Words. Here is what your code does:

  1. Creates a content document with 2 SDTs:
contentDocument1.GetChildNodes(NodeType.StructuredDocumentTag, true).Count; // Returns 2
  2. Creates assembledTemplate with another 2 SDTs:
assembledTemplate.GetChildNodes(Aspose.Words.NodeType.StructuredDocumentTag, true).Count; // Returns 2
  3. Inserts the content of contentDocument1 into one of the RichText SDTs, so at this point there are 4 SDTs: 2 from assembledTemplate and another 2 from contentDocument1.
  4. Removes the SDT where the contentDocument1 content was inserted using sdt.RemoveSelfOnly();. This method removes only the outer SDT and leaves the inner content intact, so at this point there are 3 SDTs.
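
To check which SDTs make up the count on each version, the tags can be listed together with their parent node. The following is only a minimal sketch: the SdtDiagnostics class and DumpSdts method are hypothetical helper names, and it assumes the assembledTemplate variable from the attached program.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Markup;

static class SdtDiagnostics
{
    // Hypothetical helper: prints every StructuredDocumentTag found in the
    // document with its type, title, nesting depth and parent node type.
    public static void DumpSdts(Document doc)
    {
        NodeCollection sdts = doc.GetChildNodes(NodeType.StructuredDocumentTag, true);
        Console.WriteLine($"SDT count: {sdts.Count}");

        foreach (StructuredDocumentTag sdt in sdts)
        {
            // Count the ancestors of the tag to get its depth in the tree.
            int depth = 0;
            for (Node parent = sdt.ParentNode; parent != null; parent = parent.ParentNode)
                depth++;

            Console.WriteLine(
                $"depth {depth}: {sdt.SdtType} '{sdt.Title}' (parent: {sdt.ParentNode?.NodeType})");
        }
    }
}
```

Calling SdtDiagnostics.DumpSdts(assembledTemplate) after the RemoveSelfOnly() step on both versions should show which tag is present in one output and missing in the other.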

Thanks @alexey.noskov. Can you be more specific about what the bug in the previous version is?

Is the bug in the GetChildNodes method and also in the tree hierarchy of the Document?

I need to understand the impact of this change, since we depend on the GetChildNodes method for core functionality and for how we assemble documents.

@amjustin Unfortunately, it is difficult to say which fix caused this change. As far as I can see, the behavior changed in version 22.4. I have checked the documents produced by versions 22.3 and 22.4, and the output produced by 22.4 and later is more correct: the Section, Section2 and SingleEmptyPlaceHolder SDTs are all present in the output document, while with 22.3 and earlier only Section2 and SingleEmptyPlaceHolder are in the output. It looks like older versions did not allow nested repeatingSection SDTs, so Aspose.Words removed the outer repeatingSection.

Thanks for the info @alexey.noskov .

I've found another issue between these versions that occurs when a section break is introduced in the source file. Using the same code example I shared, I've updated it slightly to use a document, "TestDocument", that contains only a section break.

This fails with an ArgumentNull exception on the new version. However, when I use a "Page Break" instead on the new version, I don't see the issue. Is this a bug?

See attached files:
Program.zip (1.2 KB)
TestDocument.docx (19.6 KB)

@amjustin This is not a bug. In the Aspose.Words document object model, a structured document tag cannot contain a section break. Instead, special nodes are used to mark the range of a structured document tag that spans several sections: StructuredDocumentTagRangeStart and StructuredDocumentTagRangeEnd. So after a document with a section break is inserted into a structured document tag, the tag is converted to a pair of StructuredDocumentTagRangeStart and StructuredDocumentTagRangeEnd nodes, and the original SDT is removed from the document.
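
Code that looks only for NodeType.StructuredDocumentTag will therefore no longer find the converted tag. A rough sketch for checking whether this conversion happened in a given document follows; the SdtRangeCheck class and Report method are hypothetical names used only for illustration.

```csharp
using System;
using Aspose.Words;

static class SdtRangeCheck
{
    // Hypothetical helper: counts regular SDT nodes and ranged SDT markers,
    // so it is visible whether an SDT was converted to a start/end pair.
    public static void Report(Document doc)
    {
        int sdtCount = doc.GetChildNodes(NodeType.StructuredDocumentTag, true).Count;
        int rangeStartCount = doc.GetChildNodes(NodeType.StructuredDocumentTagRangeStart, true).Count;
        int rangeEndCount = doc.GetChildNodes(NodeType.StructuredDocumentTagRangeEnd, true).Count;

        Console.WriteLine($"StructuredDocumentTag nodes:           {sdtCount}");
        Console.WriteLine($"StructuredDocumentTagRangeStart nodes: {rangeStartCount}");
        Console.WriteLine($"StructuredDocumentTagRangeEnd nodes:   {rangeEndCount}");
    }
}
```

Running Report on the destination document before and after the insertion should show one StructuredDocumentTag fewer and one range start/end pair more when the inserted content contains a section break.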


@alexey.noskov, thank you for the info on this.

I found another discrepancy that I want to ask about, which also concerns the GetChildNodes call.

I have the following test file that is mostly empty. It contains a paragraph set to the "Heading 1" style and an empty paragraph below it set to the "Normal" style: TestClauseWithHeaderStyleSet.docx (28.0 KB)

I'm using some reference code from the Aspose docs on getting nodes with a specific style:

private (int, int) GetStylesCount(Document doc, string style1, string style2, NodeType nodeType = NodeType.Paragraph)
{
    var paragraphsWithStyle1Cnt = 0;
    var paragraphsWithStyle2Cnt = 0;

    var partParagraphs = doc.GetChildNodes(NodeType.Paragraph, true);
    foreach (Paragraph paragraph in partParagraphs)
    {
        if (paragraph.ParagraphFormat.Style.Name == style1)
            paragraphsWithStyle1Cnt++;

        if (paragraph.ParagraphFormat.Style.Name == style2)
            paragraphsWithStyle2Cnt++;
    }
    return (paragraphsWithStyle1Cnt, paragraphsWithStyle2Cnt);
}

I'm using the code from the comments above to assemble the source document into the destination document.

After assembling a document, I use this GetStylesCount method to get the style counts from the source document and from the resulting document like so:

var headingStyleName = "Heading 1";
var normalStyleName = "Normal";

var (sourceDocParagraphsWithHeadingStyleCnt, sourceParagraphsWithNormalStyleCnt) = GetStylesCount(sourceDocument, headingStyleName, normalStyleName);
var (destDocParagraphsWithHeadingStyleCnt, destParagraphsWithNormalStyleCnt) = GetStylesCount(destDocument, headingStyleName, normalStyleName);

Then, I check that the counts for the source and destination documents are the same.

Now, the issue I'm seeing is that I have recently moved from .NET Framework 4.7 to .NET 6.0. With 4.7 the counts are the same, but when I switch to .NET 6 the count for the source document changes for the 'Normal' style. The document is exactly the same, but I'm getting different node counts!

I would appreciate any insight on this, as I'm trying to determine whether this is an issue in Aspose or in another component.

Thanks!

@amjustin I have tested the scenario on my side and in both .NET Framework 4.7 and .NET 6 the returned count of paragraphs is the same. The problem on your side might occur if you are using ImportFormatMode.KeepSourceFormatting. In this case Aspose.Words copies all required styles to the destination document and generates unique style names if needed. So the Normal style might become Normal_1 to keep the original document formatting. Please try using ImportFormatMode.UseDestinationStyles. In this case Aspose.Words uses the destination document styles and copies only new styles, which are not present in the destination document.
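
To see whether renamed styles explain the difference, the paragraphs can be grouped by style name. This is only a sketch: the StyleDiagnostics class and PrintStyleUsage method are hypothetical names, and the exact suffix of any generated style name is an assumption.

```csharp
using System;
using System.Linq;
using Aspose.Words;

static class StyleDiagnostics
{
    // Hypothetical helper: prints how many paragraphs use each style name.
    // A renamed copy of a style (for example a "Normal" variant with a
    // numeric suffix) would show up here as a separate entry.
    public static void PrintStyleUsage(Document doc)
    {
        var usage = doc.GetChildNodes(NodeType.Paragraph, true)
            .Cast<Paragraph>()
            .GroupBy(p => p.ParagraphFormat.StyleName)
            .OrderBy(g => g.Key);

        foreach (var group in usage)
            Console.WriteLine($"{group.Key}: {group.Count()}");
    }
}
```

Comparing the output of PrintStyleUsage for the source and destination documents on both runtimes should show whether the paragraphs moved to a differently named style or whether the paragraph count itself differs.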

If the problem still persists, please create a simple console application that will allow us to reproduce the problem on our side. We will check the issue once again and provide you with more information.

PS: You can use LINQ syntax to count paragraphs of the specified style:

// Requires: using System.Linq; using System.Collections.Generic;
List<Paragraph> paragraphs = sourceDocument.GetChildNodes(NodeType.Paragraph, true).Cast<Paragraph>().ToList();
int normalCount = paragraphs.Where(p => p.ParagraphFormat.StyleName == normalStyleName).Count();
int headingCount = paragraphs.Where(p => p.ParagraphFormat.StyleName == headingStyleName).Count();