Weird behaviour of removal of a StructuredDocumentTag with a section break

acturisaspose · January 15, 2025, 12:55pm

Hi,

I have an issue with Aspose.Words, could you please take a look?

My input file contains 3 StructuredDocumentTags, the first one has a section break in it.
I run the following code on .NET Framework 4.7.2, using Aspose.Words 25.1 with the licence applied (I can also reproduce it on earlier versions).

var document = new Document(@"D:\input.docx");
var collection = document.Range.StructuredDocumentTags;
foreach (var sdt in document.Range.StructuredDocumentTags)
{
    switch (sdt)
    {
        case StructuredDocumentTagRangeStart sdtRangeStart:
            sdtRangeStart.RemoveAllChildren();
            sdtRangeStart.RemoveSelfOnly();
            break;
        case StructuredDocumentTag sdtTag:
            sdtTag.RemoveAllChildren();
            sdtTag.RemoveSelfOnly();
            break;
    }
}
if (collection != document.Range.StructuredDocumentTags)
{
    Console.WriteLine("Collections don't match");
}
document.Save("D:/output.docx");

I would expect all SDTs to be removed. However, it looks like the removal of the first one completely replaces document.Range.StructuredDocumentTags and the program falls out of the loop too early.

Please note that it only happens when the SDT is the first in the document. With a paragraph added before it, it works as expected.

input.docx (17.5 KB)

output.docx (14.9 KB)

Professionalize.Discourse · January 15, 2025, 12:56pm

@acturisaspose

It looks like the loop exits early because the first SDT, which contains a section break, is removed while you are still iterating over document.Range.StructuredDocumentTags. The likely cause is that the collection is modified during iteration.

Removing an SDT updates document.Range.StructuredDocumentTags, so the foreach loop that is enumerating that same collection may not process the remaining SDTs as expected.

To avoid this issue, you can create a separate list of the SDTs to remove, and then iterate over that list to perform the removals. Here’s an example of how you can modify your code:

var document = new Document(@"D:\input.docx");
// Snapshot the SDTs into a separate list; ToList() requires "using System.Linq;"
var collection = document.Range.StructuredDocumentTags.ToList();

foreach (var sdt in collection)
{
    switch (sdt)
    {
        case StructuredDocumentTagRangeStart sdtRangeStart:
            sdtRangeStart.RemoveAllChildren();
            sdtRangeStart.RemoveSelfOnly();
            break;
        case StructuredDocumentTag sdtTag:
            sdtTag.RemoveAllChildren();
            sdtTag.RemoveSelfOnly();
            break;
    }
}

// The list above still holds the removed tags, so check the document itself for leftovers.
if (document.Range.StructuredDocumentTags.Count != 0)
{
    Console.WriteLine("Some SDTs were not removed");
}

document.Save("D:/output.docx");

In this modified code, we first create a list of the SDTs to be processed. This way, even if the collection changes during the loop, we are iterating over a static list, which prevents the loop from exiting prematurely.

This approach should help you achieve the desired behavior of removing all SDTs without encountering the issue you described. If you continue to experience problems, please provide additional details about the document structure or any specific errors you encounter.

alexey.noskov · January 15, 2025, 1:11pm

@acturisaspose Please try using the following code to remove SDTs from the document:

Document doc = new Document(@"C:\Temp\in.docx");

foreach (IStructuredDocumentTag sdt in doc.Range.StructuredDocumentTags)
{
    // Clear the tag's direct children, then remove the tag itself.
    sdt.GetChildNodes(NodeType.Any, false).Clear();
    sdt.RemoveSelfOnly();
}

doc.Save(@"C:\Temp\out.docx");

acturisaspose · January 15, 2025, 2:12pm

@alexey.noskov It worked. Thanks for the suggestion and for your quick reply!