Using DocumentBuilder.insertDocument() the Content Control id is changed

giulio.andolfi · November 16, 2021, 3:53pm

Hello everyone,

I have a problem when I import a document into another one using DocumentBuilder.insertDocument(): the content control id differs from the original one.

In the CHILD document I have one content control that contains one table. The content control id is -1569802863.

In the PARENT document I have one content control that contains the CHILD document. The content control id is 1867718377.

Using this method:

builder.insertDocument(child, ImportFormatMode.USE_DESTINATION_STYLES);

The PARENT document then contains the CHILD document’s content control (the one that contains the table), but its id has changed.

My question is: is there a reason why the content control id changes?

In my application I use aspose-words 21.07, but I have the same problem using 21.10.

Thanks for collaboration.

Regards,
Giulio

sergey.lobanov · November 16, 2021, 7:07pm

@giulio.andolfi,
We have tested the scenario and were unable to reproduce the issue on our side. Please ZIP and attach your input child and parent documents and your output document here for testing. We will investigate the issue and provide you with information on it.

giulio.andolfi · November 17, 2021, 11:27am

Hello Sergey,
thanks for your time.

I attached the zip file with the documents:

PARENT.docx
CHILD.docx
OUTPUT_OK.docx
OUTPUT_NOT_OK.docx
PARENT_PRE_INSERT_DOCUMENT.docx

The problem is random, not systematic.
For example, the first time the insertDocument method works fine, but on subsequent runs it often returns a changed content control id.

In the OUTPUT_NOT_OK file the content control id is 1253651801 instead of -1569802863.

Let me explain our process:

1. We manipulate the PARENT.docx file by modifying its document.xml. In particular, we move the content of the content control outside it. (This is a workaround we must use for another problem: Aspose deletes the content control’s content if there is at least one section break inside the content control.) See the file PARENT_PRE_INSERT_DOCUMENT.docx.
2. We open PARENT.docx using Aspose.Words.
3. We remove everything between the content control and the END PLACEHOLDER.
4. We call insertDocument, passing the CHILD.docx file.
5. We save the document.
6. We manipulate the PARENT.docx file again by modifying document.xml, moving everything between the content control and the END PLACEHOLDER back into the content control.

I hope this information helps you in your work.
Thanks

Giulio

aspose.zip (103.3 KB)

alexey.noskov · November 18, 2021, 12:20pm

@giulio.andolfi Could you please also provide sample code that will allow us to reproduce the problem? Simply appending or inserting your document into the parent document does not reproduce it.
However, the problem might occur because the same document is appended several times. Several structured document tags are not allowed to have the same id; if that happens, Aspose.Words assigns a new id. Please see the StructuredDocumentTag.getId() remarks for more information.

giulio.andolfi · November 18, 2021, 5:08pm

Hello @alexey.noskov ,
tomorrow I will upload a simple project where I replicated the scenario.
Thanks,
Giulio

giulio.andolfi · November 19, 2021, 7:50am

Hello @alexey.noskov,
Here is a small project in which I have reproduced the issue.
The project contains the TestImportDocument class; if you run the test, you will reproduce the issue.
Thanks for your time.
Let me know, please.
Giulio

TestForAspose.zip (102.1 KB)

alexey.noskov · November 19, 2021, 12:36pm

@giulio.andolfi The problem occurs because both the main and the part documents have SDTs with the same id -1569802863; as I mentioned earlier, this is not allowed. You can check structured document tag ids using simple code like the following:

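// Print the SDT ids of the main document.
Document md = new Document("C:\\Temp\\MD_MANIPOLATO.docx");
PrintIds(md);
System.out.println("===========================================");

// Print the SDT ids of the document that will be inserted.
Document part = new Document("C:\\Temp\\PART.docx");
PrintIds(part);

// Prints the id of every structured document tag in the document.
static void PrintIds(Document doc)
{
    // Passing true makes the search deep, so nested tags are included.
    Iterable<StructuredDocumentTag> tags = (Iterable<StructuredDocumentTag>)doc.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG, true);
    for (StructuredDocumentTag tag : tags) {
        System.out.println(tag.getId());
    }
}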

The output of this code is:

1867718377
-1569802863
-394892188
-226769620
-842235119
2118169901
===========================================
-1569802863

As you can see id -1569802863 is in both documents.
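
If it helps, the same comparison can be automated before calling insertDocument. This is only a rough sketch in plain Java: the class SdtIdCollisions and the method findShared are made-up names, not part of Aspose.Words, and the ids are assumed to have been collected with tag.getId() in the same way PrintIds does.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; not part of Aspose.Words. */
final class SdtIdCollisions {
    private SdtIdCollisions() {}

    /** Returns the ids that occur in both collections, sorted ascending. */
    static Set<Integer> findShared(Collection<Integer> parentIds,
                                   Collection<Integer> childIds) {
        // Start from the parent's ids and keep only those the child also uses.
        Set<Integer> shared = new TreeSet<>(parentIds);
        shared.retainAll(new HashSet<>(childIds));
        return shared;
    }
}
```

For the two id lists printed above, findShared returns a set containing only -1569802863, which is the id that can be expected to change on insertion.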

giulio.andolfi · November 19, 2021, 1:08pm

Hello @alexey.noskov,
I know that both documents have the same id -1569802863.
But before calling the method builder.insertDocument(…), we remove the SDT with id -1569802863. You can see this in the attached project.
Are we doing something wrong when removing the SDT?
Is it necessary to save or do anything else before calling the method builder.insertDocument(…)?
Thanks,
Giulio

alexey.noskov · November 19, 2021, 1:30pm

@giulio.andolfi Even if you remove a structured document tag from the document, its id is still considered used and cannot be used by another structured document tag.
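
As a mental model only, this behavior can be pictured with the toy registry below. It is plain Java and a sketch built on assumptions, not Aspose.Words code: the class ToySdtIdRegistry, its methods, and the way a replacement id is picked (a random int) are all hypothetical.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Toy model of the id behavior described in this thread; NOT Aspose.Words code. */
final class ToySdtIdRegistry {
    // Every id ever handed out; removal deliberately does not shrink this set.
    private final Set<Integer> everUsed = new HashSet<>();
    private final Random random = new Random();

    /** Registers a tag and returns the id it ends up with. */
    int add(int requestedId) {
        int id = requestedId;
        // Assumption: on a clash, fresh ids are drawn until an unused one is found.
        while (!everUsed.add(id)) {
            id = random.nextInt();
        }
        return id;
    }

    /** Removing a tag leaves its id reserved, so there is nothing to undo here. */
    void remove(int id) {
        // Intentionally empty: the id stays in everUsed.
    }
}
```

In this model, adding -1569802863, removing it, and adding -1569802863 again returns a different id the second time, which matches what insertDocument shows in this thread.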

giulio.andolfi · November 19, 2021, 2:57pm

@alexey.noskov, is there any way to ensure that the ids of deleted SDTs are no longer considered used?
We need to remove the SDTs and insert them again with the same ids.
Thanks,
Giulio

alexey.noskov · November 20, 2021, 4:42am

@giulio.andolfi Unfortunately, there is no straightforward way to use the same SDT id again. I have logged an issue, WORDSNET-23152, to allow the ids of removed SDTs to be used again. We will let you know once it is resolved.
In the meantime, you can use a workaround: simply clone the document right after deleting the SDT and use the cloned document. For example, the following code gives the expected result:

Document md = new Document("C:\\Temp\\MD_MANIPOLATO.docx");
PrintIds(md);
System.out.println("===========================================");

Document part = new Document("C:\\Temp\\PART.docx");
PrintIds(part);
System.out.println("===========================================");

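// Remove the SDT whose id clashes with the one in the part document.
md.getChild(NodeType.STRUCTURED_DOCUMENT_TAG, 1, true).remove();
// Workaround: continue with a clone taken right after the removal.
Document mdClone = (Document)md.deepClone(true);

mdClone.appendDocument(part, ImportFormatMode.USE_DESTINATION_STYLES);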
PrintIds(mdClone);
System.out.println("===========================================");

giulio.andolfi · November 22, 2021, 3:57pm

Hello @alexey.noskov,
thanks for opening the issue. Could you confirm that the fix will be released in the Java version of Aspose.Words?
In the meantime I will try to apply your workaround.
I will let you know as soon as possible.
Thanks
Giulio

alexey.noskov · November 23, 2021, 3:55am

@giulio.andolfi Sure, once the fix is done it will be included in both the .NET and Java versions of Aspose.Words.

giulio.andolfi · November 24, 2021, 1:53pm

Hello @alexey.noskov,
I have applied your workaround to the small project that I attached, and it works fine.
As soon as possible I will try to apply it to our solution.
Sorry, but I have two questions:

Could the document deep clone affect performance if the document is several MB in size?
Was this behavior introduced in a specific version, or has it been present since the beginning?

Thanks,
Giulio

alexey.noskov · November 24, 2021, 5:41pm

@giulio.andolfi

Yes, this might have a slight effect on performance. However, cloning is performed in memory and is quite a fast operation.

This behavior has been present since the beginning.

giulio.andolfi · December 2, 2021, 8:27am

Hello @alexey.noskov,
unfortunately we cannot apply the workaround to our code, not only because of performance but also because of how our code is structured.
Do you have any idea when the fix will be released?
Thanks,
Giulio

alexey.noskov · December 2, 2021, 10:53am

@giulio.andolfi Unfortunately, it is currently difficult to provide an estimate. The issue is in a queue for analysis. Once it has been analyzed by our developers, we will provide you with more information.

giulio.andolfi · January 7, 2022, 9:33am

Hello @alexey.noskov,
do you have any news regarding this issue?
Thanks,
Giulio

alexey.noskov · January 7, 2022, 11:33am

@giulio.andolfi Unfortunately, the issue is still unresolved. It is not so simple to change the current behavior because of possible regressions in some scenarios. For example, the removed node may be returned to the document tree, and we would then get two SDTs with the same id. We will investigate the issue further, but it seems the current behavior works better for most scenarios than treating the id of a removed SDT as unused.

giulio.andolfi · January 7, 2022, 2:51pm

Hello @alexey.noskov,
I am confident that a solution will be found.
I look forward to receiving your feedback.
Thank you,
Giulio