Content Control id changes when using DocumentBuilder.insertDocument()

Hello everyone,

I have a problem when I import one document into another using DocumentBuilder.insertDocument(): the content control id differs from the original one.

In the CHILD document I have one content control that contains one table. Its content control id is -1569802863.

In the PARENT document I have one content control that contains the CHILD document. Its content control id is 1867718377.

Using this method:

builder.insertDocument(child, ImportFormatMode.USE_DESTINATION_STYLES);

After this call, the PARENT document contains the CHILD document’s content control (the one that contains the table), but its id has changed.

My question is: what reasons could cause the content control id to change?

In my application I use Aspose.Words 21.07, but I have the same problem with 21.10.

Thanks for your help.

Regards,
Giulio

@giulio.andolfi,
We have tested the scenario and were unable to reproduce the issue on our side. Please ZIP and attach your input child and parent documents and your output document here for testing. We will investigate the issue and provide you with information on it.

Hello Sergey,
thanks for your time.

I attached the zip file with the documents:

  • PARENT.docx
  • CHILD.docx
  • OUTPUT_OK.docx
  • OUTPUT_NOT_OK.docx
  • PARENT_PRE_INSERT_DOCUMENT.docx

The problem is random, not systematic.
For example, the first time the insertDocument method works fine, but on subsequent runs it often returns a changed content control id.

In the OUTPUT_NOT_OK file the content control id is 1253651801 instead of -1569802863.

I want to explain our process:

  1. We manipulate the PARENT.docx file by modifying its document.xml file. In particular, we move the content of the content control outside it. (This is a workaround that we must use for another problem: Aspose deletes the content control’s content if there is at least one section break inside the content control.) See the file PARENT_PRE_INSERT_DOCUMENT.docx.
  2. We open PARENT.docx using Aspose.Words.
  3. We remove everything between the content control and the END PLACEHOLDER.
  4. We call insertDocument, passing the CHILD.docx file.
  5. We save the document.
  6. We manipulate the PARENT.docx file again, modifying document.xml to move everything between the content control and the END PLACEHOLDER back into the content control.
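
Since steps 1 and 6 work directly on document.xml, the content control ids can also be checked at that level before and after step 4. The following is only a sketch using the JDK's XML parser, not part of our project: the class name SdtIdsFromXml and the method sdtIds are hypothetical, and it assumes the XML string is the word/document.xml part already extracted from the .docx ZIP.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists content control ids found in a document.xml string.
public class SdtIdsFromXml {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the w:val of the <w:id> element inside every <w:sdtPr>, in document order.
    public static List<String> sdtIds(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
        List<String> ids = new ArrayList<>();
        NodeList props = dom.getElementsByTagNameNS(W, "sdtPr");
        for (int i = 0; i < props.getLength(); i++) {
            NodeList idNodes = ((Element) props.item(i)).getElementsByTagNameNS(W, "id");
            if (idNodes.getLength() > 0) {
                ids.add(((Element) idNodes.item(0)).getAttributeNS(W, "val"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Minimal stand-in for document.xml: a PARENT control wrapping a CHILD control.
        String xml = "<w:document xmlns:w=\"" + W + "\"><w:body>"
                + "<w:sdt><w:sdtPr><w:id w:val=\"1867718377\"/></w:sdtPr><w:sdtContent>"
                + "<w:sdt><w:sdtPr><w:id w:val=\"-1569802863\"/></w:sdtPr><w:sdtContent/></w:sdt>"
                + "</w:sdtContent></w:sdt></w:body></w:document>";
        System.out.println(sdtIds(xml)); // prints [1867718377, -1569802863]
    }
}
```

Comparing the list taken before step 4 with the one taken after step 5 would show exactly which id was replaced.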

I hope this information helps you in your work.
Thanks

Giulio

aspose.zip (103.3 KB)

@giulio.andolfi Could you please also provide sample code that will allow us to reproduce the problem? Simply appending or inserting your document into the parent document does not reproduce it on our side.
However, the problem might occur because the same document is appended several times. Several structured document tags are not allowed to have the same id. If such a situation occurs, Aspose.Words assigns a new id. Please see the StructuredDocumentTag.getId() remarks for more information.

Hello @alexey.noskov ,
tomorrow I will upload a simple project where I replicated the scenario.
Thanks,
Giulio

Hello @alexey.noskov,
This is a small project in which I have reproduced the issue.
The project contains the TestImportDocument class; if you run the test, you will reproduce the issue.
Thanks for your time.
Please let me know.
Giulio

TestForAspose.zip (102.1 KB)

@giulio.andolfi The problem occurs because both the main and the part documents have SDTs with the same id, -1569802863. As I mentioned earlier, this is not allowed. You can check structured document tag ids using simple code like the following:

Document md = new Document("C:\\Temp\\MD_MANIPOLATO.docx");
PrintIds(md);
System.out.println("===========================================");

Document part = new Document("C:\\Temp\\PART.docx");
PrintIds(part);

// Prints the id of every structured document tag (content control) in the document.
static void PrintIds(Document doc)
{
    Iterable<StructuredDocumentTag> tags = (Iterable<StructuredDocumentTag>)doc.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG, true);
    for (StructuredDocumentTag tag : tags) {
        System.out.println(tag.getId());
    }
}

The output of this code is:

1867718377
-1569802863
-394892188
-226769620
-842235119
2118169901
===========================================
-1569802863

As you can see, id -1569802863 is in both documents.
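
The same comparison can be automated so that clashing ids are reported before the documents are merged. This is only a sketch in plain Java: the class name SharedSdtIds and the method shared are hypothetical, and the ids are hard-coded from the output above, whereas in real code they would be collected with tag.getId() as in PrintIds.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: finds structured document tag ids present in both documents.
public class SharedSdtIds {
    // Returns the ids that occur in both lists, in the order they appear in the first list.
    public static Set<Integer> shared(List<Integer> first, List<Integer> second) {
        Set<Integer> result = new LinkedHashSet<>(first);
        result.retainAll(second);
        return result;
    }

    public static void main(String[] args) {
        // Ids copied from the output above (main document, then part document).
        List<Integer> md = List.of(1867718377, -1569802863, -394892188,
                -226769620, -842235119, 2118169901);
        List<Integer> part = List.of(-1569802863);
        System.out.println(shared(md, part)); // prints [-1569802863]
    }
}
```

A non-empty result means that at least one tag is likely to receive a new id when the part document is inserted.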

Hello @alexey.noskov,
I know that both documents have the same id, -1569802863.
But before calling builder.insertDocument(…), we remove the SDT with id -1569802863. You can see this in the attached project.
Are we doing something wrong when removing the SDT?
Is it necessary to save, or do anything else, before calling builder.insertDocument(…)?
Thanks,
Giulio

@giulio.andolfi Even if you remove a structured document tag from the document, its id is still considered used and cannot be used by another structured document tag.
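
The effect can be pictured with a small toy model. This is not Aspose.Words code and says nothing about its internals; the class SdtIdRegistryModel and its methods are hypothetical and only mimic the behavior described in this thread, namely that an id stays reserved after its tag is removed and that a clashing id is replaced with a new one.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Toy model only: NOT Aspose.Words code.
public class SdtIdRegistryModel {
    private final Set<Integer> reserved = new HashSet<>(); // every id the document has ever seen
    private final Random random = new Random();

    // Registers a tag id; if the id is already reserved, a fresh random id is used instead.
    public int register(int requestedId) {
        int id = requestedId;
        while (!reserved.add(id)) {
            id = random.nextInt();
        }
        return id;
    }

    // Removing a tag deliberately leaves its id reserved, as described in the reply above.
    public void remove(int id) {
        // intentionally empty: the id is not released
    }

    public static void main(String[] args) {
        SdtIdRegistryModel parent = new SdtIdRegistryModel();
        parent.register(1867718377);
        parent.register(-1569802863);
        parent.remove(-1569802863);                   // tag removed, id still reserved
        int imported = parent.register(-1569802863);  // CHILD tag arrives with the same id
        System.out.println(imported != -1569802863);  // prints true
    }
}
```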

@alexey.noskov, is there any way to ensure that deleted SDT ids are no longer considered?
We need to remove the SDTs and insert them again with the same ids.
Thanks,
Giulio

@giulio.andolfi Unfortunately, there is no straightforward way to use the same SDT id again. I have logged an issue, WORDSNET-23152, to allow reusing the ids of removed SDTs. We will let you know once it is resolved.
In the meantime, you can use a workaround: simply clone the document right after deleting the SDT and use the cloned document. For example, the following code gives the expected result:

Document md = new Document("C:\\Temp\\MD_MANIPOLATO.docx");
PrintIds(md);
System.out.println("===========================================");

Document part = new Document("C:\\Temp\\PART.docx");
PrintIds(part);
System.out.println("===========================================");

md.getChild(NodeType.STRUCTURED_DOCUMENT_TAG, 1, true).remove();
Document mdClone = (Document)md.deepClone(true);

mdClone.appendDocument(part, ImportFormatMode.USE_DESTINATION_STYLES);
PrintIds(mdClone);
System.out.println("===========================================");

Hello @alexey.noskov,
thanks for opening the issue. Could you confirm that the fix will be released in the Java version of Aspose.Words?
In the meantime I will try to apply your workaround.
I will let you know as soon as possible.
Thanks
Giulio

@giulio.andolfi Sure, once the fix is done it will be included in both the .NET and Java versions of Aspose.Words.

Hello @alexey.noskov,
I have applied your workaround to the small project that I attached, and it works fine.
I will try to apply it to our solution as soon as possible.
Sorry, but I have two questions:

  • Could the deep clone of the document affect performance if the document is several MB?
  • Was this behaviour introduced in a specific version, or has it been present since the beginning?

Thanks,
Giulio

@giulio.andolfi

Yes, this might have a slight effect on performance. However, cloning is performed in memory and is quite a fast operation.

This behavior has been present since the beginning.

Hello @alexey.noskov,
unfortunately we cannot apply the workaround to our code, not only because of performance but also because of how our code is structured.
Do you have any idea when the fix will be released?
Thanks,
Giulio

@giulio.andolfi Unfortunately, it is currently difficult to provide an estimate. The issue is in the queue for analysis. Once it has been analyzed by our developers, we will provide you with more information.

Hello @alexey.noskov,
do you have any news regarding this issue?
Thanks,
Giulio

@giulio.andolfi Unfortunately, the issue is still unresolved. It is not so simple to change the current behavior because of possible regressions in some scenarios. For example, a removed node may be returned to the document tree, and we would then get two SDTs with the same id. We will investigate the issue further, but it seems the current behavior works better for most scenarios than one where the id of a removed SDT is considered unused.
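
To illustrate that regression scenario, here is a toy model of the alternative policy in which removing a tag releases its id. It is not Aspose.Words code; the class FreeOnRemoveModel and all of its methods are hypothetical and exist only to show how a duplicate could arise.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: NOT Aspose.Words code. It models a hypothetical "free the id on remove" policy.
public class FreeOnRemoveModel {
    private final List<Integer> tree = new ArrayList<>(); // ids of tags currently in the document
    private final Set<Integer> used = new HashSet<>();    // ids currently considered taken

    // Inserts a tag with the given id; returns false if the id is already taken.
    public boolean insert(int id) {
        if (!used.add(id)) {
            return false;
        }
        tree.add(id);
        return true;
    }

    // Hypothetical policy: removing a tag also releases its id.
    public void remove(int id) {
        tree.remove(Integer.valueOf(id));
        used.remove(id);
    }

    // The previously removed node object is put back into the tree with its old id.
    public void reattach(int id) {
        tree.add(id);
        used.add(id);
    }

    public boolean hasDuplicateIds() {
        return new HashSet<>(tree).size() != tree.size();
    }

    public static void main(String[] args) {
        FreeOnRemoveModel doc = new FreeOnRemoveModel();
        doc.insert(-1569802863);
        doc.remove(-1569802863);    // id released under this policy
        doc.insert(-1569802863);    // imported tag keeps its original id
        doc.reattach(-1569802863);  // old node returns to the tree
        System.out.println(doc.hasDuplicateIds()); // prints true
    }
}
```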

Hello @alexey.noskov,
I am hopeful that a solution will be found.
I look forward to receiving your feedback.
Thank you,
Giulio