Document is corrupted after removing Table nodes and saving it to MemoryStream using .NET

Bug: after removing a Table node from a document loaded from a MemoryStream and saving it back, the MemoryStream becomes corrupted.

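Steps to reproduce:
1. Call Document.GetChildNodes(NodeType.Table, true);
2. Call childNode.Remove() on one of the returned tables.
3. Save the document to the MemoryStream.
4. Repeat on a different table.
5. Eventually loading fails with: "The document appears to be corrupted and cannot be loaded"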

at xe8730a664ff488a4.x990d54f34b2b5118.x64da0b7f32fdd92e(xef4b7685c2495ff2 x2fe8f0c48ba61aee)
at xe8730a664ff488a4.xef4b7685c2495ff2.x7716fc67b6df3585(xef4b7685c2495ff2 x2fe8f0c48ba61aee)
at xe8730a664ff488a4.xef4b7685c2495ff2.x9f6a941ab884bab8(xef4b7685c2495ff2 x2fe8f0c48ba61aee)
at xe8730a664ff488a4.xef4b7685c2495ff2.x06b0e25aa6ad68a9(Stream xb4b4757248084bd0, TextWriter x36a868b32d278eb7, Encoding xff3edc9aa5f0523b, EventHandler xaa5395229058f5c6)
at xe8730a664ff488a4.xef4b7685c2495ff2.x06b0e25aa6ad68a9(Stream xb4b4757248084bd0, TextWriter x36a868b32d278eb7, Encoding xff3edc9aa5f0523b)
at xe8730a664ff488a4.xef4b7685c2495ff2.x06b0e25aa6ad68a9(Stream xb4b4757248084bd0)
at xf9a9481c3f63a419.xc5d5cabda4535c40..ctor(Stream stream)
at xf989f31a236ff98c.x755940550ade8e52.x59a8246b79e928e3()
at xf989f31a236ff98c.x53dc82a419732f24.x49c42236a5e1a9a3()
at xf989f31a236ff98c.x53dc82a419732f24.xdef7f68a22ec051d(Stream xcf18e5243f8d5fd3)
at Aspose.Words.Document.x5d4db34d48fb3129(Stream xcf18e5243f8d5fd3, LoadOptions x27aceb70372bde46)

ZipEntry::ReadDirEntry(): Bad signature (0x00000000) at position 0x000046E7

This happens simply when trying to create a new Document from the MemoryStream.
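A DOCX file is a ZIP package, and ZIP readers locate the central directory from the end of the data. If a shorter document is saved over a longer one in the same MemoryStream without truncating it first, the old tail survives and the package can become unreadable. The thread does not confirm that this is what happened here; the sketch below only shows, with System.IO alone, that writing from position 0 keeps the stream's old length unless SetLength(0) is called. StaleBytesDemo and its methods are illustrative names, not part of Aspose.Words.

```csharp
using System;
using System.IO;

public static class StaleBytesDemo
{
    // Mimics "Seek(0), then save over the existing package" without truncating first
    // (an assumption about how the stream may have been reused).
    public static void OverwriteWithoutTruncate(MemoryStream stream, byte[] payload)
    {
        stream.Seek(0, SeekOrigin.Begin);
        stream.Write(payload, 0, payload.Length);
    }

    // Drops the old contents first, so no stale tail can survive the rewrite.
    public static void OverwriteWithTruncate(MemoryStream stream, byte[] payload)
    {
        stream.SetLength(0);
        stream.Seek(0, SeekOrigin.Begin);
        stream.Write(payload, 0, payload.Length);
        stream.Seek(0, SeekOrigin.Begin); // rewind so the next reader starts at byte 0
    }

    public static void Main()
    {
        byte[] original = new byte[10];  // stands in for the first saved package
        byte[] smaller = { 1, 2, 3, 4 }; // stands in for a smaller re-save

        var naive = new MemoryStream();
        naive.Write(original, 0, original.Length);
        OverwriteWithoutTruncate(naive, smaller);
        Console.WriteLine(naive.Length); // 10: six stale bytes remain after the new data

        var safe = new MemoryStream();
        safe.Write(original, 0, original.Length);
        OverwriteWithTruncate(safe, smaller);
        Console.WriteLine(safe.Length); // 4
    }
}
```

The same idea applies to any save-into-an-existing-stream pattern: either truncate the target first or save into a fresh MemoryStream, which is what the second code sample in this thread ends up doing.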

Depending on which DOCX is loaded, we sometimes get an error about a missing XML root instead.

With Aspose.Words 10.5, the same error used to occur when manipulating merge fields returned by Document.MailMerge.GetFieldNames().

That error is gone in the latest Aspose.Words 11.4, but now we have this problem with removing table nodes. Unfortunately, our code needs to support both manipulating merge fields and removing tables.

Note that we save DOCX with OoxmlCompliance.Iso29500_2008_Transitional, but the behaviour is the same when saving with SaveFormat.Docx.

The bug only affects the DOCX format, not DOC; our code has been running correctly with DOC documents.
I have attached sample code, a unit test, and a DOCX document. You'll see that the same document in DOC format runs just fine.

We would appreciate an immediate response, as we need to release our code ASAP.

Hi Hairani,

Thanks for your query. It seems that you want to delete the tables in a document that contain a specific merge field. I have modified the CustomDeleteTableExecute and Execute methods; please use the following code snippet for your requirement.

Hope this helps you. Please let us know if you have any more queries.

```csharp
// Deletes the first table whose text contains "MERGEFIELD <tableName>".
public void CustomDeleteTableExecute(Document generatedDocument, string tableName)
{
    NodeCollection templateTables = generatedDocument.GetChildNodes(NodeType.Table, true);

    foreach (Table childTable in templateTables)
    {
        // Relies on GetText() including the field code text of MERGEFIELD fields.
        if (childTable.GetText().Contains("MERGEFIELD " + tableName))
        {
            childTable.Remove();
            break; // stop after the first match
        }
    }
}
```

```csharp
byte[] bytes = File.ReadAllBytes(MyDir + "Bug_For_Aspose.docx");
Document document;

using (MemoryStream documentStream = new MemoryStream(bytes))
{
    document = new Document(documentStream);
}

using (MemoryStream documentStream = new MemoryStream())
{
    // Remove the tables on the Document object first, then save only once.
    CustomDeleteTableExecute(document, "BARClauseTable");
    CustomDeleteTableExecute(document, "BARTable");

    switch (document.OriginalLoadFormat)
    {
        case LoadFormat.Doc:
        default:
            document.Save(documentStream, SaveFormat.Doc);
            break;
        case LoadFormat.Docx:
            OoxmlSaveOptions ooSave = new OoxmlSaveOptions();
            ooSave.Compliance = OoxmlCompliance.Iso29500_2008_Transitional;
            document.Save(documentStream, ooSave);
            break;
    }

    // Rewind before loading: after Save the stream is positioned at its end.
    documentStream.Seek(0, SeekOrigin.Begin);
    document = new Document(documentStream);
} // using MemoryStream
```

Hi,
I apologise for not mentioning this earlier, but I have already tried the solution you propose, so I know it works.
Unfortunately, we can't really change the parameter from MemoryStream to Document.

The reason is that we are implementing an interface whose signature uses MemoryStream. This confines the Aspose dependency to a single module and allows other implementations in different modules. The actual code spans several modules, not just one.

The code works well with DOC; the only change we made to enable DOCX was adding a switch on the format to save as either DOC or DOCX.

Please advise a workaround that keeps the MemoryStream signature.

Hi Hairani,

Thanks for sharing the information. I have made some changes to your code related to the return type: CustomDeleteTableExecute now takes a MemoryStream and returns a new MemoryStream instead of working on a Document. Hope this helps you.

Please let us know if you have any more queries.

```csharp
byte[] bytes = File.ReadAllBytes(MyDir + "Bug_For_Aspose.docx");
Document document;

using (MemoryStream documentStream = new MemoryStream(bytes))
{
    document = new Document(documentStream);
}

using (MemoryStream documentStream = new MemoryStream())
{
    switch (document.OriginalLoadFormat)
    {
        case LoadFormat.Doc:
        default:
            document.Save(documentStream, SaveFormat.Doc);
            break;
        case LoadFormat.Docx:
            OoxmlSaveOptions ooSave = new OoxmlSaveOptions();
            ooSave.Compliance = OoxmlCompliance.Iso29500_2008_Transitional;
            document.Save(documentStream, ooSave);
            break;
    }

    // Each call loads its input stream and returns a brand-new stream
    // holding the re-saved document, instead of writing back into the input.
    MemoryStream documentStream2 = CustomDeleteTableExecute(documentStream, "BARClauseTable");
    documentStream2 = CustomDeleteTableExecute(documentStream2, "BARTable");

    documentStream2.Seek(0, SeekOrigin.Begin);
    document = new Document(documentStream2);
    //document.Save(MyDir + "AsposeOut.docx");
} // using MemoryStream
```

```csharp
public MemoryStream CustomDeleteTableExecute(MemoryStream documentStream, string tableName)
{
    // Always load from the start of the package.
    documentStream.Seek(0, SeekOrigin.Begin);
    Document generatedDocument = new Document(documentStream);

    NodeCollection templateTables = generatedDocument.GetChildNodes(NodeType.Table, true);
    foreach (Table childTable in templateTables)
    {
        if (childTable.GetText().Contains("MERGEFIELD " + tableName))
        {
            childTable.Remove();
            break; // stop after the first match
        }
    }

    // Save into a fresh stream rather than back into the input stream.
    MemoryStream outputStream = new MemoryStream();
    switch (generatedDocument.OriginalLoadFormat)
    {
        case LoadFormat.Doc:
        default:
            generatedDocument.Save(outputStream, SaveFormat.Doc);
            break;
        case LoadFormat.Docx:
            OoxmlSaveOptions ooSave = new OoxmlSaveOptions();
            ooSave.Compliance = OoxmlCompliance.Iso29500_2008_Transitional;
            generatedDocument.Save(outputStream, ooSave);
            break;
    }

    return outputStream;
}
```

It still changes the method signature, really: instead of void, it returns a MemoryStream.
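One way to keep a void signature, sketched here as an assumption rather than something tested in this thread, is to build the result in a temporary MemoryStream and then copy it back into the caller's stream object, truncating it first. StreamContents.ReplaceContents is a hypothetical helper using only System.IO; the target must be an expandable stream (created with new MemoryStream()), because a MemoryStream wrapping a fixed byte[] cannot grow.

```csharp
using System;
using System.IO;

public static class StreamContents
{
    // Hypothetical helper: replaces everything in `target` with the bytes of `source`,
    // so a void method can hand its result back through the caller's own stream object.
    public static void ReplaceContents(MemoryStream target, MemoryStream source)
    {
        target.SetLength(0);              // drop the old package completely
        target.Position = 0;
        source.Seek(0, SeekOrigin.Begin); // copy from the start of the new package
        source.CopyTo(target);
        target.Seek(0, SeekOrigin.Begin); // leave it ready for the next load
    }

    public static void Main()
    {
        var callerStream = new MemoryStream();
        callerStream.Write(new byte[10], 0, 10);                // stands in for the original package
        var result = new MemoryStream(new byte[] { 1, 2, 3 });  // stands in for the re-saved package

        ReplaceContents(callerStream, result);
        Console.WriteLine(callerStream.Length);   // 3
        Console.WriteLine(callerStream.Position); // 0
    }
}
```

Inside a void CustomDeleteTableExecute(MemoryStream documentStream, string tableName), the idea would be to save generatedDocument into a temporary MemoryStream and then call ReplaceContents(documentStream, temp), so the caller keeps using the same stream object it passed in.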



But one thing I learned: even though MemoryStream is a reference type, if the method assigns a new MemoryStream to its parameter, the parameter must be passed with ref; otherwise the caller still holds the old MemoryStream after the call.

I am going to test the code and make some changes, and will update this thread with the result. Thanks for your help so far.

Hi Hairani,

Thanks for your feedback. Yes, in that case you should pass the MemoryStream by reference (with ref).

Please let us know if you have any more queries.

It doesn't really work with ref, I'm afraid!

It ended up with "Unknown file format":

at Aspose.Words.Document.x5d4db34d48fb3129(Stream xcf18e5243f8d5fd3, LoadOptions x27aceb70372bde46)
at Aspose.Words.Document.x5d95f5f98c940295(Stream xcf18e5243f8d5fd3, LoadOptions x27aceb70372bde46)
at Aspose.Words.Document..ctor(Stream stream, LoadOptions loadOptions)
at Aspose.Words.Document..ctor(Stream stream)
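A loader that reads from the stream's current position sees no bytes at all if the stream was just written and never rewound, which is one plausible way to get "Unknown file format" after swapping streams around; the thread does not confirm this was the cause. The sketch uses only System.IO, and PositionDemo.BytesAvailable is an illustrative stand-in for a loader, not an Aspose API.

```csharp
using System;
using System.IO;

public static class PositionDemo
{
    // Reads whatever is left from the current position, as a loader would.
    public static int BytesAvailable(Stream stream)
    {
        var buffer = new byte[16];
        int total = 0, read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            total += read;
        return total;
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        // 50 4B 03 04 is the ZIP local file header signature that a DOCX starts with.
        stream.Write(new byte[] { 0x50, 0x4B, 0x03, 0x04 }, 0, 4);
        Console.WriteLine(BytesAvailable(stream)); // 0: Position is still at the end after writing

        stream.Seek(0, SeekOrigin.Begin);
        Console.WriteLine(BytesAvailable(stream)); // 4
    }
}
```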

Please ignore my last post; I'm still looking into it.

Lesson learned: when a MemoryStream is passed to a method that manipulates it through an Aspose Document, the caller's stream is not changed after the call. The result must be returned explicitly; passing the stream with ref did not work either.
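For reference, this is how C# treats a MemoryStream argument, shown with System.IO only. Because MemoryStream is a reference type, a method can change the stream's contents and the caller sees that change; what a method cannot do without ref is make the caller's variable point to a different stream object. RefDemo and its methods are made-up names for illustration.

```csharp
using System;
using System.IO;

public static class RefDemo
{
    // Changes the contents of the caller's stream: visible to the caller.
    public static void AppendByte(MemoryStream stream)
    {
        stream.WriteByte(42);
    }

    // Rebinds only the local parameter: the caller's variable is untouched.
    public static void ReplaceByValue(MemoryStream stream)
    {
        stream = new MemoryStream(new byte[] { 1, 2, 3 });
    }

    // Rebinds the caller's variable itself.
    public static void ReplaceByRef(ref MemoryStream stream)
    {
        stream = new MemoryStream(new byte[] { 1, 2, 3 });
    }

    public static void Main()
    {
        var s = new MemoryStream();
        AppendByte(s);
        Console.WriteLine(s.Length); // 1

        ReplaceByValue(s);
        Console.WriteLine(s.Length); // still 1

        ReplaceByRef(ref s);
        Console.WriteLine(s.Length); // 3
    }
}
```

Under these rules, both returning the new stream and passing it with ref should hand the result back to the caller. The thread doesn't show why the ref attempt failed, so a stream position or content issue is only a guess, not a confirmed cause.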

We can close this thread.

Hi Hairani,

Thanks for your feedback. Please let us know if you have any more queries. We are always glad to help you.