Aspose corrupting a document for Word 2003

I have a process which takes a file, updates its variables, then attempts to save the file. The process continues by opening the file in MSWord and automating the file in different ways. I understand that I can use Aspose to fully automate the file, and we will eventually get to that, but as Aspose is still new to us, we’re taking baby steps.
Aspose has worked well for us in most cases, but this case has thrown me. Aspose seems to be changing a file just enough that, when it comes to automating the file after Aspose touches it, the file is somehow different enough that the Word.Automation cannot work properly. This is only an issue for DOC files in Word 2003. DocX files? No problem (but not an option as per our clients). This is also not an issue in Word 2007 or 2010.
I have been able to duplicate the file from a blank document, but there have to be very specific items in the file in order for the problem to occur.
To reproduce the problem, create the following file in Word 2003 (or open the attached file):

  1. Open a new file

  2. Create a two row, one column table.

    1. In the first row,
      1. Put some image. The image content does not seem to matter.
      2. Right click on the image and add a hyperlink to www.google.com. The URL doesn’t seem to matter.
    2. In the second row,
      1. Change the shading to orange (this was important in what was our original problem file. I don’t know if it’s important here, but I left it in).
      2. Add some text. The text doesn’t seem to matter.
      3. Select the text and apply the Heading 1 style to it.
  3. Go to the end of the document.

  4. Insert a footer.

    1. In the footer, add a table with the following:
      1. 2 columns, 1 row.
      2. In column 1, add a document variable pointing to Version (which doesn’t exist).
      3. In column 2, add a reference to the text added in the body of the document (Insert | Reference | Cross Reference with a reference type of Heading).
  5. Go out of the footer.

  6. In the body of the document, delete the contents of the second row. Now your cross reference will point to nothing.

  7. Save this document. The attached file, Doc w Table and Footer.doc, was created using the above steps.

  8. Using Aspose, open the file, do nothing but save it as a new name in the .doc format. Doc w Table and Footer after Aspose Saved.doc is the name of the resulting file after Aspose saved the document.

The code used to generate the file named in step 8 is:

// Load the original .doc, then save it unchanged under a new name in DOC format.
Aspose.Words.Document doc = new Aspose.Words.Document(docName);
doc.Save(newDocName, Aspose.Words.SaveFormat.Doc);

Results: Something is wrong with the file. When we go through our steps to modify the file, the file cannot be saved in MSWord 2003 as a .doc file because the tables are corrupt. If we bypass Aspose and do not let it touch the file, the file saves properly. We are working on our end to see what we might be doing that is not robust enough to handle issues like this, but please advise what I might need to do to save the file differently using Aspose, or what might be causing this.
Our thoughts are that something might not work right when Aspose tries to save Document Variables that don’t have content, or when Aspose attempts to save a document containing references that no longer exist. We are using Aspose.Words v 9.7. Looking at the two files in the Document Explorer, the two files start quite differently.
Thank you. I’d be happy to furnish any other details that may help you. Have a nice day!

Hello.
Thank you for your request.
Can you please specify the error you get when attempting to save the document? Also, please tell me which MS Office 2003 Service Pack is installed on the computer.

I tried to reproduce your problem on MS Office 2003 SP3 (in a virtual machine running Windows XP Pro SP3). Unfortunately, I could not reproduce it with any manipulation of the file. I tried saving the file without changes, saving after making changes, and saving to another file, and no error occurred. Everything worked normally. Please describe the error in more detail. How can I simulate the problem?

The error is that it says that the tables have become corrupted and the document cannot be saved.
The machine I’m working on uses Windows Vista Business. I am using MS Word 2003 SP3(11.8328.8329).
The automation we perform involves opening the file, then saving it to a temporary location so we can monitor changes. If we listen to MSWord’s OnBeforeSave event, (we’re using an ActiveX control with COM automation for this) it seems to us as though the save event is happening on a different thread and that thread is hanging. MSWord may not visually show that there is a problem, but we are not getting the OnBeforeSave event, and Word shows a blank document. If we disable the event listener, the save appears to happen but we believe something wrong is happening.
Bottom line, there are several potential points of failure in this problem. However, one of the keys to the problem happening is that Aspose saves this document. If you take Aspose saving the document out of the loop, then none of the described problems occur.
Looking at the two files in the Document Explorer, why are there such differences between the two? Why is the file size so different, if all you are doing is opening the file and saving it under a new name?
Thanks for your insight and help on this.

Alright, we finally have steps you can use that should duplicate the problem and that don’t involve our end at all.
Take the two originally attached files, Doc w Table and Footer.doc and Doc w Table and Footer after Aspose Saved.doc and rename them to docBefore.doc and docAfter.doc, respectively. This will make the next bit easier to do.

  1. Save them on your computer to d:\ or wherever you want. In IE (we are using IE8), in the address bar, type file:///d:\docBefore.doc
  2. You will see a two row table. Put some text before the image in the first cell, then put some text after the table in the body of the document.
  3. Choose File | Save As in Word (Alt+F) and save to your desktop (location doesn’t matter).
  4. Close out of the browser. It will say that the file is modified and ask whether you would like to save. This has something to do with Word and IE and seems to be a known issue.

Repeat the above for the second file. For me, the entire document disappears when I save.
Note that when the file is opened, it needs to open up inside of the browser. If it opens from disk, then it seems to work OK.
Thanks for your help on this.

Hi,
Thank you for your additional information.
This is reminiscent of a shamanic dance around the campfire.
But it is really happening. I managed to reproduce your problem and have registered a bug in our defects database. Once it is fixed, you will be notified automatically.

Hi there. We’ve just run across another customer having this type of document corruption, so it’s not a once-in-a-lifetime occurrence. We now have 3 documents: 2 from customers and the bare-bones version of a file that can duplicate the problem.
Keep me posted, thank you!

Hello,
Thanks for the additional information. We’ll keep your problem in mind. Once our developers have analyzed the issue, we will notify you immediately.

Here is an interesting tidbit that may be related.
I have a document of size X that I open as a MemoryStream. If I read that document in using Aspose, make the document smaller, and then save it to a file, the file size is (X - {some amount Y}). However, if I save the document back to the MemoryStream defined above, the length of the stream (X) is not updated to the new (X - Y) size. This introduces some corruption in my MemoryStream when I write that into my database.
On a side note, I am interested in what Aspose does to a file when it saves it. I start with a blank document in Word 2003 and add a header and save that file to disk. Having Aspose look at the file in your sample document viewer program, it shows one header node. If I have Aspose open and save the file under a new name, and then I look at the file, there are SEVERAL header nodes. Are these needed somehow, and what else is Aspose modifying in the file when it saves?
Have a nice day!
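The stream-length behavior described above can be reproduced with plain .NET, without Aspose. The sketch below is only an illustration under stated assumptions: the byte arrays are made-up stand-ins for the original and the smaller saved document, and it is an assumption that the save writes into the reused stream in the same way. It shows that overwriting a MemoryStream from the start does not shrink it, and that calling SetLength(0) first does.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the original document: 10 bytes of size "X".
byte[] original = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
// Hypothetical stand-in for the smaller output of a later save: 4 bytes.
byte[] smaller = { 42, 42, 42, 42 };

var stream = new MemoryStream();
stream.Write(original, 0, original.Length);

// Overwrite from the beginning without resetting the length.
// The stream keeps its old Length, so the tail still holds stale bytes.
stream.Position = 0;
stream.Write(smaller, 0, smaller.Length);
long lengthWithoutReset = stream.Length;

// Truncate first, then write: the stream now holds only the new content.
stream.SetLength(0);
stream.Write(smaller, 0, smaller.Length);
long lengthAfterReset = stream.Length;

Console.WriteLine($"Without reset: {lengthWithoutReset} bytes");
Console.WriteLine($"After SetLength(0): {lengthAfterReset} bytes");
```

Saving into a fresh MemoryStream instead of reusing the input stream avoids the question entirely.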

Hello,
Thank you for additional information.
Could you please attach your source and output documents? I will take a closer look at them.

Hi James,
Thank you for the additional information. The problem with corrupted data in the database might be caused by the GetBuffer method. It returns the stream's whole underlying buffer, which can include unused bytes past the end of the written data. The solution is very simple: use the ToArray method instead of GetBuffer. Please see the following code:

// Return the array of bytes
return toMemoryStream.ToArray();

Also, the following article can be useful:
https://docs.aspose.com/words/net/serialize-and-work-with-a-document-in-a-database/
Hope this helps.
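As a rough illustration of the GetBuffer versus ToArray difference mentioned above, here is a minimal sketch using only System.IO. The 16-byte capacity and the 5-byte payload are arbitrary values chosen for the example, not anything taken from the documents in this thread.

```csharp
using System;
using System.IO;

// Create a stream with an explicit initial capacity of 16 bytes,
// then write only 5 bytes of data into it.
var stream = new MemoryStream(16);
byte[] payload = { 1, 2, 3, 4, 5 };
stream.Write(payload, 0, payload.Length);

// GetBuffer exposes the whole underlying array, including unused bytes.
byte[] buffer = stream.GetBuffer();
// ToArray copies only the bytes that were actually written.
byte[] exact = stream.ToArray();

Console.WriteLine($"Stream length:    {stream.Length}");
Console.WriteLine($"GetBuffer length: {buffer.Length}");
Console.WriteLine($"ToArray length:   {exact.Length}");
```

Storing the GetBuffer result in a database would persist the padding bytes along with the document, which is why ToArray is the safer choice here.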
Regarding the extra headers/footers: as you may know, each section of an MS Word document can contain three headers and three footers of different types. When you create one of them in a DOC file, MS Word creates the others as well, but all of them are empty. That is why you do not see them in the document, but they are still there.
Best regards,