Multiple Inserts Create Unusable Document

Hey Roman,

Thought I was onto something with my last post about multiple opens/saves, but it turns out that was only part of my problem.

I reworked the code to read the list of documents to assemble from a database, do the whole assembly at once, and then save the assembled file (code below). The code runs without errors and produces a document, with all the sections in the right place, that Word 2003 can open. However, if you re-save it (Save As), modify it, or print it, Word crashes.

To find out whether one of my 27 source files was causing the problem, I limited the assembly to a few documents at a time, but the crash occurred in every case. So either all of my source documents are “corrupt,” or I am doing something wrong.

I did notice something peculiar. When I build a master file (“Report.doc”) from the five smaller files I will send you by email, the Aspose-created Report.doc is 20 KB, smaller than the blank source file (“Report.doc”) it was created from. When I open the assembled file in Word and save it, Word manages to write 39 KB before it crashes. To illustrate:

Original: Report.doc, 23 KB (blank Word 2003 document)

Addfile1: locmap.doc, 24 KB (one-page Word 2003 document)
Addfile2: sitedescription.doc, 40 KB (one-page Word 2003 document)
Addfile3: improvementdescription.doc, 26 KB (one-page Word 2003 document)
Addfile4: platmap1.doc, 24 KB (one-page Word 2003 document)
Addfile5: sketch1.doc, 24 KB (one-page Word 2003 document)

Result: Report.doc, 20 KB (five-page Word 2003 document)

Open Report.doc in Word 2003 (OK)
Print Report.doc in Word 2003 (Crash)
Save As Report.doc > Test.doc (writes Test.doc: 39 KB, five pages)

Test.doc can then be opened, but it also crashes on print or save.

I will email you the blank file, the five add files, and the results.

Thanks, Roman!!

Shannon

(code snippet follows)

// Open the destination (blank) document
Document destDoc = word.Open("c:...\\" + Request.Params["fileno"] + destName);

// Read the list of source documents to append
DataTable myTable = ExecuteDataTable("doclist", "SELECT * FROM doclist");

for (int i = 0; i < myTable.Rows.Count; i++)
{
    DataRow row = myTable.Rows[i];
    string srcnm = Convert.ToString(row["docname"]);
    Document srcDoc = word.Open("c:...\\masterdocs\\" + srcnm);

    // Move every section of the source document to the end of the destination
    while (srcDoc.Sections.Count > 0)
    {
        Section section = srcDoc.Sections[0];
        srcDoc.Sections.RemoveAt(0);
        destDoc.Sections.Add(section);
    }
}

destDoc.Save("c:..." + destName, SaveFormat.FormatDocument);

As a follow-up, I tried going about it backwards, and it works. Instead of building the document from scratch, I built a master document in Word that contains all the parts, then used Sections.Remove to take out the parts I didn’t want for a particular assembly.

While this is a workable workaround, I would really like to solve the crash in the assembly approach, because building the document from scratch suits this application much better than deleting unneeded sections. The “build” approach allows much more dynamic handling of sections, and adding or deleting sections is not a problem. With the “teardown” approach, I would need several different master documents; any change to one master would have to be copied to all of them for consistency, and the section count of every master would have to be re-checked to make sure the right pages are being removed.

I assembled the master document in Word by opening a blank file, then opening each sub-file, copying its content, and pasting it in, adding a “Next page” break for each. So the master contains the same data as the five files I sent in the earlier email, which suggests that the crash is caused not by the content of the files themselves but by the way Aspose assembles them (??).

I hope this information helps determine why the assembly produces documents that crash Word.

Thanks,

Shannon

(code snippet follows)

Document destDoc = word.Open("c:..." + destname + destName);

// Section numbers to remove, highest first, so that removing one section
// does not shift the indexes of the sections still to be removed
DataTable myTable = ExecuteDataTable("remover", "SELECT * FROM remover order by secno desc");

for (int i = 0; i < myTable.Rows.Count; i++)
{
    DataRow row = myTable.Rows[i];
    int srcsec = Convert.ToInt32(row["secno"]);
    Section section = destDoc.Sections[srcsec];
    destDoc.Sections.Remove(section);
}

destDoc.Save("c:...report.doc", SaveFormat.FormatDocument);

Thanks for the report and the files; we will check the issue.

Hey Roman –

It’s been about a week since I started the threads on this problem, so I figured I’d check in to see how things are progressing and whether you have found why my assembled files are causing Word to crash.

Thanks,

Shannon

I’ve tested this and can confirm that I can get MS Word to crash. My steps are slightly different, but I’m sure it’s the same issue.

In short, with the current version of Aspose.Word it is better to follow the “delete unneeded sections” approach. There are technical reasons why this works better than assembling one document from several. The problems crop up only under certain circumstances, and we will be working to improve Aspose.Word in this area.

The main problem is that some formatting, such as paragraph, list, and table styles, is not part of a section but part of the document as a whole. When a section is moved from one document to another, we don’t have a good way to move the styles it uses into the new document as well. At best, this causes some formatting, such as list formatting, to be lost; at worst, as in your case, it causes MS Word to crash (we don’t yet fully understand why).

In your case, I think it has to do with table styles. Your tables use the Table Grid style, which is probably not defined in the blank document. If I add only one document with a table, the destination document saves and reopens in MS Word without apparent problems. The formatting looks the same, although I can see that the Table Grid style is no longer applied. The problem appears when I add two or more tables from the source documents.
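The style problem can be pictured with a deliberately simplified model. The sketch below is hypothetical: `DocModel`, `SectionModel`, `MoveSection`, and `MoveSectionWithStyles` are illustration-only names, not part of the Aspose.Word API, and styles are referenced by name purely for readability. It does not claim to reproduce what Aspose.Word or MS Word does internally; it only shows how moving a section by itself can leave the destination using a style (Table Grid) that it never defines, and how copying the missing style definitions first avoids that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model; NOT the Aspose.Word object model.
// Styles live on the document; sections only refer to them by name.
class SectionModel
{
    public List<string> StyleRefs = new List<string>();
}

class DocModel
{
    // Style name -> style definition (document-level, shared by all sections)
    public Dictionary<string, string> Styles = new Dictionary<string, string>();
    public List<SectionModel> Sections = new List<SectionModel>();

    // Style names that some section uses but this document does not define.
    public List<string> UndefinedStyles() =>
        Sections.SelectMany(s => s.StyleRefs)
                .Where(name => !Styles.ContainsKey(name))
                .Distinct()
                .ToList();
}

static class Demo
{
    // Moves only the section; its styles stay behind in the source document.
    static void MoveSection(DocModel src, DocModel dst, int index)
    {
        SectionModel section = src.Sections[index];
        src.Sections.RemoveAt(index);
        dst.Sections.Add(section);
    }

    // Copies any style definitions the destination lacks, then moves the section.
    static void MoveSectionWithStyles(DocModel src, DocModel dst, int index)
    {
        foreach (string name in src.Sections[index].StyleRefs)
        {
            if (!dst.Styles.ContainsKey(name) && src.Styles.TryGetValue(name, out string definition))
                dst.Styles[name] = definition;
        }
        MoveSection(src, dst, index);
    }

    // Appends a one-section source with a "Table Grid" table to a blank
    // destination and returns the styles left undefined in the destination.
    public static List<string> Run(bool copyStyles)
    {
        var dst = new DocModel();
        dst.Styles["Normal"] = "default paragraph formatting";

        var src = new DocModel();
        src.Styles["Normal"] = "default paragraph formatting";
        src.Styles["Table Grid"] = "single borders on all cells";
        var section = new SectionModel();
        section.StyleRefs.Add("Normal");
        section.StyleRefs.Add("Table Grid");
        src.Sections.Add(section);

        if (copyStyles)
            MoveSectionWithStyles(src, dst, 0);
        else
            MoveSection(src, dst, 0);

        return dst.UndefinedStyles();
    }

    static void Main()
    {
        Console.WriteLine("naive: " + string.Join(", ", Run(false)));
        Console.WriteLine("with styles: " + string.Join(", ", Run(true)));
    }
}
```

Running it prints `naive: Table Grid`, then `with styles:` with nothing after it. In this model, starting from a destination that already defines the style has the same effect as the copy step.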

You can either stick with the “delete sections” approach or avoid using styles (including table styles) in the source documents.

Further testing confirms that the crash is caused by the Table Grid style missing from the destination document.

Here is what I did to produce a document that does not cause the crash:

  1. Instead of a blank document, open a document that contains one table (similar to the tables in your other documents).
  2. Add all other documents to the end.
  3. Delete the first section (the one from the original document).
  4. Save.

The resulting document can be opened, saved, and reopened in MS Word without problems.

TestUtil.SetUnlimitedLicense();

// Start from a document that already contains a table, instead of a blank one
Document dstDoc = TestUtil.Open(@"Other\ShannonCombineDocs\src0.doc");

// Append the first section of each remaining source document
for (int i = 1; i <= 4; i++)
{
    Document srcDoc = TestUtil.Open(string.Format(@"Other\ShannonCombineDocs\src{0}.doc", i));
    Section section = srcDoc.Sections[0];
    srcDoc.Sections.RemoveAt(0);
    dstDoc.Sections.Add(section);
}

// Remove the section that came from src0.doc
dstDoc.Sections.RemoveAt(0);
TestUtil.Save(dstDoc, @"Other\ShannonCombineDocs\TestShannonCombineDocs Out.doc");

List formatting is no longer lost as of Aspose.Word 2.0.4; please get the latest version. For more information, see https://downloads.aspose.com/words