Large Scale Documents

I am combining documents with the following code:

```vb
For Each srcSection As Section In srcdoc
   Dim dstSection As Node = dstdoc.ImportNode(srcSection, True, ImportFormatMode.KeepSourceFormatting)
   dstdoc.AppendChild(dstSection)
Next srcSection
```

When I combine hundreds of documents into one, the Dim dstSection As Node line (the ImportNode call) slows down a lot. Is there any way to speed this up? Is Aspose doing anything that could be turned off, such as tracking undo?
Also, when I open a large generated document (500 pages or more), Word has a hard time repaginating it. When I create a document of the same size in Word itself, Word seems to store pagination information inside the document. But when I open the large document created with Aspose, I can sit there for 2 minutes before it finally repaginates. Are there any properties or methods in Aspose.Words that could help with this?
I just purchased the product to use on a large scale for the insurance industry, and I need it to be robust enough to handle these scenarios.
Thank You,
Derek Hart

Documents of 500 pages or more are pretty big. Most of our tests use smaller documents; the biggest we have tested were probably around 100 pages.
My recommendations are:

  1. Maybe you can use several smaller documents in your solution instead of one huge document.
  2. Try using ImportFormatMode.UseDestinationStyles instead of KeepSourceFormatting.

KeepSourceFormatting has to do a bit more work. If your styles are consistent between documents (for example, Heading 1 looks the same in all of them), then UseDestinationStyles is the better choice for you.
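For comparison, here is a C# version of the loop from the question with only the import mode changed. It is a sketch that assumes srcDoc and dstDoc are already-loaded Aspose.Words Document objects; the class and method names are hypothetical. How much it helps depends on how consistent the styles in your documents are, so it is worth timing both modes on your own files.

```csharp
using Aspose.Words;

// Hypothetical helper class, for illustration only.
static class Combiner
{
    // Appends every section of srcDoc to the end of dstDoc.
    // With UseDestinationStyles, when a style name exists in both documents the
    // destination's definition is used; styles missing from dstDoc are still copied.
    public static void AppendUsingDestinationStyles(Document dstDoc, Document srcDoc)
    {
        foreach (Section srcSection in srcDoc.Sections)
        {
            Node dstSection = dstDoc.ImportNode(srcSection, true, ImportFormatMode.UseDestinationStyles);
            dstDoc.AppendChild(dstSection);
        }
    }
}
```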
Aspose.Words does not track undo, so there is nothing to optimize there. At this stage I also have no ideas about how MS Word repaginates a document on open. I have noticed, too, that some documents take longer than others to open in MS Word for no apparent reason, and not just the ones created with Aspose.Words.
If you zip those several hundred documents and attach them to this post, we will be happy to run a performance test and a round of optimization here. I'm sure there is a way to make them work faster.

Thank you for offering to look at the documents. First, I wanted to ask a couple more questions. Perhaps I could combine a fixed number of documents at a time (maybe 20), store each combined group in an array, and then combine the groups at the end. Do you have an example of this? Also, as I asked in another post: when I create a new document in Aspose and insert my first document into it, there is always a continuous section break at the top, which throws off the first page and does not keep the formatting. Do you have any ideas on how to fix this? Should I delete the section break at the top in Aspose code, and if so, how? Do you think combining documents this way is the right approach?

Derek

When you create an empty document using new Document(), the document is not really empty: it already has one section. Therefore, when you append your first document, it is actually appended after that first section of the "empty" document. If your document's first section starts on a new page, you get a blank first page. To avoid that, just delete the first section of the empty document. You can probably do that either before or after appending the other documents.

```csharp
Document doc = new Document();
doc.Sections.RemoveAt(0);
// go on to append the documents now.
```
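Put together with the append loop, removing the default section first might look like the following sketch. It assumes the source documents are already loaded as Aspose.Words Document objects; the class and method names are hypothetical, and the import mode is left as a parameter so you can try both modes discussed above.

```csharp
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper class, for illustration only.
static class EmptyDocCombiner
{
    // Builds one new document from the given sources, without the blank
    // first section that new Document() contains.
    public static Document Combine(IEnumerable<Document> sources, ImportFormatMode mode)
    {
        Document dstDoc = new Document();
        // new Document() already has one section; remove it so the first
        // appended section becomes the first section of the result.
        dstDoc.Sections.RemoveAt(0);

        foreach (Document srcDoc in sources)
        {
            foreach (Section srcSection in srcDoc.Sections)
            {
                Node dstSection = dstDoc.ImportNode(srcSection, true, mode);
                dstDoc.AppendChild(dstSection);
            }
        }
        return dstDoc;
    }
}
```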

I don't think combining documents into arrays and then combining those into a single large document will make any difference.

I have logged the ImportNode performance problem in our defect base as issue #1032. We will try to fix it in the next release, which will be out in about two weeks.
Best regards,

Sorry, we will not be able to work on this issue for the upcoming release. It remains on our task list for the future.
As I said, combining many documents with the KeepSourceFormatting option is hard on Aspose.Words. It creates too many styles, which is probably the main problem. Try using ImportFormatMode.UseDestinationStyles instead.

I'm happy to say this has been addressed in Aspose.Words 4.0.
If you want to append one document to another multiple times, you should use the NodeImporter class. It acts as a "context" for importing: instead of importing styles and lists on every iteration, it imports them only once. If this suits you, the improvement in speed is dramatic.
In a test without the NodeImporter class, appending 500 documents slows down to 18 documents per second. The test below, which uses NodeImporter, appends 480 documents PER SECOND, and appending 1000 documents gives 477 documents per second. The appending speed barely depends on how many documents you combine.
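Converting those rates into elapsed time makes the difference concrete. This assumes the 18 documents per second is an average over the whole 500-document run, as the test computes it (speed = n / elapsed seconds); if it is instead the rate near the end of the run, the real gap is smaller. With t = n / speed:

$$
t_{\text{without}} = \frac{500}{18\ \text{docs/s}} \approx 27.8\ \text{s},\qquad
t_{\text{with}} = \frac{500}{480\ \text{docs/s}} \approx 1.04\ \text{s},\qquad
\frac{480}{18} \approx 26.7
$$

At 477 documents per second, 1000 documents take about 1000 / 477 ≈ 2.10 s, roughly double the 500-document time, which is what you would expect if the speed does not depend on the number of appends.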

```csharp
using System;
using Aspose.Words;
using NUnit.Framework;

[TestFixture]
public class DefectTests // fixture class name is illustrative
{
    [Test]
    public void TestDefect1032()
    {
        Document dstDoc = new Document();
        long n = 500;
        double speed;
        DateTime time0 = DateTime.Now;
        // TestUtil.Open/TestUtil.Save are test-suite helpers not included in this snippet.
        Document srcDoc = TestUtil.Open(@"Defects\TestDefect1032.doc");
        // One importer is reused for every append, so styles and lists are imported only once.
        NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);
        for (int i = 0; i < n; i++)
            AppendDoc(dstDoc, srcDoc, importer);
        TimeSpan span = DateTime.Now.Subtract(time0);
        // Average speed of combining documents (documents per second).
        speed = Math.Round((double)n / span.TotalSeconds, 2);
        Console.WriteLine(speed);
        TestUtil.Save(dstDoc, @"Defects\TestDefect1032 Out.doc");
    }

    private void AppendDoc(Document dstDoc, Document srcDoc, NodeImporter importer)
    {
        for (int i = 0; i < srcDoc.Sections.Count; i++)
        {
            // Index the i-th section; without [i] this line does not compile.
            Section srcSection = srcDoc.Sections[i];
            Section dstSection = (Section)importer.ImportNode(srcSection, true);
            dstDoc.Sections.Add(dstSection);
        }
    }
}
```
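The test above appends the same source document 500 times. When every source is a different file, as in the original question, a NodeImporter is created for one (source, destination) pair, so a sketch would create one importer per source and share it across that source's sections. This assumes Aspose.Words 4.0 or later and already-loaded Document objects; the class and method names are hypothetical. Because each importer is only reused within one source here, the speed-up may be much smaller than in the repeated-append test, so please measure it on your own documents.

```csharp
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper class, for illustration only.
static class ImporterCombiner
{
    public static Document Combine(IEnumerable<Document> sources, ImportFormatMode mode)
    {
        Document dstDoc = new Document();
        // Remove the default section so the result does not start with a blank section.
        dstDoc.Sections.RemoveAt(0);

        foreach (Document srcDoc in sources)
        {
            // The importer is bound to this source and the destination;
            // it is shared by all sections of this source document.
            NodeImporter importer = new NodeImporter(srcDoc, dstDoc, mode);
            foreach (Section srcSection in srcDoc.Sections)
            {
                Section dstSection = (Section)importer.ImportNode(srcSection, true);
                dstDoc.Sections.Add(dstSection);
            }
        }
        return dstDoc;
    }
}
```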