Apose.Cell Multi-Threaded Workbook.Save problem

I am having a problem with a multi-threaded application saving excel workbooks. This problem only occurs on servers with 2 or more CPU’s. If the program is run single threaded or on a server with 1 CPU, I do not have this problem.

Program generates multiple Tabs in an excel workbook on different threads. When all of the threads are done, I run the save command “ExcelFileToSave.Save(strSaveAsFileNameBuild, FileFormatType.Default);”. I intermittently get the follow Exception thrown:

Error: Index was outside the bounds of the array.

Data: System.Collections.ListDictionaryInternal

Target: Void InternalGetReference(Void*, Int32, Int32*)

Stack: at System.Array.InternalGetReference(Void* elemRef, Int32 rank, Int32* pIndices)

at System.Array.SetValue(Object value, Int32 index)

at System.Collections.Hashtable.CopyKeys(Array array, Int32 arrayIndex)

at System.Collections.Hashtable.KeyCollection.CopyTo(Array array, Int32 arrayIndex)

at Aspose.Cells.Record.#.#(Worksheets #)

at Aspose.Cells.Record.#.#(# #, Worksheets #)

at Aspose.Cells.Worksheets.#(# #)

at Aspose.Cells.Worksheets.#(# #)

at Aspose.Cells.Worksheets.#(# #)

at Aspose.Cells.Worksheets.#(FileFormatType#)

at Aspose.Cells.Workbook.Save(Stream stream, FileFormatType fileFormatType)

at Aspose.Cells.Workbook.Save(String fileName, FileFormatType fileFormatType)

at ICE.Net.Reporting.IC_Report.SaveExcelFile(Workbook& ExcelFileToSave) in C:\Documents and Settings\prjdo000\My Documents\Visual Studio 2005\Projects\IC_Report\IC_Report\IC_Report.cs:line 1090

The problem happens less frequently when I sleep the thread just before issuing the save command “Thread.Sleep(100);”.

The longer the sleep, the less frequent I get this exception thrown.

ExcelFileToSave is a Workbook object. Is it possible that two threads concurrently call one Workbook object’s save method? Please try to lock calling to Save method.

This is not likely. I count the threads I start in my thread pool using an interlock.increment and interlock.decrement commands. I then check and wait for this count to reach zero.

Code works fine on servers with only one processor. Only fails on servers with 2 or more. Also only fails when I have used multiple threads to create each worksheet.

I have also tried putting the save code in a lock (object). This code still fails.

One more note; I only have one active workbook running at a time. I generate multiple threads to create the worksheets but I wait for them to all end before I save. If I am going to generate another workbook (excel file) it is after all other work is done first.

It should not be possible for two threads to call the save method because their is only one thread at a time that is handling the workbook.save command.

There are some global data in a Workbook object. So concurrently process it in different threads may cause problem.

Scenario 1:

1. Create a Workbook object

2. Start multiple threads.

3. Create worksheets in those threads concurrently.

Scenari 2:

1. Start multiple threads

2. In each thread, create a Workbook object. And create worksheets for this Workbook object only in this thread.

For above 2 scenarios, 1 may cause problem and 2 will work fine.

I am assuming that you are saying this is the correct and desired behavior.

I will say that this is very fustrating and will force me to jump thru hoops, only to have to write even more code to combine my worksheets into a single workbook after the fact. But this time, combine them using a single thread.

Are there no plans to correct this problem?

As I mentioned before, a Workbook object contains global data which cannot be processed amony multi-threads concurrently. For example, all strings are saved in a global string pool and there are a global style list to process formattings. These data may be changed in many methods. So it isn't thread safe to process a Workbook object concurrently.

However, processing different Workbook objects in different threads are thread safe.

Here is the workaround that I am using and what my current problem is.

Because I have generated about 200 Worksheets in one Workbook using multiple threads, I can not save this Workbook. If I just save, it sometimes throws an error.

It does not matter that this save command is being issued when I am only running a single (the main) thread.

To work around this issue, I am creating a new instance of Workbook, copying the Workbook that had 200 Worksheets generated using multiple threads, and then saving that new Workbook.

if (blnMultiThread)
{
// Aspose Workbook requires that I make a copy first before saving
Workbook excelTempFile = new Workbook();
excelTempFile.Copy(ExcelFileToSave);
excelTempFile.Save(strSaveAsFileNameBuild, FileFormatType.Default);
}

This works great except for some of my large jobs. My code is bombing on the Workbook.Copy command because I am running out of System memory. My system has 3.5GB of Ram.

Error: Exception of type 'System.OutOfMemoryException' was thrown.
Data: System.Collections.ListDictionaryInternal
Target: Void .ctor(Aspose.Cells.Style)
Stack: at Aspose.Cells.Borders..ctor(Style style)
at Aspose.Cells.Style.Copy(Style style)
at Aspose.Cells.Cells.get_Item(Int32 row, Int32 column)
at Aspose.Cells.Cells.?(Cells ?)
at Aspose.Cells.Worksheet.Copy(Worksheet sourceSheet)
at Aspose.Cells.Worksheets.?(Worksheets ?)
at Aspose.Cells.Workbook.Copy(Workbook source)
at ICE.Net.Reporting.IC_Report.SaveExcelFile(Workbook& ExcelFileToSave) in C:\Documents and Settings\prjdo000\My Documents\Visual Studio 2005\Projects\IC_Report\IC_Report\IC_Report.cs:line 964
at ICE.Net.Reporting.IC_Report.QueueUpWork() in C:\Documents and Settings\prjdo000\My Documents\Visual Studio 2005\Projects\IC_Report\IC_Report\IC_Report.cs:line 710
at ICE.Net.Reporting.IC_Report.StartReport(String commandLine) in C:\Documents and Settings\prjdo000\My Documents\Visual Studio 2005\Projects\IC_Report\IC_Report\IC_Report.cs:line 370

It would be great if I could just save the first Workbook but this sometimes fails due to the limitation of Aspose and multi-threading.
For larger jobs, my workaround will always fail due to lack of memory resources.

Can you recommend a better solution?

It would be great if I could just save the first Workbook but this sometimes fails due to the limitation of Aspose and multi-threading.

Is it possible that when you calling Workbook.Save method in your previous code, there are some thread are not ended to free the resource? As you mentioned that if you call Thread.Sleep before saving, this problem will happen much more less frequently, I think you can create a flag when you create a new thread. When a thread is ended, set this flag to true. In your main thread, you can check all these 200 flags before saving. Only when all of them are set to true, you can call Workbook.Save method. I think this may solves your problem.

I am currently using the Thread Pool to queue up the 200 threads. This allows me to control how many threads are running at a time (10 - 20). I am using interlock on my main thread to increment my counter as to how many threads I have queued up and wait with a while (intThreadcount != 0) and sleep(100). As each thread ends, the pooled threads use interlock to decrement my counter.

When my counter hits zero, I run my save command. I only have a problem with the save command when I run this program on a server with 2 or more CPUs. On my laptop, I never have a problem.

I can send you the entire source code if you would like to review it. You would have to modify it if you what to connect to a local datasource and run it.

Thanks

I will check my code on Monday to see if my code is still doing anything with the worksheets or workbook after I interlock.decrement my counters. How can I make sure that my pooled threads have released themselves from the workbook and worksheet objects prior to saving?

I think you may have to lock Workbook object in your whole program to disable concurrent access.

Would that prevent me from using multiple threads on the worksheet objects for that workbook. I am only referencing the workbook with the main thread. I create the worksheet objects using the main thread and then populate the worksheets on a separate thread after creation. Only worksheet objects are used with the thread pool.

If I cannot, then I might as well be writing a single threaded application. I am typcally creating only one workbook and one excel file at a time. However, those workbooks almost always have many worksheets and that is why I am using multiple threads to create them.

Though you process Worksheet objects in separate threads, those Worksheet objects are referring to the same Workbook object for global data.

As I mentioned before, please access all data in a Workbook object with a single thread. And you can create multi-threads to process different Workbook objects concurrently.