Aspose Cells writing multiple worksheets parallelforeach C#, failing on workbook.save

To improve performance of writing workbook with multiple worksheets, we are using parallel each while writing worksheets. We are getting object reference not set to an instance while doing workbook.save method.

We are trying to run this on AWS Lambda which is linux based.

following is our sample code

try
{
stopwatch.Start();
Workbook workbook= new Workbook();

for (int i = 1; i < worksheetCount; i++)
{
workbook.Worksheets.Add();
}

var result = Parallel.ForEach(workbook.Worksheets,
sheet =>
{
CreateWorksheet(rowCount, sheet.Index, ref sheet);
});
stopwatch.Stop();
Thread.Sleep(10000);
Console.WriteLine("Created parallel worksheets : " + stopwatch.Elapsed);

stopwatch.Restart();
using (MemoryStream stream = new MemoryStream())
{
workbook.Save(stream, SaveFormat.Xlsx);
stopwatch.Stop();
LambdaLogger.Log("Saved report to memory stream : " + stopwatch.Elapsed);
stopwatch.Restart();

PutObjectRequest request = new PutObjectRequest
{
BucketName = Environment.GetEnvironmentVariable(“ReportDocumentsBucket”),
Key = filename,
InputStream = stream
};

this.s3Client.PutObjectAsync(request).Wait();
stopwatch.Stop();
LambdaLogger.Log("Report uploaded to S3 bucket, Total Upload Time : " + stopwatch.Elapsed);
}

}
catch (Exception exception2)
{
LambdaLogger.Log(exception2.Message);
LambdaLogger.Log(exception2.StackTrace);
throw;
}

@sidpatel26,
Please note that we do not support manipulating one Workbook(including other data models of it such as Cells, Worksheets, …etc.) in parallel way. Similalrly, we do not have any future plan to provide this feature. Feel free to write us back if you have any other query in this regard.

Same code is working fine on windows platform, only not working on AWS Lambda which runs on AWS AMI 2. Another issue we noticed, formatting of cells e.g. auto fit columns are not working. Do we need to have any linux dependency added as reference to make it work?

@sidpatel26,
We have also observed that it works in Windows however it fails in other OS like macOS. This issue is logged for AWS Lambda in our database for further investigation. We will write back here once any update is ready for sharing.

This issue is logged as:
CELLSNETCORE-276 - Aspose Cells writing multiple worksheets parallelforeach C#, failing on workbook.save

As the above code works fine in Windows, we tried to fit columns in Windows-based program and that worked fine. However because it was not working in a non-Windows environment, therefore it was not tested. Please share the runnable sample code, environment details including platform and version of Aspose.Cells that is used for testing this issue.

Runnable sample code: Same as windows code posted in this forum
Environment: AWS Lambda which runs on AWS AMI 2
Aspose.Cells version: 21.9

@sidpatel26,
Thank you for providing more information. We have logged it with the ticket for detailed analysis.

Additional information following is failing
sheet.AutoFitColumns(); (see attached error)MicrosoftTeams-image.png (41.3 KB)

@sidpatel26,
We have investigated it and would like to update you that we are afraid there is no guarantee for such kind of operation to work fine always with any platform. In fact we think there are possibilities of the same problem even for windows too. The only difference is different chances for different platforms. Anyways, currently we cannot find possible solution to support modifying those data concurrently for a workbook.

AutoFitColumns has nothing to do with concurrent process. Even with single worksheet it does not work on linux platform VS it works on windows. Can you please advise why we are getting type initialization error? Do we need to add any dependencies? If yes then what are those?

@sidpatel26,
Many users have reported a similar error “Type initializer for threw an exception”. Could you please try the following steps and share your feedback as in many cases these steps has resolved the issue.

install libgdiplus:
apt-get update
apt-get install -y libgdiplus
cd /usr/lib && ln -s libgdiplus.so gdiplus.dll

And install libc6-dev:
apt-get install -y --no-install-recommends libc6-dev

Following is the solution to make ASPOSE CELLS works in AWS LAMBDA.

  1. Upload zip from following link as lambda layers
    GitHub - rv-dtomaz/lambda-layer-system-drawing
    Once you download from above link make sure to upload zip from lambda-layer-system-drawing-master.zip/lambda-layer-system-drawing-master/*.zip
  2. Define Lambda environment variable
    Key: FONTCONFIG_PATH
    Value:/opt/lib/etc

FYI: This solution will solve lots of dependency issues with other ASPOSE products on linux platform and AWS Lambda

Hopefully our research will help others!!!

@sidpatel26,
Thank you for sharing this solution. Please feel free to write us back if you have any other query related to Aspose.Cells.

Do have question for you, what should be expected performance for following test case?

Writing 1 workbook with 5 worksheets, each worksheet with 500K rows and 26 columns.
FYI: We need to use import objects as we will be pulling data from DB
We are seeing 4 mins on and avg to perform above taste case using ASPOS CELLS.

Is there any performance improvements you can suggest to us? see attached code we have AsposeFacade.docx (19.7 KB)

AWS Lambda configuration:
6 CPU, 10 GB RAM, AWS LINUX AIM 2

Appreciate any feedback!

@sidpatel26,
Once read from DB, you may try using LightCells for writing large Excel files here. Please give it a try and share your feedback.

The link that you provided do not have sample code. We have been trying to use light cell with external data set but no luck. Please provide working sample in C# or any other language.

@sidpatel26,
Yes, you are right. LightCells cannot directly read from a database. You may opt for your original scheme to fill the worksheets. As you are writing around 25 million rows in 5 worksheets, therefore it seems that 4 minutes time is not too much. There is no other performance improvement hint available in your scenario.

@sidpatel26,
If you cannot find any sample code at the page of our online document, such as Using LightCells API|Documentation, maybe it is because that your network prevents from visiting some websites (such as https://gist.github.com/ where our sample code is). If so, you can find the code in attached file, or you can find some solution to make gist be accessible for your network.

Even the cells data is coming from DB, you still are able to use LightCells to generate the expected spreadsheet file. You just need to fetch data from the DB row by row and then fill into every cell in the implementation of LightCellsDataProvider.

However, from your attached code, it seems you also needs some other operations, such as auto-fitting rows/columns according to cells data. If so, we are afraid LightCells is not suitable for your scenario. With LightCells the cells data will be saved to the resultant file directly in the “Workbook.Save()” process, without being kept in model. So those operations which depend on the cells data cannot work with LightCells mode.

Thanks! do you have sample code which feels DB data leveraging LightCells API? Sample is so much generic and do not give clear picture how to leverage external data source with light cells API.

@sidpatel26,
It seems there is some misunderstanding. Please note that LightCells cannot interact with DB by any means. You should read the data yourself using any C# technology and then fill into cells row by row. We are sorry for any inconvenience caused in this regard.