Out of Memory Exception - System.IO.MemoryStream

When loading a very big Excel file (using the MemoryPreference setting, it is > 100MB) I get the following message:
System.OutOfMemoryException: Exception of type ‘System.OutOfMemoryException’ was thrown.
at System.IO.MemoryStream.set_Capacity(Int32 value)
at System.IO.MemoryStream.EnsureCapacity(Int32 value)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at  .(Stream , Stream )

There is enough memory on the system, but probably not enough contiguous memory. Microsoft has solved this problem by using Microsoft.IO.RecyclableMemoryStreamManager. See this article: Announcing Microsoft.IO.RecycableMemoryStream | Philosophical Geek and the github repository for it: GitHub - microsoft/Microsoft.IO.RecyclableMemoryStream: A library to provide pooling for .NET MemoryStream objects to improve application performance.

Aspose is critical for us but it fails to load large files. Using the Microsoft Memory Stream instead of System.IO.MemoryStream drastically reduces the potential of running into OutOfMemory exception.

Can you please use it or something like it to allow us to large big files with Aspose?

@tjenkins,

Thanks for providing us some details.

Well, Aspose.Cells would generally need 10 times or more memory of the size of the file (although it is not final) when loading/saving the Excel file, so you got to make sure that sufficient memory is available for the big process. Could you provide us your template Excel file (>100MB), you may upload to some third party drive services and share the download link here, we will check it soon.

Thank you.

Sorry, I cannot provide the file since it has customer data. I can see if I can create a file manually to re-create the problem.

Please make sure you read the blog article that I referenced. Your response did not indicate that you actually understood the issue.

Let me try to make it clear:
You may require 1 GB of memory for 100 MB Excel file (10x as you pointed out). The system does have 5 GB of free RAM available. BUT IT STILL FAIILS with an out of memory exception. The reason is not that there is not enough memory available (we have 4 times more than needed), it is that we do not have 1 GB of CONTIGUOUS memory available, since memory is fragmented and it may have only at most 600 MB of contiguous memory available. This is exactly what I observed on my machine.

System.IO.MemoryStream requires CONTIGUOUS memory, the library, which I reference above from Microsoft does not.

I hope this clarifies this issue and would appreciate addressing this issue since we cannot use the Aspose.Cells product until this is fixed.

@tjenkins,

Thanks for providing further details.

Well, we need to investigate your issue in details. We need to evaluate if we could implement microsoft memorystream or not. Alternatively, we will also check if we could enhance existing APIs to handle huge sized files to cope with the issue. I have logged a ticket with an id “CELLSNET-45461” for your issue. Our product team will look into it soon.

Once we have an update on it, or we have some other way around, we will share it with you here.

Thank you.

Any update? We have stopped using Aspose.Cells because of this, it is a showstopper bug.

@tjenkins,

Which file format are you reading and saving? It’s better if you could provide us a simple project to investigate your issue precisely. We will look into whether we can reduce memory usage.

Thank you.

OK, I have a reproducible example for you.
Copy the file (download from Dropbox https://www.dropbox.com/s/abl58fy6qrv1waq/large.7z?dl=0 ) to c:\temp and extract it (Download). It’s only 13 MB but expands to 1.3 GB once extracted.

Then call the method below with Test(@“c:\temp\large.xlsx”);

It should succeed. Then comment out the Recycable memory implementation, use the MemoryStream implementation instead and see it fail and the same with Aspose.Cells, which uses ). If you have it succeeding, just reduce the available memory somewhat.

public static void Test(string path)
{
//this fails. MemoryStream is used by Aspose.Cells
//var ms = new MemoryStream();

        //this succeeds, PLEASE use this inside Aspose.Cells
        var mm = new RecyclableMemoryStreamManager();
        RecyclableMemoryStream ms = new RecyclableMemoryStream(mm);

        using (var stream = LongFile.OpenRead(path))
        {
            stream.CopyTo(ms);
        }

        //this fails:
        //var asposeOptions = new Aspose.Cells.LoadOptions { MemorySetting = MemorySetting.MemoryPreference };
        //var d = new Workbook(path, asposeOptions);
        //Console.WriteLine(d.AbsolutePath);

    }

Please change the internal implementation to use Recyclable Memory Stream.

This blocks our usage of Aspose.Cells.

Additional note:
Recycable Memory Stream is subclass of Stream so it should be pretty easy to replace MemoryStream with it.
I had the same kind of problem in our product and this resolved it.

@tjenkins,

Thanks for providing us template Excel file in the zipped archive and sample code.

I have logged it against your existing issue “CELLSNET-45461” into our database for product team’ investigations. Our concerned developer from product team will look into it soon.

Once we have any new information, we will share it with you here.

Thank you.

@tjenkins,

Please try using LightCells API to read and write data for large files.
Aspose.Cells provides the LightCells APIs which is mainly designed to write or read cell’s data one by one without building a complete data model block into memory. See the document for your reference:

Thank you.

Thank you, I tried it and it does not work.

See the updated example (I am using the LightCellsDataHandlerVisitCells from the example in the link you provided):
public static void Test(string path)
{
//this fails. MemoryStream is used by Aspose.Cells
//var ms = new MemoryStream();

        //this succeeds, PLEASE use this inside Aspose.Cells
        //var mm = new RecyclableMemoryStreamManager();
        //RecyclableMemoryStream ms = new RecyclableMemoryStream(mm);
        //
        //using (var stream = LongFile.OpenRead(path))
        //{
        //    stream.CopyTo(ms);
        //}

        //this fails:
        //var asposeOptions = new Aspose.Cells.LoadOptions { MemorySetting = MemorySetting.MemoryPreference };
        //var d = new Workbook(path, asposeOptions);
        //Console.WriteLine(d.AbsolutePath);

        //this fails:
        var asposeOptions = new Aspose.Cells.LoadOptions { MemorySetting = MemorySetting.MemoryPreference };
        LightCellsDataHandlerVisitCells v = new LightCellsDataHandlerVisitCells();
        asposeOptions.LightCellsDataHandler = v;
        var d = new Workbook(path, asposeOptions);
        Console.WriteLine(d.AbsolutePath);

    }

System.OutOfMemoryException: Exception of type ‘System.OutOfMemoryException’ was thrown.
at System.IO.MemoryStream…ctor(Int32 capacity)
at .(Stream )
at Aspose.Cells.Workbook.(Stream )
at .(Object )
at .(MethodBase , Boolean )

I really need a fix where you stop using System.IO.MemoryStream.

@tjenkins,

Thanks for providing us further details.

We are sorry that LightCells APIs does not work for your scenario/case. I have logged your comments/concerns against your issue “CELLSNET-45461” into our database. Our concerned developer from product team will look into it further and we will get back to you soon.

Thank you.

@tjenkins,

This is to inform you that we have fixed your issue (logged earlier as “CELLSNET-45461”) now. We will soon provide you the fix/version with sample code after performing QA and incorporating other enhancements and fixes.

Once the fix is available for public use, we will share it here.

Thank you.

@tjenkins

Please download and try the following fix. Let us know your feedback.

It has fixed your issue CELLSNET-45461 now.

Please try the fix with the following sample code.

C#

CellsHelper.CustomImplementationFactory = new MM();

LightCellsDataHandlerVisitCells v = new LightCellsDataHandlerVisitCells();

wb = new Workbook(pathCase, new LoadOptions() { MemorySetting = MemorySetting.MemoryPreference, LightCellsDataHandler = v });

Console.WriteLine("Total sheets: " + wb.Worksheets.Count + ", cells: " + v.CellCount + ", strings: " + v.StringCount + ", formulas: " + v.FormulaCount);

class MM : CustomImplementationFactory
{

	RecyclableMemoryStreamManager manager = new RecyclableMemoryStreamManager();

	public override MemoryStream CreateMemoryStream()
	{

		return manager.GetStream("MM");
	}

	public override MemoryStream CreateMemoryStream(int capacity)
	{
		return manager.GetStream("MM", capacity);
	}
}

With new APIs provided by the fix, user is now able to provide custom implementation such as using Recycable memory implementation instead of the default MemoryStream.

I have implemented the code you provided; however, I am getting an exception when the Workbook object is instantiated:

at System.Collections.ArrayList.get_Item(Int32 index)
at .(UInt32 , Int32 , Int32 , Boolean )
at .(UInt32 , Int32 , Int32 , Boolean )
at .( , )
at .( , )
at .( , )
at Aspose.Cells.Workbook.(Stream , LoadOptions , Boolean )
at Aspose.Cells.Workbook…ctor(Stream stream, LoadOptions loadOptions)
at LinkTek.LinkFixer.FileHandlers.XlsFileHandler.LoadDocument(Stream asposeStream, Filespec filespec) in C:\Data\Axiom\Release\LinkFixer\LinkFixerCommon\FileHandlers\XlsFileHandler.cs:line 114
at LinkTek.LinkFixer.FileHandlers.OfficeFileHandlerBase1.LoadFile(MemoryStream asposeStream, Filespec filespec, LoadOptions options, FileAccess fileAccess, AppError& error) in C:\Data\Axiom\Release\LinkFixer\LinkFixerCommon\FileHandlers\OfficeFileHandlerBase.cs:line 134 at LinkTek.LinkFixer.FileHandlers.OfficeFileHandlerBase1.LoadFile(Filespec filespec, LoadOptions options, FileAccess fileAccess, AppError& error) in C:\Data\Axiom\Release\LinkFixer\LinkFixerCommon\FileHandlers\OfficeFileHandlerBase.cs:line 110
at LinkTek.LinkFixer.FileHandlers.BaseFileHandler.LoadFileWithRetry(Filespec file, LoadOptions loadOptions, FileAccess access, AppError& error) in C:\Data\Axiom\Release\LinkFixer\LinkFixerCommon\FileHandlers\BaseFileHandler.cs:line 703
at LinkTek.LinkFixer.FileHandlers.BaseFileHandler.ProcessFile(LinkFixerFileInfo fileInfo, DataLayerManager dataLayer) in C:\Data\Axiom\Release\LinkFixer\LinkFixerCommon\FileHandlers\BaseFileHandler.cs:line 643

Here is the file that I am using to test with:
https://www.dropbox.com/s/oje28freihgbht3/Test_Mail_dot_To.xls?dl=1

@steve.debusschere

There is some problem with your Excel file and it looks corrupt. Microsoft Excel also throws out of memory message when we open it. Please see the following screenshot for a reference.

Yes, I get the same, but was not sure if it was corrupt. I will try using another file.

Do I have to implement the LightCellsDataHandler in order for this fix to work? …or is it sufficient to set the CustomImplementationFactory only (see below)?

       CellsHelper.CustomImplementationFactory = MemoryStreamMemoryManager.Instance;

        var asposeOptions = new Aspose.Cells.LoadOptions
        {
            MemorySetting = MemorySetting.MemoryPreference,
        };

Does the LightCells API allow access to non-cell data (e.g. this.Doc.DataConnections or this.Doc.Worksheets.ExternalLinks)?

@steve.debusschere

You only need to set the CustomImplementationFactory for your initial issue.

Of course, using LightCells apis correctly will decrease the memory cost efficiently too. With the LightCells, user should not keep most of cells in memory after process them one by one (ProcessCell()), so it is not fit for the scenarios that require cells data later, such as formula calculation. But other data models such as ExternalLinks can be accessed after the workbook be loaded with LightCells.