Hi Team,
Need best ways to handle large files using Aspose cells. I have seen the documentation but could not get it working as such. I have the below requirements and need suggestion to utilize aspose for files beyond 300 MB and upto 1 GB
- I have a huge excel workbook, need to update only one of the worksheet in the workbook and upload back the entire excel to sharepoint. I am often facing out of memory exception at somepoint while initializing the workbook or saving the workbook either through memory stream or file stream
2.The Initialize workbook takes a longer time and is unable to be used in case of azure functions for files beyong 40 MB.
- Memorypreferences while saving or loading the workbook but does not show any difference
- Is there any streaming supported, while reading or writing so the process can be completed successfully?
- Is there an option to merge the sheet without loading the complete file
6.Is there a recommendation on minimum CPU/Memory stats for different file sizes?
Currently using 24.7 version
Thanks
@Mveerabaghu
Can you please provide more details about the specific errors you are encountering, such as the exact out of memory exception message and the context in which it occurs?
this is when we save workbook
Exception Info:
EXCEPTION: Exception of type ‘System.OutOfMemoryException’ was thrown.
STACK TRACE: at Aspose.Cells.Workbook.Save(Stream stream, SaveOptions saveOptions)
at Aspose.Cells.Workbook.Save(Stream stream, SaveFormat saveFormat)
at …
@Mveerabaghu,
Handling large Excel files efficiently with Aspose.Cells can be challenging, particularly when dealing with files ranging from 300 MB to 1 GB. For such a large data set and the resultant Excel file, a significant amount of memory will be required. Unfortunately, there is limited potential for significant improvement in memory usage for this scenario, as the data set necessitates a large amount of memory, approximately 10 times or more the file’s size. If possible, it is recommended to consider using LightCells APIs to manipulate such files.
@Mveerabaghu
We are afraid currently there is no better solution for loading and changing so large template files. We have had a plan to solve such kind of issue by caching cells data in temporary file instead of keeping all data in memory. Currently we plan to support it before the end of this year.
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): CELLSNET-54534
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
ok thanks for the update.
Meantime we are trying implement LightCellsAPI.
With the sample code/the explanation put in the documentation, we could only insert dummy data and unable to use the Data read using Read API.
We would like to know if there is a way where we can copy the data from the sheets as well write data to one of the sheets using LightCellsAPI. Any documentation with better examples can help us understand the implementation
@Mveerabaghu,
The examples provided in the Using LightCells API document are generic and for reference purposes, you may write your code accordingly. Moreover, even using LightCells API might not make big difference for your scenario/case. Anyways, you may try it until we implement caching cells data in the temporary file instead.
Could you please share a standalone console application (source code without compilation errors) with resource file(s) to demonstrate how you are implementing the LightCells API? Please zip the project prior attaching. We will check your sample soon.
@Mveerabaghu
Because you need to update the specific sheet and then resave the workbook to get the whole excel file with all worksheets(including those sheets without modification), we are afraid LightCells cannot help to achieve this goal. LightCells is mainly designed for streaming read from/write to excel file without keeping all cells data in memory. Because you need to re-save the workbook, all data of the workbook need to be kept in memory, so LightCells cannot decrease the memory cost for your scenario when keeping all data in memory.
If the large dataset has been split into many sheets in your template file and loading one sheet only every time does not require large amount of memory, LigthCells may be one possible solution:
- Load the template file with LoadFilter to load the sheet which you need to update only.
- Update the data in the loaded sheet
- Save the workbook with LightCells( Interface LightCellsDataProvider | Aspose.Cells for .NET API Reference). In the implementation of StartSheet, load the specific sheet only with LoadFilter from the template file, and then use the cells data in the newly loaded sheet to fill the objects to be saved(provided by StartRow and StartCell).
I am having an Excel of 1 GB , and using a dedicated Azure Function which has capacity to scale upto 14 GB memory and the function does the below operations
- Download the stream - takes approx 30 sec
- Initialize the workbook — bottleneck
- Update 1 worksheet
- Calculate workbook
- Get stream from workbook – bottleneck
- Upload to sharepoint.
the function goes unreachable starting point 3 and the max time out we can have is 10 minutes. As i understand available memory is not an issue in this case, still i m not able to accomplish on updating the workbook.
Could you share an approximate benchmark on file size Vs the CPU utilization and approximate time taken. this will help us to plan the infrastructure better, as at this point we dont not have any solution. is there a documentation around it?
Regarding the Issue ID(s): CELLSNET-54534, could we have some more details on what to expect in terms of
- maximum file size it would support
- exception handling around out of memory scenarios
- required infrastructure on this
- support for readonly workbooks or updates to selective sheets in an efficient way…
@Mveerabaghu
Thank you for your feedback. We will further investigate your issue. Once there is an update, we will notify you promptly.
hi team,
wanted to check if we have any updates on this issue as we are approaching end of year. Issue ID(s): CELLSNET-54534.
could you also share some details around this.
@Mveerabaghu,
We apologize, but the requested feature/issue has not been implemented/resolved yet. We will check if we can provide you with an estimated time of completion. We will get back to you soon.
@Mveerabaghu,
We plan to start this task in next month. We will try to implement the feature before the end of December 2024 (next month), provided there are no other higher priority or urgent tasks that may impede its progress.
ok thanks for the update! Keep us posted.
@Mveerabaghu,
Alright, we will provide you with new updates as soon as they become available.