I’m writing monstrous spreadsheets that are 200K x 350 columns wide. I need to write 4mm rows over say 20 workbooks. I’m worried about memory. I was able to to create most of them using the standard aspose functions, but it took a while and by the 20th spreadsheet my laptop froze.
Cause
The LightCellsDataProvider
is available in Aspose.Cells for Python via .NET. This class is specifically designed to handle large spreadsheet files efficiently, making it ideal for your use case of creating extensive spreadsheets without overwhelming system resources.
Solution
Using LightCellsDataProvider
allows you to save large spreadsheets in a lightweight manner, significantly reducing memory usage. It streams data directly to the file, which helps prevent your laptop from freezing when processing large amounts of data.
Code
Here’s a basic example of how to implement LightCellsDataProvider
:
from aspose.cells import Workbook, LightCellsDataProvider
class MyDataProvider(LightCellsDataProvider):
def start_sheet(self, sheet_index):
# Logic to start saving a worksheet
return True
def next_row(self):
# Logic to get the next row index
return row_index # Replace with actual row index
def start_row(self, row):
# Logic to start saving a row
pass
def next_cell(self):
# Logic to get the next cell index
return cell_index # Replace with actual cell index
def start_cell(self, cell):
# Logic to start saving a cell
pass
# Create a workbook and save using LightCellsDataProvider
workbook = Workbook()
provider = MyDataProvider()
workbook.save("output.xlsx", provider)
Next Steps
- Implement the
MyDataProvider
class according to your data structure. - Call the
save
method on your workbook instance, passing your custom data provider. - Test with a smaller dataset first to ensure everything works as expected before scaling up to your full dataset.
If you have any further questions or need additional assistance, feel free to ask!
Adding to this, I’m using examples like the 1 provided by the robot but get an error: TypeError: type ‘aspose.cells.LightCellsDataProvider’ is not an acceptable base type
Which co-pilot says: because Aspose.Cells for Python via .NET does not support subclassing LightCellsDataProvider in Python . This is a limitation of the .NET-to-Python bridge: you cannot inherit from .NET abstract classes/interfaces in Python.
So I would just like to get a confirmation that it is in fact available, and the tiniest of a working examples would be appreciated
LightCells API is not available in Aspose.Cells for Python via .NET. Maybe you could try using MemoryPreference mode as you are concerned with memory usage for your processes (reading big files and writing large data), see the document: Optimizing Memory Usage while Working with Big Files having Large Datasets|Documentation
Thank you! I will give that a try and see what happens and report back to this thread.
Alright, please take your time to try to use MemoryPreference mode if it makes any difference. By the way, we have classes: LightCellsDataHandler class | Aspose.Cells for Python via .NET API References and LightCellsDataProvider class | Aspose.Cells for Python via .NET API References. We will evaluate your requirements and may devise a sample code snippet (example) for you (if possible).
Not being a python expert, I’ve not used classes yet , hence a small stand alone example that simply uses LightCells to write a few hard wired cells with a few hard wired styles (yellow background ?) would be awesome. But I’ll try the memory setting first…
Also if I can integrate LightCells, I’m interested in the speed difference too.
Thanks Again!!!
@SteveOlson
We’re very sorry, but since LightCellsDataHandler and LightCellsDataProvider are interfaces in .NET, we currently do not support inheriting from interfaces like this:
class MyDataProvider(LightCellsDataProvider):
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): CELLSPYTHONNET-304
Please use the MemoryPreference approach instead.
Thank you for the update. I will start integrating MemoryPreference settings.
The memory and compression options did seem to reduce the size of the spreadsheets, but more investigation by me will need to be performed to get the balance between speed and compression since I now have these options BOTH set:
options = ac.OoxmlSaveOptions()
options.compression_type = ac.OoxmlCompressionType.LEVEL9
and
wb.settings.memory_setting = MemorySetting.MEMORY_PREFERENCE
Just so you know the metrics I’m working with (200,000 rows by 282 cols wide):
Using the legacy system which focuses on shared strings and simply builds the spreadsheet file components individually and zips the contents back up (complex stuff we would rather not maintain) creates a 98 MB version of the workbook. With the memory and compression settings together, my Apsose spreadsheet (same data) is 175 MB. Interestingly when I open and then save the spreadsheet it got reduced to 116 MB (?).
The value proposition Apsose brings is very high for everything else and it wins all battles in 99% of our cases easily, so thanks again for that! I’ll continue to see what I can do the squeeze the size down in an automated fashion… (Last step: open with XLWings and close?)
Any ideas appreciated!
@SteveOlson
Could you unzip the two Excel files, then share which xml file size is reduced?
And I know it’s hard to upload so large file to dropbox or other clouds , but without files, we can not find difference.
I took the original and did an extract
image.png (50.8 KB)
Made a copy, opened and saved and opened another extract:
image.png (43.7 KB)
Most all files I see in these are very small
Looking through the files I see this:
Before save
sharedStrings.xml 33.671 kb
sheet4.xml 1,646,757 kb
after save
sharedStrings.xml 33.671 kb
sheet4.xml 834,382 kb
I’ve not worked directly with extract files in the past, so let me know if there is more info I could provide that would help.
Thank you for sharing the screenshots and details regarding the XML file size.
We would still request you to provide the extracted XML files as well. Anyways, we will review your case thoroughly and get back to you soon.
I could do that, would it be easier to send you the before and after spreadsheets (I made a copy so I could do that)? Not sure how to do that here.
@SteveOlson
Please upload files to dropbox or other clouds as the files are too large, then share a link here. We will check them soon.
Hi, sorry, have been stacked up. I will try to get you these file by early next week. I wanted to run through some more testing.
Please take your time to share the resource files with us. Once received, we will thoroughly evaluate your issue using the provided files.
I’ve created 4 files on dropbox. They show the files created by apsose at level 1 and level 9. The other number in the name is the seconds to create. The smaller ones are the size of the same file, if I open and make a save and close manually. They squeeze down quite a bit more after open and save.
Let me know if you cannot access the links:
https://www.dropbox.com/scl/fi/zic59hvxhjww94hgmw5nr/option_9_1900.xlsx?rlkey=3ob4tnzipt93cklettocmuizb&st=2na8fxs6&dl=0