Ability to modify a file without loading the entire document into memory?

Hello,

One of the goals of our application is to be able to open existing Excel files, modify some small amount of content and then resave the file. Currently in order to do this, it requires us to load the entire document into memory which can be a big cost.

I’ve seen the light cells implementation but that doesn’t seem to fully cover what we need unless I’m misunderstanding the use case there. I was hoping to be able to use load filters to say load the document with only chart information, remove one chart + resave the file. The problem is this causes the other data of the excel that was not loaded to be lost on save. Is there anyway to to accomplish something like this?

Some examples of use cases where we would want to limit the amount of data we load:

  • Open the Excel and remove one chart of the document
  • Open the Excel and remove the content from a single cell
  • Open the Excel and remove a comment from a cell

We would ideally like to do the above operations without needing to load the full document into memory while opening / saving. I didn’t seem like this was a feature Aspose supported but wanted to check and see if I missed anything.

@jacobmalliet,

In above cases, I guess you would retain everything except one chart, single cell and one comment. So, it is obvious that you will load the entire document into object model of Aspose.Cells. We also recommend you to kindly use MemoryPreference option while loading the file or manipulating the workbook. This will surely reduce the memory (used) cost. See the document for your reference.

In case, you will find any memory or CPU cost, kindly do share a test case (a standalone sample code) and sample file (you may zip it prior attaching) to reproduce the issue, we will check it soon.

So, it is obvious that you will load the entire document into object model of Aspose.Cells.

That’s what I was wondering. I wasn’t sure if there was a way to work with Aspose that wouldn’t load the object model into memory and would instead lazy load data from the stream passed to it (and also saving only the modifications made instead of regenerating all of the content based on what was loaded).

We are using MemoryPreference already, I don’t think the memory usage is a problem given the entire object model is loaded, that part makes sense. Was mostly looking for a way to avoid loading the entire model while still being able to make those modifications but that doesn’t seem feasible.

Let me know if I misunderstood at all but sounds like this can’t be accomplished!

@jacobmalliet,

Aspose.Cells provides the LightCells API, mainly designed to manipulate cell data one by one without building a complete data model block (using the Cell collection etc.) into memory. It works in an event-driven mode. You may try LightCells APIs that may fit your needs to certain extent (if not fully).

Moreover, Aspose.Cells also provides feasibility (a kind of custom loading) to skip certain worksheets while loading and load your desired worksheets only (in the workbook) into its object model. But once you save the file, it will save only those worksheets (with your modifications) which you were loading. It will not retain those worksheets you have skipped while loading.

@jacobmalliet,
We are afraid this cannot be accomplished. As you may know, there are many global cache and data in ms excel’s files. Such as the cell string values, image data, …etc. Take the modification of one cell’s content as example, if you change the string value of it, then another part of the file, the global string cache, also needs to be updated accordingly. If you change the cell’s formula, another part in the file, the calculation chain data, will need to be updated too. Otherwise you may got one corrupted file or unexpected data from the re-saved file.

So, currently we cannot support suck kind of requirement. You have to load the complete file into our model to modify any part of it.