We have an enhancement request that would help us more efficiently solve a problem that we’ve been working around for a long time. We use a Custom Calculation Engine to refresh workbooks containing proprietary formulas and we must recalculate the workbook several times so that we can handle batch processing of formula data retrieval. We need to cache formula results so that multiple calls to CalculateFormula are efficient and can be used to “collect” formulas and then “distribute” formula responses on the next recalc. Also due to the purpose of our application it is possible that we will manipulate the workbook in between recalcs (e.g. we may insert or delete rows/columns).
The challenge we have is building a cache key so that we can reliably identify that we’ve seen a specific formula on a previous recalc and have a result in our cache. A common solution is to build a cache key based on the formula’s function name and a serialized copy of the formula arguments. Due to the complexity of our application this only works if we can also tie that key to the cell location. However, the trouble is the cell can move due to workbook manipulation between recalcs and this will invalidate our cache key.
Our solution at the moment is to create a named range for every cell that contains a proprietary formula. Then when processing a recalc we can build a key using the cells namedrange, function name and arguments to lookup in the cache. Prior to saving the workbook we remove all of these temporary names ranges.
This solution works, but it’s a of a misuse of named ranges. During a recalc we can have over 500k cells containing formulas and it can take up to 2 minutes just to remove all of the named ranges. Certainly we would never save a file with these named ranges, they’re only to help the calc process.
Our enhancement request is to add a temporary “Tag” property or object type metadata property to the Cell object. This would allow developers to hide metadata in this property which will carry state with the Cell regardless of changes in the cells location. I recognize that there may not be an OpenXML spec to persist this metadata in the saved Excel file. We’re not asking for this to be persisted or added to the saved file, only in memory to assist with the manipulation and calculation of the workbook where state information needs to be managed and tracked. Ideally the metadata can be looked up directly on the cell (e.g. data.Cell.Metadata ) and also a mechanism to find all cells which contain metadata so that we don’t have to search cell by cell (e.g. workbook.Worksheets.GetCellsContainingMetadat() ).
There is no such property (Tag or metadata, etc.) available in Cell object of Aspose.Cells APIs. Also, there may not be any such attribute existed in open xml structures and specifications for Excel file formats. I guess there can be three available options that you may try if any of the options suits your needs.
Add comments to store your cache (formula) value/result and other details for your desired cell(s) in the worksheet. See the topic on Managing Comments (Notes) for your reference. You may hide the comments accordingly. You may manipulate comments (update, remove, etc.) at any stage. Also, at the end before saving the file, you may remove the comments in one go.
Add custom properties for the document for the task if it suits your requirements. See the document for your reference. Managing Document Properties
You should separate the module regarding storing cache results or adding other metadata, it should not be stored in the Excel workbook using Aspose.Cells APIs. For example, you may write to memory variables or arrays, some text or log files for the purpose.
@weissa,
Cell object is the most numerous one in the workbook. Any extra information will cause much more memory cost, which is unnecessary for most users. So, we are afraid we cannot add your expected data into the Cell data model.
There is another workaround for your requirement we think worth to try. You may defined your formula as:
=IF(FALSE,“A1”,YourActualFormulaHere)
=IF(FALSE,“A2”,YourActualFormulaHere)…
Thus you can use the formula itself as the cached key.
Further more, if you can replace the cell address to sequence number(int value):
=IF(FALSE,1,YourActualFormulaHere)
=IF(FALSE,2,YourActualFormulaHere)…
It should be able to give a bit more benefit of performance.
Thank you for the feedback. I will look into comments as an alternative to named ranges. Our goal is not to store cache information in a cell but rather to store an identifier so that when a formula calculates we can look up state information (from our own service/memory) from a previous calculation. Specific to our use case is that the formula is tied to the cell and the cell may move. As such the identifier must be able to handle this. Named ranges work for this and I suspect comments would too. My only concern is that this is not the intended use of these properties and if there is any performance or other concerns using then this way. We expect that we’ll be tracking 500k+ formulas for some workbooks.
Our named range solution is working and performance is acceptable. Given our situation if you see any other recommendations or future options we would appreciate any advice.
Agreed there is no open xml spec to support this. So any addition to support our use case would not be part of a saved XLSX file.
Thank you for taking a moment to respond. I’ve seen similar techniques with the CHOOSE formula which is also when I learned about certain optimizations with that formula to watch out for. Unfortunately in this case we don’t control the workbook creation so this isn’t a good fit for this use case, but still a good technique.
@weissa,
In your program, after loading the workbook from template file, you may update the formulas in the workbook as the required format(such as using IF or CHOOSE function) before your other processes such as calculation. At last when you need to save the workbook, you can just update them as the original formulas and then save the workbook to the resultant file.