Hi, this questions is rather generic (how to get performance),
but I’ll be rather specific with what we’ve already tried/applied.
We’ve been using several different Aspose components (Cells, Words, Email) for quite some time now to our satisfaction. It has enabled us to grow our company to the point where performance starts to matter because of the sheer amount calculations we need to perform.
It is in this capacity that I am asking for some guidance as to assure we are getting the best out of your tools and we do not have any huge flaws or misunderstanding in our design. I am not reporting a problem, I do not have any reproductions.
In our typical design we are using Cells to perform calculations on a Workbook and then read out the results. The workbook is data-driven, which means we use a designer to link it to our datasource (Dataset). Some of our formulas are dynamic, meaning that they change with the number of rows in the datasource.
In code, we perform the following steps:
- Fetch the binary data(contents) for the Workbook.
This step is performed only once.
- Create an instance of the workbook and put this in the local cache.
- Each time we need a workbook of the same type, fetch this workbook from cache.
And use the ‘Copy’ method to initialize a new instance.
- Link a designer to the workbook.
- Use ‘SetDataSource’ to link the WorkBook to our DataSet.
- Call the process-method on the designer, I believe this is what makes our formulas grow and adds records to the worksheet.
- Use CalculateFormula on the document.
- Read the results of our formulas/worksheet and put them in some result object.
- Save the parsed workbook to a stream for logging.
Now we know these things:
- CalculateFormulas takes up about 80+ percent of all processing time. (About 5 seconds out of 6)
- The 'CalcStackSize’ property can have a huge impact on performance.
A value of 250 turns out to be ~ 2 times as fast as a value of 9.
- We are already using: MemorySetting = MemorySetting.MemoryPreference
- We use ParsingFormulaOnOpen = false, since at the time we open the workbook, the designer is not even linked yet. And I doubt we can process the designer on a workbook before opening (copy) it); Perhaps the ‘RecalculateBeforeSave’ method could suit our needs.
- CalculateFormula can create a formula-tree to speed up the next calculations based on the 'CreateCalcChain’ property. Much like regular expressions. But we are unsure about dependencies between Designer.Process /*Perform */ and WorkBook.CalculateFormula.
What we do not know is:
- Is the calculation-chain copied when you make a copy?
- Is the chain reset when you apply the datasource?
- Can we take one Workbook, Calculate the formulas and then recycle it with a different datasource over and over, or is applying the datasource a destructive process?
- Does Designer.Process /*Perform */ destroy the calculation-chain?
- Is it absolutely necessary to use CalculateFormulas, or can the values be determined otherwise, through some clever interactions?
- What we think we want to achieve is to have an initialized WorkBook, where the contents have been read and the formulas are ready to go. Then stick that version in cache and create copies when needed. Then only add the rows using the designer. And then read the results.
Is this even possible?
To summarize, since ‘CalculateFormulas’ is so expensive, we seek to reduce time spend in that method or reduce the number of times we need to call it.
Any feedback and or suggestions would be quite helpful.
Kind regards, Bas.