Hi,
I need to add many thousands of comments to a sheet. The issue is that the performance degrades the more comments there are, example code:
var workbook = new Workbook();
workbook.Worksheets.Add();
var sheet = workbook.Worksheets.Last();
for (int i = 0; i < 100_000; i++)
{
var cell = sheet.Cells[i, 0];
_ = sheet.Comments.Add(cell.Name);
}
The first thousand comments get added fairly quickly (approx 50ms), but after a while it takes more than 5 seconds to add the same number of comments.
I assume it’s because Aspose is checking if a comment exists before it adds a new comment, which I read that in an older post. If that is indeed the cause of the slowdown, I think there should be a way to disable checking if the comment already exists to improve performance. Even if that is not the cause, I don’t think the performance should degrade when adding more comments.
If I change the loop to always add a comment to the same cell, it finishes immediately.
When adding a large number of comments to a worksheet using Aspose.Cells, you may experience performance degradation as the number of comments increases. This is likely due to the library checking for existing comments before adding new ones, which can slow down the process significantly after a certain threshold.
Unfortunately, there is no direct way to disable the existence check for comments in Aspose.Cells. However, you can optimize your code to improve performance. Here are a few suggestions:
Batch Processing: Instead of adding comments one by one, consider batching your operations. This can reduce the overhead of multiple method calls.
Minimize Access to the Worksheet: Try to minimize the number of times you access the worksheet or its properties within the loop. For example, store references to frequently accessed objects outside the loop.
Use a Different Approach: If possible, consider whether you can structure your data differently to reduce the number of comments needed.
Here’s an example of how you might implement batch processing:
var workbook = new Workbook();
workbook.Worksheets.Add();
var sheet = workbook.Worksheets.Last();
// Create a list to hold the comments
var commentsToAdd = new List<string>();
for (int i = 0; i < 100_000; i++)
{
commentsToAdd.Add(sheet.Cells[i, 0].Name);
}
// Add comments in a batch
foreach (var cellName in commentsToAdd)
{
_ = sheet.Comments.Add(cellName);
}
This approach may help mitigate some of the performance issues you are experiencing. If the problem persists, consider reaching out to Aspose support for further assistance or potential optimizations in future releases.
For more detailed information on managing comments in a worksheet, you can refer to the official documentation here.
If you have any further questions or need additional assistance, feel free to ask!
I tested your scenario/case using the following sample code with Aspose.Cells for .NET APIs.
e.g., Sample code:
var workbook = new Workbook();
workbook.Worksheets.Add();
var sheet = workbook.Worksheets.Last();
System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch();
stopWatch.Start();
for (int i = 0; i < 100000; i++)
{
var cell = sheet.Cells[i, 0];
var c = sheet.Comments.Add(cell.Name);
}
stopWatch.Stop();
// Get the elapsed time.
Console.WriteLine(stopWatch.Elapsed.TotalSeconds);
It takes approximately 200 seconds to add 100,000 comments, which is considered a reasonable/good performance given the huge volume of comments being added. Do you still want us to check it for performance?
Sorry, I have further findings which might give you more info, look at the following sample code:
var workbook = new Workbook();
workbook.Worksheets.Add();
var sheet = workbook.Worksheets.Last();
// Values by themselves -> 75ms
var valueTimer = Stopwatch.StartNew();
for (int i = 0; i < 50_000; i++)
{
var cell = sheet.Cells[i, 0];
cell.PutValue("test");
}
valueTimer.Stop();
// Comments by themselves -> 500ms
// Comments after values inserted -> 46s
var commentsTimer = Stopwatch.StartNew();
for (int i = 0; i < 50_000; i++)
{
_ = sheet.Comments.Add(i, 0);
}
commentsTimer.Stop();
If I first insert 50 thousand comments and then 50 thousand cell values, both loops finish in around 500ms (75ms for values, 500ms for comments).
However, if I first insert the cell values, after which I add the comments, the comment adding takes 45 seconds instead of just half of a second.
Saving the final Excel has the same speed regardless of the order.
So simply switching the order of these operations can worsen/improve the performance by several orders of magnitude, even though the final saved Excel is the same in both scenarios. Hopefully you should be able to replicate it by swapping the loops, which should bring the time down from around 50 seconds to under a second.
Saving the Excel is quite slow in this case (around 80 seconds for me), but that’s another issue.
@Yelov
When Aspose.Cells adds comments and cell values, it needs to manage the association between them. If comments are inserted first, the data structure can quickly allocate positions for these comments without needing to synchronize with the cell values. Conversely, if cell values are inserted first and then comments are added, each comment insertion may trigger a rescan and re-adjustment of cell content to ensure the correctness of the association, thus increasing processing time.
@Yelov
We have reproduced your mentioned issue.
When adding comments ,calculating the distance from the current cell to A1 is required.
If value of cells are inserted, we have to iterate all rows to get row heights.
We have logged an issue :CELLSNET-57168, we will try to improve performance.
To improve the performance of adding comments, we think you may use the cache mechanism( Workbook.StartAccessCache) before adding comments(because there is no other operations that may change the row height and column width). Please change your code to:
...
workbook.StartAccessCache(AccessCacheOptions.PositionAndSize);
var commentsTimer = Stopwatch.StartNew();
for (int i = 0; i < 50_000; i++)
{
_ = sheet.Comments.Add(i, 0);
}
commentsTimer.Stop();
workbook.CloseAccessCache(AccessCacheOptions.PositionAndSize);