Changes between 7.7.2 and 8.2.2.1 in the Cells collection

(Big sigh)

You guys have changed something in the way the Cells collection works (or its iterator) between version 7.7.2 and 8.2.2.1.

I have the following program. It is a contrived example but it simulates something that I do in my real app.

private static void TestBorders()
{
Workbook workbook = new Workbook(@“C:\temp\spreadsheets\Test_borders.xlsx”);
Worksheet worksheet = workbook.Worksheets[0];


foreach (Cell cell in worksheet.Cells)
{
Console.Write(“Cell {0}”, cell.Name);

Style cellStyle = cell.GetStyle(true);
BorderCollection borders = cellStyle.Borders;

if (borders[BorderType.TopBorder].LineStyle != CellBorderType.None)
{
if (cell.Row > 0)
{
Cell cell2 = worksheet.Cells[cell.Row - 1, cell.Column];
}
}
if (borders[BorderType.LeftBorder].LineStyle != CellBorderType.None)
{
if (cell.Column > 0)
{
Cell cell2 = worksheet.Cells[cell.Row, cell.Column - 1];
}
}
Console.WriteLine("");

}
}


In 7.7.2 it displays:

Cell A1
Cell B1
Cell C1
Cell B2
Cell B2
Cell C2
Cell B4
Cell B4
Cell C4
Cell A4
Cell B4
Cell C4
Cell B5
Cell B5
Cell C5

In 8.2.2.1 it displays:

Cell A1
Cell B1
Cell C1
Cell B2
Cell B2
Cell B4
Cell B4
Cell A4
Cell B4
Cell C4


Note that in 8.2.2.1 the cells on line 5 are never returned by the Cells collection.

So, if in the foreach loop cells are created, in 8.2.2.1 cells might be skipped.

Maybe what I am doing is not a good idea, that is to mutate the structure of the spreadsheet while scanning the Cells collection.

What do you recommend in case I need to get a static list of the cells in the worksheet?

Thanks

Hi Costa,

Thanks for your posting and using Aspose.Cells.

There is a breaking change in v8.0.0. Please read it from the following link. I have also pasted the contents here.

( http://www.aspose.com/community/files/51/.net-components/aspose.cells-for-.net/entry534643.aspx )

Implementations of Row/Cell/RowCollection is changed

In old versions, Row and Cell objects are kept in memory for representing corresponding row and cell in a worksheet.
The same instance will be returned whenever user call methods such as RowCollection[int index], Cells[int, int] and so on.
For memory performance consideration, from this version on only properties and data of Row and Cell will be kept in memory.
Row/Cell object become the wrapper of those properties and data for user's convenience to manipulate cells model
and will be newly instantiated when user call those methods.
So, now user will get different objects when calling the same method to get Row/Cell many times
even though those different objects all refer to the same row/cell in the worksheet.
This change may affect user's application for following situations:

1. If user using code like
if(row1==row2)...
if(cell1==cell2)...
to check the same Row/Cell, with new versions those checks may fail. Please use row1.Equals(row2) and cell1.Equals(cell2) instead.

2. Because Row/Cell objects are newly instantiated according to user's invocation, they won't be kept and managed in memory by cells component.
After some inserting/deleting operations, their position(row/column index) may not be updated or even worse,
those objects become invalid. For example, for following code:

Cell cell = cells["A2"];
Console.WriteLine(cell.Name + ":" + cell.Value);
cells.InsertRange(CellArea.CreateCellArea("A1", "A1"), ShiftType.Down);
Console.WriteLine(cell.Name + ":" + cell.Value);

with old versions, the cell will refer to A3 after the insert operation and its value is same with the one before insert.
However, with new version the cell object will become invalid or still refer to A2 with other value.
For such kind of situation, user need to get Row/Cell object again from cells collection to get the correct result:

Cell cell = cells["A2"];
Console.WriteLine(cell.Name + ":" + cell.Value);
cells.InsertRange(CellArea.CreateCellArea("A1", "A1"), ShiftType.Down);
cell = cells["A3"];
Console.WriteLine(cell.Name + ":" + cell.Value);

3. RowCollection now does not inherit CollectionBase because there is no Row object in its inner list anymore.

You can also read what has changed in earlier versions from the following links

( http://www.aspose.com/docs/display/cellsnet/Migrating+from+Earlier+Versions+of+Aspose.Cells )
( http://www.aspose.com/docs/display/cellsnet/Public+API+Changes+in+Aspose.Cells+8.0.0 )

Actually, there are initially 9 cells in your worksheet, which are these

Cell A1
Cell B1
Cell B2
Cell B4
Cell B5
Cell C1
Cell C2
Cell C4
Cell C5

If you will read the Worksheet.Cells.Count property, you will get 9.

As you can see, Cell A2 is not yet existed inside the collection. Now if you will access cell A2, then number of cells will become 10.

Whenever you will try to access cell which does not exist inside the collection, it will be added inside the collection and count will increase.

Please see the following code and its console output and comments for your reference.

C#
string filePath = @"F:\Shak-Data-RW\Downloads\Dates.xlsx";

Workbook workbook = new Workbook(@"Test_borders.xlsx");
Worksheet worksheet = workbook.Worksheets[0];

//There are actually 9 cells initially
Console.WriteLine("Number of Cells: " + worksheet.Cells.Count);

//Access cell which does not exist now i.e A2
Cell cell1 = worksheet.Cells["A2"];

//Now, number of cells will be 10
Console.WriteLine("Number of Cells: " + worksheet.Cells.Count);

Console Output:
Number of Cells: 9
Number of Cells: 10


Hi Costa,

Thanks for using Aspose.Cells.

To update you further, commonly it is not recommended for user to modify the cells collection(such as instantiate new Cell/Row or delete existing Cell/Row) while traversing cells from the enumerator. Even for old versions, the enumerator cannot make the traversal work properly always, some elements may be traversed repeatedly or skipped too for some special situations.

For your situation, we think you can traverse the cells collection and cache the position(row&column index) of those cells that need to be changed firstly. After finish the traversal, you can then modify those cells according to the gathered information.

Another solution: if you required, we may consider to provide some new API to provide special enumeration which can check the modification of the cells collection automatically and synchronize with it, but the performance of traversal may be degraded a bit when comparing with the normal enumerator.

Hi team,


I am having a hard time understanding what to expect when iterating over the Cells collection.
I see in this discussion that sometimes the same cell is repeated during iteration, and for some files of my own I’ve noted that I actually get the same cell up to hundreds of times. For other strange instances the iteration gives me no cells at all although the sheet is not empty.

What am I to expect from the iterator provided in Cells? Is the iteration impacted by what I have done in terms of accessing/changing data in the cells before the iteration? If so, what is the best way for me to quickly iterate over ALL cells with data on a sheet?

My use case is the following: I receive files in which I need to look for cells marked with special commands. To do so I need to iterate over every single cell in the sheet in the quickest possible way and without allocating too much redundant memory. With the latest releases (8.3.0 in particular) I am getting many strange results with repeated cells, or no cells found at all and on some occasions the iteration stops after I have processed only about half of the sheet… What should I do to get consistent behaviour?

Best regards
Claes

Hello:

Your claims are pretty scary, if they are true. I did some pretty extensive testing in my case comparing version 7.7.2 with version 8.2.2.1 and I haven’t seen anything strange (related to this particular operation), other than what I reported on this thread. Since then, I changed my code such that, inside the loop where I go through each cell, I don’t change the worksheet anymore. I also upgraded to version 8.3.0 (from 8.2.2.1) but this time I haven’t done extensive testing assuming that there are no changes in this area. I did the upgrade because they fixed another issue I reported. We can all agree that this is basic functionality and that should not change in minor versions unless there are bugs.

I haven’t looked at your threads, but I would appreciate it greatly if you can post test cases where we can reproduce this issue or issues you came across. It will help me and others to understand the issues. Perhaps I need to revert to version 8.2.2.1.

Thanks for your post,
Costa


Hi Claes,

Thanks for your posting and using Aspose.Cells.

Please provide us your sample file and code to reproduce this issue. We will look into it and help you asap.

Hi,


Ok, so I should then expect that iterating over the Cells collection will give me every cell once and only once?

I would be happy to provide sample files if it was easily reproduced, the problem is all our code wrapping the usage of Aspose making it hard to pinpoint under which circumstances things happen. For one of my situations I tried to get the same problem by simply opening the file and starting to iterate, but then everything turned out fine. That is why I was asking if the result of the iteration is affected by other operations I may have performed on the file prior to starting the iteration.

I know now what to expect and will try to reproduce issues with Aspose code only.
Please note that I will have to make private posts as the files may contain information that is sensitive for our customers.

Best Regards
Claes

Hi Costa,


I managed to reproduce this issue now (plus some exceptions about removed cells) and find the little tweak that is needed trigger these problems and that is using the below loadOptions:
LoadOptions lo = new LoadOptions();
lo.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);

We often handle large files so I recently turned this on in our system and apparently this is when the cell iterators started acting crazy. I will have to make a private note to the team with my file, but I am sure they will inform you as well when the issue is fixed. Either way it seems that you are safe if you are not using this memory setting.

Best regards
Claes

Hi Claes,

Thanks for providing your sample code in your other thread and using Aspose.Cells.

We have logged your issue in our database for investigation. We will look into it and advise you asap.

Please refer to your other thread for detailed reply.

( MemorySetting.MEMORY_PREFERENCE gives buggy cell iterators )

Claes:

Thanks for sharing this with us. Let’s see what they will find…