Iterating cells in a range

When iterating cells in a range, it works fine if all cells in the range have values.
If the range refers to empty cells it doesn’t iterate. I think this used to work.

        var range = worksheet.Cells.CreateRange("$A$3:$A$19");
        foreach (Cell cell in range) //Iteration is skipped if any cell in range is blank
        {
            System.Diagnostics.Debug.Print($"{cell.Name}: {cell.Value}");   
        }

@Moonglum,

It won’t include empty or un-initialized cells in the range. You may cope with it using the following
sample code:
e.g.
Sample code:

.......
//this will include all cells (empty and non empty, etc.)
for (int i = range.FirstRow; i < range.RowCount + range.FirstRow; i++)
            {
               for (int j = range.FirstColumn; j < range.ColumnCount + range.FirstColumn; j++)
               {
                Cell cell = worksheet.Cells[i, j];
                //your code goes here.
                //........
               }
             } 
......

Hope, this helps a bit.

Thanks, it does help. I was just surprised because this behavior is not like Excel.

@Moonglum,

Good to know that the suggested code segment helps you for your task. In the event of further queries or issue, feel free to contact us back.

This doesn’t make sense.
In excel, though a cell isn’t initialized, it still appears in my range.
The sample provided, shows they have chosen a range, A1:A19, in excel should i loop through the cells in this range, it will initialize or at the very least provide a result for a blank cell, or if writing create the cell to write to.

This would have much higher ramifications on larger datasets where a developer is expecting to go through a range, lets say “A1:B5” This range contains 10 cells. The developer expects to go through all 10 cells.

Based on your response:
Range r = worksheet.Cells.CreateRange(“A1:A19”);
If foreach, is only responding for cells that are initialized.
then r.CellCount should respond with the cells that exist.
But, r.CellCount responds with the value of 19. The correct cell count for the range A1:A19, but the iterator is not iterating through 19 cells.

I think this is fundamentally wrong.
r.CellCount should respond with the actual number of cells
or (preferably) ForEach should iterate through the number of cells in CellCount.

Thoughts?

@PJJTLC,

Please zip your Excel file and attach here, we will check it soon. If it is an issue with APIs, we will surely fix it.

There is no physical workbook.
[Also your provided workaround, is a performance killer, a loop within a loop is just bad]

See code snippet below. the variable cellCount will return 19, where the ForEach will not print anything at all, where it should return 19 entries with the cell addresses “A:1” through “A:19”

var workbook = new Workbook();
var worksheet = workbook.Worksheets[0];
var range = worksheet.Cells.CreateRange(“A1:A19”);
var cellCount = range.CellCount; //cellCount = 19

       foreach (Cell cell in range) //Iteration is skipped if any cell in range is blank
        {
            System.Diagnostics.Debug.Print($"{cell.Name}: {cell.Value}");
        }

SAMPLE VBA, performs as expected:
Dim r As Range
Dim c As Range
Dim count As Integer

Set r = Application.Range("A1:A19")
count = r.Cells.count

For Each c In r
    Debug.Print c.Address + " " + c.Value
Next c

THanks
pete

@PJJTLC,

Thanks for providing more details.

To get details about all the un-initialized or empty cells is not optimized way either. Please note, when using the way (mentioned in the first post) or getting enumerator of cells range, Aspose.Cells would only record/consider those cells which are initialized or not empty, it won’t create Cell objects for un-initialized cells. This functionality is included in the APIs for performance grounds. That’s why using your following lines of code, although range’s cell count is 19 but all those cells are not initialized as you got to create those cells first if you want to get these included into the range. Therefore it will skip iteration.

foreach (Cell cell in range) 
{ 
}

But your comment is not right, i.e., “//Iteration is skipped if any cell in range is blank”. Iteration would occur to loop through those cells which are initialized or not empty. For example, see the following sample code:
e.g.
Sample code:

var workbook = new Workbook();
            var worksheet = workbook.Worksheets[0];
            var range = worksheet.Cells.CreateRange("A1:A19");
            range[3, 0].PutValue("1");//enter value into A4 cell.
            
           //Good way (performance oriented) but it will extract only the initialized cells
            IEnumerator cells = range.GetEnumerator();
            while (cells.MoveNext())
            {
                Cell cell = cells.Current as Cell;
                Console.WriteLine(cell.Name + ":" + cell.Value);
                
            }
            
           //this is other way round.
            foreach (Cell cell in range) 
             {
                Console.WriteLine(cell.Name + ":" + cell.Value);
             }

In above example, we add a value to A4 cell, so, this cell will be initialized and print the value of that cell via both ways.

In short, if you need to print all cells, you got to use my suggested mentioned approach in my first reply in the thread.

Understood.

But. what is the purpose of creating a range A1:A19, if i were not going to interact with it?

But, if i create a range, i would expect there to be a cell for that entire range. Enumerations are far faster than object based loops with conditional object identification. This is why they exist.
Aspose prides itself on performance, and a loop within a loop just isn’t performant at this level of programming, if we were writing in assembler, the nested loop is less of a concern, but abstracted as far out as we are, component developers need to provide interfaces that best meet the demand and purpose of their product. This is the second most important aspect of a component under our consideration, immediately after functionality, and just before pricing.

As moonglum has pointed out, he believes this used to work. I tend to believe him as he has been the primary developer for our products using aspose since 2015.

We also have licenses for Total and Java.

Please express our desire that this will be looked at. We frequently need to work with hundreds of thousands of cells at a time, and nesting loops is a performance hit.

Thanks,
Pete

@PJJTLC,

I guess in most cases users only care about initialized cells or cells with data in the worksheet. You can interact the whole range. For example, you may format all the cells in the whole range in one go or perform other tasks to the whole range.

As I told you, due to performance, by default, Aspose.Cells would record initialized cells only.

If you are reading/writing huge datasets, we recommend you to use our LightCells APIs. Aspose.Cells provides an interface, LightCellsDataHandler that needs to be implemented in your program. The interface represents Data provider for reading large spreadsheet files in light-weight mode. The architecture needs far less memory and will give you better performance and efficiency.

We do care about your concerns so we will further evaluate it and get back to you soon.

Thanks.

Previous cursory looks at LiteCells API showed it did not quite meet our needs, but we will look again. (Thanks for the reminder)

I still stand by my thoughts that the enumeration should loop through even “uninitialized” cells in a range, and allow initialization within the enumeration. [Excel doesn’t store uninitialized cells, but allows iteration with those cells through all interfaces. They exist in an abstract manner so you can find them through any method.]

Please note that the other driving force behind this request is for code commonality, and sharing. We work with workbooks using various components, in multitudes of technologies. Having functionality that closely matches Excel provides us a means to share functional code structure across environments, languages and regions. This also decreases our time to bring new developers up to speed. Excel is our base line, but carries the excessive weight of a UI, and unnecessary functionality/cost, hence the market for Aspose.Cells.

Thanks again, We look forward to hearing back on this.

pete

@PJJTLC,
We have noted your feedback and will share our thoughts after detailed analysis.

@PJJTLC,

We have evaluated it in details.
We are sorry but you have to iterate all the cells by using for loop (as we provided the suggested code). Also, even if we implement such kind of enumerator, it would work in the same way like we have to check whether the cell exists or not and then build Cell object, but (above all) it will cause worse performance than even using for loops for sure. Anyways, we cannot change the enumerator to such scenario that contains all cells including empty (uninitialized) ones, that will impact our other users’ applications and their logic.

Moreover, Range.CellCount does give ambiguous meaning for users. So, we might consider to remove it later or change the returned value as the existing cell instances only which may be iterated by the enumerator

We are sorry but currently we cannot support to provide you a Cell object for empty or uninitialized cells and remove them from the model automatically later on after the enumerator moves to next. Please note, Aspose.Cells model is not completely the same with the model of MS Excel or VBA. It is also possible that MS Excel may create temporary objects for one cell and destroy it later automatically, but for Aspose.Cells model, currently we cannot support such kind of operations as we need to consider the performance (on top), efficiency and convenience of all APIs together for our data model.

Thanks for your understanding!

Thanks for looking into this.

You do not need to respond to the following, but that raises the question;
How does Aspose handle a reference to an uninitialized cell during calculation? For instance, cell A1 contains the formula “=B1”, where B1 was never initialized. Does Aspose initialize cells for references?

Thanks for your reply.

pete

@PJJTLC,

No, it won’t include B1 cell until you initialize (create object of the cell) it manually or input some value to it. We may provide more details on it. We will get back to you.

@PJJTLC,

Furthermore, there are many functions which use range of cells, such as, SUM(A:C), MATCH(…,D:E,…). For such kind of formulas, if we instantiate all empty cells, it is very possible to cause OOM issue for the process. We have special logic to handle empty cells while calculating formulas. You should check empty cells in your own code if the memory cost may be important for you. For example, you should use Cells.CheckCell(row, col) to get the required cell object, if this method returns null, it means the required cell is empty and you can apply your special logic for it.

If you are iterating cells by IEnumerator, they can record the last non-empty cell and fetch next, if those two cells are not adjacent, then you can calculate the empty cells count between those two and apply your own special logic for empty cells accordingly.

Hope, this helps a bit.