I was wondering if there was a method to return a rows collection in a worksheet following a certain criteria?
For example, if i only wanted rows that had data/no data or is hidden/not hidden. At the moment I would have to go through each row and check the property of the row. If a worksheet has data entry on say rows 500,000 and 510,000 and nothing from row 100, there’s half a million iterations that will have no data to be checked.
Is it posible to return back the rows that match either HasData or Hidden as a collection? Or an array of indicies?
Yes, that solution is inefficient for large worksheets.
Alternatively a method to move to next row with data would be beneficial in this instance. There is already a “FirstDataCell” and “LastDataCell”, it would be good to have a “PreviousDataCell” and “NextDataCell”
Properties such as FirstDataCell and LastDataCell are abolute in the collection. However, “Previous” and “Next” are relative, they are based on one “CurrentCell” which is uncertain and ambiguouse.
For iterating rows to search subset of the collection that matches special condition, we have to do most of those operations like client applications do even if we implement the logic in our component itself. For memory performance consideration, we cannot cache too much information about cells in every row. So we have to check every cell/row for user specified criteria.
So we recommend using enumerators of row and cell from corresponding collection to get the best performance of traverse and check every object accordin to the expected condition.
The issue is for “unusual” worksheets that have been created with data in the first few rows, and then nothing until the very last row supported (1048576). Even-though the worksheet is small, to check every row for example to see if it’s hidden takes a lot of time as 99.9% of the checks are on empty rows. If there was a way to skip to the next row with data, that would make it orders of magnitude faster.
We’ve had the issue pop up and while the file it self is small, because of the millions of iterations per sheet, it’s taking minutes to process the empty cells.
To give you an idea what we’re dealing with, the xlsx XML has this:
As you can see no data from row 112 until 1048576 yet I don’t think there’s a way for me to understand this in Cells and skip any checks from 112 and move to row 1048576
Any other suggestions? Using the enumerator will still need to check each row wouldn’t it?
If your template file is small, the empty rows should not exist in the template file(so should not exist in the row collection of our cells model) except the first few and last few rows. So when you using Enumerator got from Cells.Rows, GetEnumerator(), those non existent rows will not be traversed and processed.
If you still have performance issue with it, please provide us the template file and runnable code, we will investigate it to check whether we can help to improve the performance for it.
You are welcome. Generally the sync flag can help to avoid synchronization issues for changing data in the collection while traversing it. However, like most implementations of IEnumerator for common Collections, it is not recommended to update(especially insert or delete items) the collection while traversing it by IEnumerator. Even though we have tried the best to make synchronization for the method, it is hard to make it work without any exception for all situations. Anyways, feel free to contact us anytime if you need further info.