Extract all formulas from a spreadsheet

Hi:

I created this procedure to extract all the formulas from a spreadsheet. It basically looks at all the cell formulas as well as at the formulas used in conditions.

Is there any other place I should look at? Where else can I have formulas in a spreadsheet?

I am using this function to scan for certain functions usages.


public static IList GetAllFormulas(Workbook workbook, ISet excludedWorksheets )
{
IList formulas = new List();
foreach (Worksheet worksheet in workbook.Worksheets)
{
if (excludedWorksheets != null && excludedWorksheets.Contains(worksheet.Name))
continue;

for (int row = 0; row <= worksheet.Cells.MaxDataRow; row++)
{
for (int col = 0; col <= worksheet.Cells.MaxDataColumn; col++)
{
if (worksheet.Cells[row, col].IsFormula)
formulas.Add(worksheet.Cells[row, col].Formula);

}
}


for(int i = 0; i < worksheet.ConditionalFormattings.Count; i++)
{
FormatConditionCollection formatConditionCollection = worksheet.ConditionalFormattings[i];
for (int j = 0; j < formatConditionCollection.Count; j++)
{
FormatCondition condition = formatConditionCollection[j];
if (condition.Formula1 != null)
formulas.Add(condition.Formula1);

if (condition.Formula2 != null)
formulas.Add(condition.Formula2);
}
}

}
return formulas;
}

Thanks

Hi,


I think you are doing fine to browse the possible areas to add to the formulas list. Moreover you may also include data Validations’ formulas (e.g Formula1 and Formula2 etc.) to it.

Thank you.

Thank you. If there is anything else please let me know. This is the new function in case someone else needs it.

public static IList GetAllFormulas(Workbook workbook, ISet excludedWorksheets )
{
IList formulas = new List();
foreach (Worksheet worksheet in workbook.Worksheets)
{
if (excludedWorksheets != null && excludedWorksheets.Contains(worksheet.Name))
continue;

for (int row = 0; row <= worksheet.Cells.MaxDataRow; row++)
{
for (int col = 0; col <= worksheet.Cells.MaxDataColumn; col++)
{
if (worksheet.Cells[row, col].IsFormula)
formulas.Add(worksheet.Cells[row, col].Formula);

}
}


for(int i = 0; i < worksheet.ConditionalFormattings.Count; i++)
{
FormatConditionCollection formatConditionCollection = worksheet.ConditionalFormattings[i];
for (int j = 0; j < formatConditionCollection.Count; j++)
{
FormatCondition condition = formatConditionCollection[j];
if (condition.Formula1 != null)
formulas.Add(condition.Formula1);

if (condition.Formula2 != null)
formulas.Add(condition.Formula2);
}
}

for(int i = 0; i < worksheet.Validations.Count; i++)
{
Validation validation = worksheet.Validations[i];

if (validation.Formula1 != null)
formulas.Add(validation.Formula1);

if (validation.Formula2 != null)
formulas.Add(validation.Formula2);
}
}
return formulas;
}
}

Hi,

Thanks for using Aspose.Cells.

If you find any other issue, please feel free to post. We will be glad to help you asap.

Thank you for this code. I was using it and inevitably hit a document that someone put a single cell at the bottom of the sheet so it had to search all 16 million cells which took forever. To anyone using this code, I recommend you add an if statement before checking all the cells in a row using “row.isblank” to skip blank rows. Once I added that it scanned the file in seconds versus an hour.

@cnoonan

Please also try the Worksheet.Cells.GetEnumerator() method for your needs. Please see the following sample code, its sample Excel file and console output for your reference.

The benefit of enumerator is that it gives user only those cells which has been instantiated. This is the reason that it does its work instantly.

For checking properties of all cells, user should always use Enumerator instead of checking every cell by its index one by one. Checking cells by index in loop not only downgrades the performance critically, but also increases the possibility of OutOfMemoryException because lots of unnecessary cell objects may be instantiated.

Download Link:
Sample Excel File.zip (5.6 KB)

C#

Workbook wb = new Workbook("sample.xlsx");

Worksheet ws = wb.Worksheets[0];

IEnumerator ie = ws.Cells.GetEnumerator();

while (ie.MoveNext())
{
    Cell cell = ie.Current as Cell;

    Debug.WriteLine("Name: " + cell.Name);
    Debug.WriteLine("Formula: " + cell.Formula);

    Debug.WriteLine("");
}

Console Output

Name: B2
Formula: =A1

Name: XEV9
Formula: =A2

Name: XEV1048557
Formula: =A3

Name: XEN1048560
Formula: =A4
1 Like