Many blank columns in spreadsheet

We have a xlsm file that may contains many blank columns (we dont know how it happens). We want to write a program that uses Aspose.Cells to detect and delete these columns. But it seems that Aspose.Cells cannot detect correct column count.
using Aspose.Cells;
using Excel = Microsoft.Office.Interop.Excel;

namespace TestManyColumns
{
internal class Program
{
static void Main(string[] args)
{
{
Workbook wb = new Workbook(“1.xlsm”);
Console.WriteLine($“Column count in Aspose.Cells {wb.Worksheets[0].Cells.Columns.Count}”);
}

        {
            Excel.Application excelApp = new Excel.Application();
            Excel.Workbook workbook = excelApp.Workbooks.Open(Path.GetFullPath("1.xlsm"));
            Excel.Worksheet worksheet = (Excel.Worksheet)workbook.Sheets[1];
            Console.WriteLine($"Column in Microsoft.Office.Interop.Excel {worksheet.UsedRange.Columns.Count}");
            workbook.Close();
            excelApp.Quit();
        }
    }
}

}

output:

Column count in Aspose.Cells 145
Column in Microsoft.Office.Interop.Excel 16384

1 Like

@Zhigang_Wang

To detect and delete blank columns in an XLSM file using Aspose.Cells, you can utilize the DeleteBlankColumns method provided by the library. This method will automatically identify and remove any columns that do not contain any data.

Here’s a sample code snippet that demonstrates how to achieve this:

using System;
using Aspose.Cells;

namespace TestDeleteBlankColumns
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Load the workbook
            Workbook wb = new Workbook("1.xlsm");

            // Access the first worksheet
            Worksheet sheet = wb.Worksheets[0];

            // Delete the blank columns from the worksheet
            sheet.Cells.DeleteBlankColumns();

            // Save the modified workbook
            wb.Save("output.xlsm");

            Console.WriteLine("Blank columns deleted successfully.");
        }
    }
}

Explanation:

  1. Load the Workbook: The Workbook class is used to load your existing XLSM file.
  2. Access the Worksheet: You can access the desired worksheet using the Worksheets collection.
  3. Delete Blank Columns: The DeleteBlankColumns method is called on the Cells collection of the worksheet, which removes all columns that are completely empty.
  4. Save the Workbook: Finally, save the modified workbook to a new file.

This approach should help you manage the blank columns effectively. If you encounter any discrepancies in column counts between Aspose.Cells and other libraries, it may be due to how each library interprets the data in the spreadsheet. However, using the DeleteBlankColumns method should resolve the issue of removing unwanted blank columns.

For further reference, you can check out the example provided in the Aspose.Cells GitHub repository here.

Sources:
[1]: DeletingBlankColumns.cs

1 Like

It works. Thank you, AI.

@Zhigang_Wang
You are welcome. If you have any questions, please feel free to contact us at any time.

I found DeleteBlankColumns will not only delete blank columns in the end, but also delete blank columns in the middle. How to delete blank columns only in the end?

@Zhigang_Wang
Thank you for your feedback. Do you want to delete all empty columns after the last non empty column?

Yes, it is exactly what we need.
Deleting rows/cols in the middle may change formula references in this sheet or in other sheet(s). It is a side effect. So we hope you can add a bool flag in the DeleteOptions class (maybe), to avoid delete rows/cols in the middle in DeleteBlankRows/DeleteBlankColumns method.

@Zhigang_Wang
If you are only concerned about the reference being changed when deleting empty rows and columns, you can use the overload methods. Please refer to the following example code.

// Load the workbook
Workbook wb = new Workbook("sample.xlsx");

// Access the first worksheet
Worksheet sheet = wb.Worksheets[0];
           
Aspose.Cells.DeleteOptions options = new Aspose.Cells.DeleteOptions();
options.UpdateReference = true;
// Delete the blank columns from the worksheet
sheet.Cells.DeleteBlankColumns(options);

// Delete the blank rows from the worksheet
sheet.Cells.DeleteBlankRows(options);

// Save the modified workbook
wb.Save("output.xslx");

Thank you, but formula reference is only a example showing that deleting from middle may have some side effects. We still want a function that not delete from middle.

@Zhigang_Wang
Thank you for your feedback. We have created a new ticket and will notify you promptly once there are any updates.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CELLSNET-56877

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@Zhigang_Wang,

Maybe you could try the following sample code for your task.
e.g.,
Sample code:

// Load the Excel file
string filePath = @"e:\\test2\\Book1.xlsx";
Workbook workbook = new Workbook(filePath);

// Access the first worksheet
Worksheet worksheet = workbook.Worksheets[0];

// Get the last non-empty row and column
int lastDataRow = worksheet.Cells.MaxDataRow;
int lastDataColumn = worksheet.Cells.MaxDataColumn;
// Get the last row which is having style/formatting(empty), etc.
int lastRow = worksheet.Cells.MaxRow;
// Get the last column which is having style/formatting (empty), etc.
int lastColumn = worksheet.Cells.MaxColumn;

// Remove rows after the last non-empty row
worksheet.Cells.DeleteRows(lastDataRow +1, lastRow - lastDataRow, true);

// Remove columns after the last non-empty column
worksheet.Cells.DeleteColumns(lastDataColumn +1, lastColumn - lastDataColumn, true);

....

Thank you very much for your response. Your prompt reply is one of the key reasons I chose your company’s product.

Unfortunately, this piece of code also fails to resolve the issue I encountered. The attached file is the one I used for testing. This file is very strange (I don’t know how my user operated it). The number of columns that can be accessed through Microsoft.Office.Interop.Excel is 16384, and these columns can also be deleted using the DeleteBlankColumns method in Aspose.Cells. However, neither Cells.MaxColumn nor Cells.Columns.Count can detect the existence of these columns.
1.zip (392.1 KB)

@Zhigang_Wang
We will further analyze your issue, and our internal system has logged the issue as CELLSNET-56877. We will notify you as soon as there are any updates. Thank you!

@Zhigang_Wang
We are using the latest version 24.9, and running
int lastColumn = worksheet.Cells.MaxColumn;
gives lastColumn as 16383 (starting from 0, so the total number of columns is 16384), which is the same result as Microsoft.Office.Interop.Excel. Does this meet your requirements?
Thank you.

@Zhigang_Wang

Some additional details to understand the function:

For Cells.MaxColumn, it represents the maximum column index of those existing cell objects in current sheet. ColumnCollection has nothing to do with cell objects, it contains those columns that have custom settings, such as style,width, …etc. So the count of this collection is not same with Cells.MaxColumn.

ColumnCollection contains only those custom columns, most of those default columns aren’t in this collection, so its count doesn’t represent the maximum column either. For example, user may set special width for the fifth column, then this collection only contains one element so its count is 1 but current the maximum non-blank column is 5.

While checking and deleting blank columns, all contents such as non-blank cells, existing column objects, comments… will be taken into consideration. So the maximum range that will be deleted cannot be determined by any one aspect only.

@Zhigang_Wang
We re-evaluated your requirement with your provided template file 1.xlsm. For “last non-empty column”, it is ambiguous so we think it is better for you to specify the range by yourself according to your requirement when deleting columns.

In this template file, there are some cells have been defined with style but without data for XFD column, so for both ms excel and Aspose.Cells the column XFD(16383) is taken as the maximum one.

If you want the last empty column to be the one which contains cell data, then you may use Cells.MaxDataColumn, and amjad’s post should work for you.

However, for the logic of deleting blank rows, in fact in this template file there is one comment at cell AS1, so this column should not be deleted by Cells.DeleteBlankColumns() and AS column should be taken as the last non-empty column.

So, it is hard for us to determine which kind of data should be taken into account or not for user’s special requirement. For such kind of file, please determine the range according to your special requirement and then specify it when deleting columns(DeleteColumns(int, int, bool)) or blank columns(DeleteBlankColumns(DeleteOptions) with specified DeleteBlankOptions.StartIndex/EndIndex).

Thank you very much for your reply. I have resolved our issue based on your guidance!

@Zhigang_Wang,

We’re glad to hear your issue has been resolved after following the suggestions. Please don’t hesitate to reach out to us again if you have any additional questions or feedback.