Delete Blank Rows after index

But I would like to remove only the blank rows from a specific region, for example from row 50 to 100000.
One of the reasons we use Cells.DeleteBlankRows() is that we have spreadsheets with almost 1000000 rows, I believe it is not efficient to remove row by row.
It is possible to have something like:
public bool DeleteBlankRows(int rowIndex, int totalRows);

or set rowIndex and totalRows in DeleteOptions
public void DeleteBlankRows(DeleteOptions options);

What is the most efficient way to delete these rows?

@linearsm,

Aspose.Cells does not provide a method/overload like DeleteBlankRows(int rowIndex, int totalRows), but you can try to implement your own by leveraging Aspose.Cells’ functionality. Here is the workflow.

  1. Read the range and collect blank rows. Iterate through your specified range to identify blank rows and collect their indices.
  2. Delete rows in batches. Delete from the bottom up helps to maintain row indices correctly.

See the following sample code for your reference.
e.g.,
Sample code:

Workbook workbook = new Workbook("Book1.xlsx");
Worksheet worksheet = workbook.Worksheets[0];
int startRow = 50;
int endRow = 100000;

// List to store indices of blank rows
List<int> blankRows = new List<int>();
        
// Iterate through the specified range to find blank rows
for (int i = startRow; i <= endRow; i++)
{
    // Check if the row is blank
    bool isBlank = true;
    for (int j = worksheet.Cells.MaxColumn; j >= 0; j--)
    {
        Aspose.Cells.Cell cell = worksheet.Cells[i, j];
        if (cell != null && !string.IsNullOrEmpty(cell.StringValue.Trim()))
        {
            isBlank = false;
            break;
        }
    }

    // If row is blank, add to the list
    if (isBlank)
    {
        blankRows.Add(i);
    }
}
        
// Delete rows in reverse order to avoid re-indexing issues
blankRows.Reverse();
foreach (int rowIndex in blankRows)
{
    worksheet.Cells.DeleteRow(rowIndex);
}
....

Hope, this helps a bit.

Thanks for replying.

One problem with this method is that it doesn’t have the same performance as Cells.DeleteBlankRows().

To illustrate, I tested both cases and the performance is very different.

I created a file that contained a string in row 1 and 1000000.

In one case I used Cells.DeleteBlankRows() and in the other a loop deleting from row 3 to 999999.

TestAspose.TestDeleteBlank("D:\\testDelete.xlsx");
TestAspose.TestDeleteBlankLoop(3, 1000000, "D:\\testDeleteLoop.xlsx");

 public class TestAspose
 {
     public static void TestDeleteBlankLoop(int startRow, int endRow, string file)
     {
         Console.WriteLine("Start Delete Blank Loop");
         Stopwatch stopwatch = Stopwatch.StartNew();

         Workbook workbook = new Workbook(file);
         Worksheet worksheet = workbook.Worksheets[0];

         List<int> blankRows = new List<int>();

         for (int i = endRow; i >= startRow; i--)
         {
             bool isBlank = true;
             for (int j = worksheet.Cells.MaxColumn; j >= 0; j--)
             {
                 Aspose.Cells.Cell cell = worksheet.Cells[i, j];
                 if (cell != null && !string.IsNullOrEmpty(cell.StringValue.Trim()))
                 {
                     isBlank = false;
                     break;
                 }
             }

             if (isBlank)
             {
                 blankRows.Add(i);
             }
         }

         foreach (int rowIndex in blankRows)
         {
             worksheet.Cells.DeleteRow(rowIndex);
         }

         workbook.Save(file);

         stopwatch.Stop();
         Console.WriteLine("Finish Delete Blank Loop: " + TimeSpan.FromMilliseconds(stopwatch.ElapsedMilliseconds).ToString(@"mm\:ss\.FFF"));
     }

     public static void TestDeleteBlank(string file)
     {
         Console.WriteLine("Start Delete Blank");
         Stopwatch stopwatch = Stopwatch.StartNew();

         Workbook workbook = new Workbook(file);
         Worksheet worksheet = workbook.Worksheets[0];
         
         worksheet.Cells.DeleteBlankRows();

         workbook.Save(file);

         stopwatch.Stop();
         Console.WriteLine("Finish Delete Blank Loop: " + TimeSpan.FromMilliseconds(stopwatch.ElapsedMilliseconds).ToString(@"mm\:ss\.FFF"));
     }        
 }

Result:
Start Delete Blank
Finish Delete Blank Loop: 00:00.633
Start Delete Blank Loop
Finish Delete Blank Loop: 02:14.534

In this case for me it is a very big difference in performance.

@linearsm,

Surely, Cells.DeleteBlankRows() is more efficient and provides better performance, and you should use it if you want to remove all blank rows in the whole worksheet unconditionally. The workaround provided would work conditionally when you need to remove some blank rows in a specific range of cells only. Please note and as I told you, Cells.DeleteBlankRows does not provide any overload that could remove blank rows for a certain range of cells.

@linearsm
According to your requirement of deleting blank rows/columns between the specified range, we have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CELLSNET-56727

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@linearsm

For your requirement, we provide new options to specify the range to be checked and deleted:

DeleteBlankOptions.StartIndex/EndIndex

For deleting rows, those properites represent the range of rows(start row and end row, inclusive); For deleting columns, those properties represent the range of columns(start column and end column, inclusive).

The new apis will be included into our next official version 24.10.

The issues you have found earlier (filed as CELLSNET-56727) have been fixed in this update. This message was posted using Bugs notification tool by johnson.shi