Large XLSX file to DataTable using the LightCell

dshetty123 · February 28, 2023, 10:29pm

We are facing a memory issue with the Aspose while reading a large XLSX file to DataTable.
So, we want to use the LightCell option here but facing an issue with the rowIndex -1 value.

LightCellFilereadIssue.jpg (137.4 KB)

johnson.shi · March 1, 2023, 1:27am

@dshetty123

LightCells provides an event-driven mode for loading large template files. With this mode, when one cell is parsed from the template file, it will be sent to your implementation of ProcessCell so you can process the cell according to your business requirement(for your current case, we think you need to put this cell’s data into your data table with the expected format/type here). When this method finishes, the cell object will be removed, we do not keep it in our cells model in memory and it is just the way this mode uses to save memory.

When the workbook is loaded completely, your process of all cells finishes too. All those cells will not be kept in the workbook, so you cannot access cells data later from the workbook object and some related apis such as Cells.ExportDataTable() are not available.

dshetty123 · March 1, 2023, 1:57am

We have a file that contains 75 columns and more than a million rows in it. (FileSize is more than 300mb).
I’m able to run the below code but taking more than 4 minutes to process. Please let me know if any other option to read in less than 2 minutes.

Code Snippet==>
var filePath = “Sample.xlsx”;

var table = new DataTable("myTable");
for (var col = 0; col < 75; col++)
{
    table.Columns.Add(col.ToString());
}


AsposeCell.LoadOptions loadOptions = new AsposeCell.LoadOptions(LoadFormat.Xlsx);

loadOptions.MemorySetting = MemorySetting.MemoryPreference;
loadOptions.CheckExcelRestriction = false;


var lightCellsObj = new LightCellsDataHandlerVisitCells(table);
loadOptions.LightCellsDataHandler = lightCellsObj;

var wb = new AsposeCell.Workbook(filePath, loadOptions);

Aspose-DataTable-LightCell.jpg (70.0 KB)

johnson.shi · March 1, 2023, 2:43am

@dshetty123,

For such kind of large template file with large amount of cells, we are afraid it really needs quite amount of time to finish the loading/parsing process. For parsing cells data from the template file, we are afraid we have no good solution to improve the performance much.

However, to investigate the performance issue further, we think you may separate the time costs of loading the workbook and your business logic such as putting data into the data table. Please change your implementation of LightCellsDataHandler to do nothing for those methods such as ProcessCell, ProcessRow, … and record the time cost, it can be taken as the time we need to load/parse all data from the template file. The total time cost(4 minutes) minus it will be the required time by your business logic and maybe you can improve it a bit.

If most time was used by our parsing data process, maybe we need your template file to check whether we can improve the performance for it specially.