ExportDataTableAsString for big Excel file take 23 min


#1

Hi Aspose team,
We have Aspose cell version 8.8.3 license and we are using it in one our project.
we have to deal with big Excel file, number of rows are 182,478 and number of columns in Excel file are 441. the first row contains the column headers. the Excel file size is 250 MB. we need to get this data into the data table then pass it to another method to process the data. we want to have all the contains of Excel sheet as string in data table.

we are using Cells.ExportDataTableAsString method and it takes about 23 minutes to do that.
is there any other way to this process faster. the time consumption is very important for us.

Thanks a lot.
Hossein


#2

@hosseintaheri,
I have tried to reproduce this issue with a sample file containing similar number of rows and columns and having textual data in it. It took about 2 minutes to load and export it to DataTable as string. You may please share your sample file with us for our testing. You may upload file to some public file sharing server and share the link with us. If your sample file contains private data, then you may share the link via private message to us. For this click on my name icon and then press Message to send private message,

Also as you are using quite an old version, therefore you may test the scenario with the latest version and share feedback with us.


#3

Hi Ahsan,
Thanks a lot for your response, I have created a sample test Excel file (I can not share the original file since its contains confidential information) but with this file I have experienced the same behavior it took more than 23 minutes to converting Excel data sheet into Data Table object.

I compressed this file with 7Zip free tool. Please let me know if you need any more information. Thanks a lot for your help.test.zip (4.9 MB)

Side Note:
Before I open a support ticket for support I did test with latest Aspose cell version (version 19) but I faced with the same result.

Here is my result:
Reading excel file content: 00:00:00.1615382
Converting the worksheet to Data table: 00:24:01.4917705


#4

@hosseintaheri,

I have tried your sample file which takes around 10 minutes on my system to accomplish the task. You may please try the MemorySetting.MemoryPreference in the LoadOptions and test the scenario again. It took about 2 minutes to finish this task with the following sample code:

Aspose.Cells.LoadOptions opts = new Aspose.Cells.LoadOptions();
opts.MemorySetting = MemorySetting.MemoryPreference;
Workbook wb = new Workbook(@"test.xlsx",opts);
DataTable dt = wb.Worksheets[0].Cells.ExportDataTableAsString(0, 0, 181873, 441);

#5

Hi Ahsan,
I used Memory Setting but the result was the same.

I have used the following sample code for my test:

					var startTime = DateTime.Now;
					byte[] file = System.IO.File.ReadAllBytes("E:\\test1.xlsx");
					var finishTime = DateTime.Now;
					Console.WriteLine("reading the excel file: " + (finishTime - startTime));

                    var mimeType = MimeMapping.GetMimeMapping("c:\\test1.xlsx");

                    // The current culture will affect the data formats produced by Aspose.Cells, such as date formats.
                    // Set the culture explicitly to ensure consistency
                    System.Threading.Thread.CurrentThread.CurrentCulture = System.Globalization.CultureInfo.InvariantCulture;

                    // Instantiate LoadOptions specified by the LoadFormat.
                    var loadOptions = new Aspose.Cells.LoadOptions(LoadFormat.Auto);

                    switch (mimeType)
                    {
                        case "application/octet-stream":
                            loadOptions = new TxtLoadOptions(LoadFormat.CSV)
                            {
                                PreferredParsers = new ICustomParser[]
                               {
                                   new Utilities.AsposeCustomParser()
                               }
                            };
                            break;

                        case "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet":
                            loadOptions = new Aspose.Cells.LoadOptions(LoadFormat.Xlsx);
                            break;

                        case "application/vnd.ms-excel.sheet.binary.macroEnabled.12":
                            loadOptions = new Aspose.Cells.LoadOptions(LoadFormat.Xlsb);
                            break;

                        case "application/vnd.ms-excel":
                        case "application/vnd.ms-excel.sheet.macroEnabled.12":
                        case "application/vnd.ms-excel.template.macroEnabled.12":
                        case "application/vnd.openxmlformats-officedocument.spreadsheetml.template":
                            loadOptions = new Aspose.Cells.LoadOptions(LoadFormat.Excel97To2003);
                            break;

                        case "text/plain":
                            loadOptions = new TxtLoadOptions(LoadFormat.CSV)
                            {
                                PreferredParsers = new ICustomParser[]
                                {
                                    new Utilities.AsposeCustomParser()
                                }
                            };

                            ((TxtLoadOptions)loadOptions).Separator =
                                string.IsNullOrEmpty(Utilities.Utilities.GetAppCustomSetting(Utilities.Constants.TextFileColumnSeparatorKey))
                                    ? Utilities.Constants.CommaSeparator
                                    : Utilities.Utilities.GetAppCustomSetting(Utilities.Constants.TextFileColumnSeparatorKey).ToCharArray().First();
                            break;
                    }

                    startTime = DateTime.Now;
                    Workbook workbook = null;
                    loadOptions.MemorySetting = MemorySetting.MemoryPreference;
                    using (Stream stream = new MemoryStream(file))
                    {
                        workbook = new Workbook(stream, loadOptions);
                    }

                    var worksheet = workbook.Worksheets[0];
                    finishTime = DateTime.Now;
                    Console.WriteLine("load data into Excel worksheet: (Aspose)" + (finishTime - startTime));

                    startTime = DateTime.Now;
                    DataTable dataTable = worksheet.Cells.ExportDataTableAsString(
                        0,
                        0,
                        worksheet.Cells.MaxDataRow + 1,
                        worksheet.Cells.MaxDataColumn + 1,
                        true);
                    finishTime = DateTime.Now;
                    Console.WriteLine("load Excel worksheet data into Data table: (Aspose) " + (finishTime - startTime));

here is the result:
reading the excel file from disk: 00:00:00.1177096
load data into Excel worksheet: (Aspose)00:00:47.9377278
load Excel worksheet data into Data Table: (Aspose) 00:21:49.1452347 (Around 22 Min)

I have also use the following test code using OleDb and result was around 2.5 Min.

        startTime = DateTime.Now;

        System.IO.File.WriteAllBytes("E:\\template.xlsx", file);

        finishTime = DateTime.Now;
        Console.WriteLine("Write file to disk: " + (finishTime - startTime));

        startTime = DateTime.Now;

        var dt = new DataTable();

        var fileName = "e:\\test1.xlsx";
        var query = "SELECT * FROM [Data$]";
        using (OleDbConnection cn = new OleDbConnection { ConnectionString = this.ConnectionString(fileName, "No") })
        {
            using (OleDbCommand cmd = new OleDbCommand { CommandText = query, Connection = cn })
            {
                cn.Open();

                OleDbDataReader dr = cmd.ExecuteReader();
                dt.Load(dr);
            }
        }

Here is the result:

reading the excel file from disk: 00:00:00.1177096
Write file to disk: 00:00:01.8624381
Converting the worksheet data to Data Table: (Using OleDb)00:02:11.5379445 (Including read file from disk and doing the conversion) Around (2.5) Min

Ahsan, do you have any idea if you guys can improve Aspose cell to do this conversion faster? I really do not want to modify our project since we have time pressure to make our project available on Production and I really prefer to use exciting code with Aspose, but I need a conversion time less then 2.5 Min.


#6

@hosseintaheri,

I tried to execute your sample code but could not resolve “Utilities” object. Please provide us a complete executable solution for our testing. As I am not able to reproduce this scenario using MemoryPreference(which took 2 minutes in my testing), therefore a running solution is must required to log an investigation ticket and a solution(if possible).


#7

Hi Ahsan,
The reference to the “Utilities” object was used for dealing with CSV file. I removed those, please use the following sample code:

                   var mimeType = MimeMapping.GetMimeMapping("c:\\test1.xlsx");

                    System.Threading.Thread.CurrentThread.CurrentCulture = System.Globalization.CultureInfo.InvariantCulture;

                    // Instantiate LoadOptions specified by the LoadFormat.
                    var loadOptions =  new Aspose.Cells.LoadOptions(LoadFormat.Xlsx);

                    startTime = DateTime.Now;
                    Workbook workbook = null;
                    loadOptions.MemorySetting = MemorySetting.MemoryPreference;
                    using (Stream stream = new MemoryStream(file))
                    {
                        workbook = new Workbook(stream, loadOptions);
                    }

                    var worksheet = workbook.Worksheets[0];
                    finishTime = DateTime.Now;
                    Console.WriteLine("load data into Excel worksheet: (Aspose)" + (finishTime - startTime));

                    startTime = DateTime.Now;
                    DataTable dataTable = worksheet.Cells.ExportDataTableAsString(
                        0,
                        0,
                        worksheet.Cells.MaxDataRow + 1,
                        worksheet.Cells.MaxDataColumn + 1,
                        true);
                    finishTime = DateTime.Now;
                    Console.WriteLine("load Excel worksheet data into Data table: (Aspose) " + (finishTime - startTime));

Thanks,
Hossein


#8

@hosseintaheri,

We were able to observe the issue but we need to look into it more. We have logged the issue in our database for investigation and for a fix(if possible). Once, we will have some news for you, we will update you in this topic.

This issue has been logged as

CELLSNET-46684 - Slow performance while using ExportDataTableAsString() for large Excel file

#9

@hosseintaheri,

Please use LightCells API to read data and export it to datatable. Remove all code about Aspose.Cells to export data to datatable, it takes long time to export.

See following code:

DateTime startTime = DateTime.Now;
DateTime finishTime = DateTime.Now;
int columns = 442;
int rows = 181872;
DataTable dt = new DataTable();
string[] values = new string[columns];
for (int i = 0; i < columns; i++)
{
    values[i] = "test" + i;
    dt.Columns.Add(values[i]);
}
for (int row = 0; row < rows; row++)
{
    DataRow dataRow = dt.NewRow();
    dt.Rows.Add(dataRow);
    for (int col = 0; col < columns; col++)
    {
        dataRow[col] = values[col];
    }
}
finishTime = DateTime.Now;
Console.WriteLine("load Excel worksheet data into Data table: (Aspose) " + (finishTime - startTime));

Let us know your feedback.