Data types from ExportDataTable function if column's first value is null

Hi,

I am trying to read data from Excel worksheet to datatable by identifying the range of data. The ExportDataTable works fine in reading the data.

The issue I have is with the datatype which is being assigned to datatable columns. I went through the documentation of ExportDataTable function and found that for performance reasons the first value of column is used to identify the datatype. So in case the entire column has numbers in it but the first value of the column is null it assigns the column string datatype, whereas I expected some kind of integer datatype.

Here is the code to recreate the said scenario, I have also attached the excel file. (col_4 and col_5 are also expected to be of double datatype like col_6 and col_7.)

        var workBook = new Workbook("TestReading.xlsx");

        var workSheet = workBook.Worksheets[0];
        var range = workSheet.Cells.CreateRange("A1");
        var a = range.CurrentRegion;
        var dt = workSheet.Cells.ExportDataTable(a.FirstRow + 1, a.FirstColumn, a.RowCount, a.ColumnCount, true);

TestReading.zip (12.6 KB)

Can you please help me find a way around it?

@kanbee,

Thanks for the sample file.

Please note that, by default, Aspose.Cells evaluates the data type for the values based on the very first value in the column. If the value is null, it means that the data type of the column is a string, which is reasonable. Additionally, if there are mixed data or values in a column, a string data type will be assigned. For your specific requirements, you can define column types for the fields in your own DataTable and then use the ExportDataTable() method to populate it. Please refer to the sample code using your template XLSX file, which I have tested and works fine:
e.g.
Sample code:

var workBook = new Workbook("g:\\test2\\TestReading.xlsx");

Cells cells = workBook.Worksheets[0].Cells;

//Define a datatable with columns as per your desired data types.
DataTable dt = new DataTable();
dt.Columns.Add("col_1", typeof(string));
dt.Columns.Add("col_2", typeof(string));
dt.Columns.Add("col_3", typeof(string));
dt.Columns.Add("col_4", typeof(double));
dt.Columns.Add("col_5", typeof(double));
dt.Columns.Add("col_6", typeof(double));
dt.Columns.Add("col_7", typeof(double));
dt.Columns.Add("col_8", typeof(double));
dt.Columns.Add("col_9", typeof(double));
//.............

ExportTableOptions options = new ExportTableOptions();
options.SkipErrorValue = true;
options.CheckMixedValueType = false;
options.ExportColumnName = true;            
options.DataTable = dt;

cells.ExportDataTable(1, 0,cells.MaxDataRow +1, cells.MaxDataColumn +1, options);

Worksheet sheet2 = workBook.Worksheets[workBook.Worksheets.Add()];
sheet2.Cells.ImportData(dt, 0, 0, new ImportTableOptions() { ExportCaptionAsFieldName = true });

workBook.Save("g:\\test2\\out1.xlsx");

Hope, this helps a bit.

AsposeDatatypeCompare.zip (5.8 MB)
demo_SS.png (13.8 KB)
I appreciate the given solution but it does not work for me as I do not know the datatypes of columns of tables I am about to read. The excel files I will be reading are variable, so there is no way for me to know the datatypes of columns or the size of these tables beforehand.

Currently I am using ExcelReader library to read excel files and these tables are finally stored in a database, that’s why the data type of columns is important for me. I am attaching a small demo project to display the comparison in column datatypes when read from aspose.cells and when read from ExcelReader.

Is there any way for you to generate the same data types as ExcelReader?

Thanks.

@kanbee
At present, Aspose.Cells does not support data type detection when exporting data to a DataTable.

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CELLSNET-53962

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@kanbee
Though we have supported your need, we will release the next version in the next month.
Now you can init the DataTable as the following :

   public static DataTable InitDataTable(Cells cells, int startRow, int startColumn, int totalRows, int totalColumns, bool exportColumns)
        {
            if (exportColumns)
            {
                startRow++;
                totalRows--;
            }
            CellValueType[] cellValueTypes = new CellValueType[totalColumns];
            for (int j = 0; j < totalColumns; j++)
            {
                cellValueTypes[j] = CellValueType.IsNull;
            }
                for (int i = startRow; i < startRow + totalRows; i++)
            {

                Row rowInfo = cells.CheckRow(i);
               
                if (rowInfo.IsHidden)
                    continue;
                for (int j = 0; j < totalColumns; j++)
                {
                    Cell cell = rowInfo.GetCellOrNull(startColumn + j);
                    if (cell == null)
                    {
                        continue;
                    }
                    if (cellValueTypes[j] == CellValueType.IsString)
                    {
                        continue;
                    }
                    switch(cell.Type)
                    {
                        case CellValueType.IsNull:
                            break;
                       
                        case CellValueType.IsDateTime:
                            if (cellValueTypes[j] != CellValueType.IsNumeric)
                            {
                                cellValueTypes[j] = CellValueType.IsDateTime;
                            }
                            break;
                        case CellValueType.IsNumeric:
                            cellValueTypes[j] = CellValueType.IsNumeric;
                            break;
                        default:
                            cellValueTypes[j] = CellValueType.IsString;
                            break;
                    }
                }

            }
            DataTable dt = new DataTable();
            for (int j = 1; j <= totalColumns; j++)
            {
                dt.Columns.Add(new DataColumn("col_" + j));
                DataColumn dc = dt.Columns[dt.Columns.Count - 1];
                switch (cellValueTypes[j - 1])
                {
                    case CellValueType.IsNull:
                        dc.DataType = typeof(string);
                        break;

                    case CellValueType.IsDateTime:
                        dc.DataType = typeof(DateTime);
                        break;
                    case CellValueType.IsNumeric:
                        dc.DataType = typeof(double);
                        break;
                    default:
                        dc.DataType = typeof(string);
                        break;

                }
            }
           
            return dt;

        }

Then export data to the DataTable with codes in above Amjad’s Post.

Hi,
Thank you for the solution. I am performing this process of reading data on computers with different languages. I tried using LoadOptions like this:
var options = new LoadOptions
{
Region = CountryCode.USA,
LanguageCode = CountryCode.USA
};

But still reading data on a French machine produced data with decimal values like 1.1 formatted like “1,1”. Can you please guide me through this obstacle?

@kanbee,
We have created a sample file. And by setting the machine to French through the regional settings in the control panel, we can obtain the correct results through testing with the following code. Please refer to the attachment (19.8 KB)
.

Workbook wb = new Workbook(filePath + "test.xlsx");

Console.WriteLine(wb.Worksheets[0].Cells["A1"].StringValue);
Console.WriteLine(wb.Worksheets[0].Cells["B1"].StringValue);
Console.WriteLine(wb.Worksheets[0].Cells["C1"].StringValue);

LoadOptions options = new LoadOptions();
options.Region = CountryCode.USA;
options.LanguageCode = CountryCode.USA;

wb = new Workbook(filePath + "test.xlsx", options);

Console.WriteLine(wb.Worksheets[0].Cells["A1"].StringValue);
Console.WriteLine(wb.Worksheets[0].Cells["B1"].StringValue);
Console.WriteLine(wb.Worksheets[0].Cells["C1"].StringValue);

This is odd, I am not getting these results in my application. Maybe it has something to do with ExportDataTable function? Because I am using ExportDataTable function combined with the above InitDataTable function to produce my results.

@kanbee,

Could you please provide sample console application, zip the project with resource files and provide us to reproduce the issue on our end. We will check your issue soon. Also, share the screenshot for environment/regional settings.

PS. please exclude Aspose.Cells.Dll to minimize the size of the zipped archive.

Please find the attached zip file with a demo console app and screenshot for environment/regional settings. I have excluded Aspose.Cells.Dll.AsposeLanguageTest.7z (328.6 KB)

@kanbee
By using the provided sample code for testing, we can reproduce the issue. The formatting error of the number was found after setting the region.
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CELLSNET-54069

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

The issues you have found earlier (filed as CELLSNET-53962) have been fixed in this update. This message was posted using Bugs notification tool by johnson.shi