ExportDataTable fails on Excel document with string data

Loading the attached .xls with the following code produces the error "The value of the cell A4 should not be a string value" in Aspose.Cells version 7.7.2.0.

The column in question starts with several dates, but the 4th row has a string value, which represents a date in a slightly different format.

I did discover that with the addition of the highlighted line, that I could suppress the error. However, I wanted to report that this behavior is different than the version of Aspose which we are upgrading from, because it breaks a datatype inference feature of our application. I will likely be able to work around the issue by adding additional parsing code in areas of the application that formerly did not need it. I still just want to get your take on whether this exception is a bug, or an intentional breaking change. In version 4.8.2.0 we were able to get columns with mixed types of dates and strings.

private DataTable GetExcelWorksheetData(string fileName)
{
if (string.IsNullOrEmpty(fileName))
throw new ArgumentNullException("fileName");

Workbook workBook = new Workbook(fileName);
Worksheet workSheet = workBook.Worksheets[0]; // Assumption is that we are working with one worksheet
DataTable dataTable = new DataTable();
ExportTableOptions options = new ExportTableOptions();
options.ExportColumnName = true;
options.IsVertical = true;
options.ExportAsString = true;
dataTable = workSheet.Cells.ExportDataTable(0, 0, workSheet.Cells.MaxDataRow + 1, workSheet.Cells.MaxColumn + 1, options);
return dataTable;
}

Seems related to the following thread, though I’m dealing with xls rather than csv:


Loading .CSV from File Fails on Column with Number then Text

Hi,

Thanks for your posting and using Aspose.Cells.

It is actually a correct behavior. When user does not specify the data type of columns of the datatable, we
have to determine it by the first data in this column(for performance
consideration, we cannot traverse all cells to check whether there are multiple
data types in one column).

For your situation, we think you can manipulate the
process of determining data types by yourself and then export data to the
datatable. If you know data types of those columns in your file, you can just
define column type of the datatable with constant value. Otherwise you can check
all cells or cells of first row(depends on your business requirement) to
determine the data types.

Code Example:


ExportTableOptions etOpt = new ExportTableOptions();

etOpt.ExportColumnName = true;

DataTable dt = new DataTable();

dt.Columns.Add().DataType = typeof(string);

dt.Columns.Add().DataType = typeof(double);

etOpt.DataTable = dt;

cells.ExportDataTable(0, 0, 3, 2, etOpt);

Yes, I’m sure that I can do some coding around this issue. It is annoying behavior difference for us from your older version. We do not know at compile time what column data types might be. We have to determine them at run time, and we used to be able to do so by analyzing the exported data table’s cell data types to determine the majority data type for a certain column. Now we will have to do our own parsing of every cell. The data type inference feature in our application is a nice feature for our customers which have to import data from disparate systems with inconsistent format masks, etc. Sometimes they just have odd data, and we have to be as accommodating as possible.


Thank you, however, for confirming this is desired behavior. I do understand how critical performance is to your API. Now I just have to work around it.

Hi Danny,

Thanks for your understanding and using Aspose.Cells.

When there are multiple types of data in a column, then string data type is most suitable. Hopefully, you will be able to find the workaround of your problem as this is performance critical problem. Let us know if you encounter any other issue, we will be glad to look into it and help you further.

For those who come across this thread with the same problem, the following code maybe helpful to you. Microsoft’s DataTable class is actually capable of loading string data into most data type columns perfectly well… the error is only from Aspose code that is restricting the typing instead of automatically attempting to convert it for you. This code sample will use the first row of cells to determine what the data type should be (you might want to tweak this to fit your own needs), exports as strings to avoid Aspose’s error, then clones and reimports the data into a second DataTable.


private DataTable ExportDataTableWithMixedDataTypes(Worksheet worksheet, ExportTableOptions options)
{
options.ExportAsString = true;
DataTable dataTable = worksheet.Cells.ExportDataTable(0, 0, worksheet.Cells.MaxDataRow + 1,
worksheet.Cells.MaxColumn + 1, options);
DataTable typedTable = dataTable.Clone();
<span style="color:blue;">for</span> (<span style="color:blue;">int</span> i = 0; i < worksheet.Cells.MaxDataColumn + 1; i++)
{
	<span style="color:blue;">switch</span> (worksheet.Cells[options.ExportColumnName ? 1:0, i].Type)
	{
		<span style="color:blue;">case</span> <span style="color:#2b91af;">CellValueType</span>.IsBool:
			typedTable.Columns[i].DataType = <span style="color:blue;">typeof</span>(<span style="color:blue;">bool</span>);
			<span style="color:blue;">break</span>;
		<span style="color:blue;">case</span> <span style="color:#2b91af;">CellValueType</span>.IsDateTime:
			typedTable.Columns[i].DataType = <span style="color:blue;">typeof</span>(<span style="color:#2b91af;">DateTime</span>);
			<span style="color:blue;">break</span>;
		<span style="color:blue;">case</span> <span style="color:#2b91af;">CellValueType</span>.IsNumeric:
			typedTable.Columns[i].DataType = <span style="color:blue;">typeof</span>(<span style="color:blue;">double</span>);
			<span style="color:blue;">break</span>;
		<span style="color:blue;">default</span>:
			typedTable.Columns[i].DataType = <span style="color:blue;">typeof</span>(<span style="color:blue;">string</span>);
			<span style="color:blue;">break</span>;
	}
}

<span style="color:blue;">foreach</span> (<span style="color:#2b91af;">DataRow</span> row <span style="color:blue;">in</span> dataTable.Rows)
{
	typedTable.ImportRow(row);
}

<span style="color:blue;">return</span> typedTable;

}

This worked well for most cases for me. I did end up using a different technique in my final implementation, because Aspose can handle currency symbols as numeric data, while DataTable could not. So in my final implementation I scanned ALL the cells and did a convert of the values if they did not match the first row’s datatype, prior to calling ExportDataTable.

Hi Danny,

Thanks for sharing the workaround and using Aspose.Cells.

It will surely be helpful for other users. Let us know if you face any other issue, we will be glad to look into it and help you further.