Pre-Sales Question - Import SpreadsheetML into SQL Server

Hello,

I see from this post

My question is: I have some VERY large XML files (spreadsheetML) that can be from 50MB to 20GB in size. Do you know how Aspose.Cells will perform with these large files? I will be importing the data into SQL Server.
Does it (can it) use the SQL Bulk loader?
Any suggestions on the best way to do this?

Thanks,

Mike = :slight_smile:

Hi Mike,

Thanks for your inquiry.

Yes, Aspose.Cells can handle (import / export) large files. Since your SpreadsheetML files are very huge (50MB-20GB) sized, so, it does require a lot of memory to work with them and the process would be slow too. I think you may give a try our latest version(4.6.0): http://www.aspose.com/community/files/51/file-format-components/aspose.cells/entry156149.aspx in which we have made some enhancements regarding performance too.

Alternatively, I think you may try to split a big sized SpreasheetML file to make several files, thus it will reduce the size to enhance the performance in the long run.

Thank you.

Thank you. I will give it a try. I also requested a temp license to make sure I have no limitations.

Also, do you have sample code somewhere to make my eval faster?

Thanks,

Mike =:-)

I found some code.

Workbook objWorkbook = new Workbook();

objWorkbook.Open("XLSTest3.xml", FileFormatType.SpreadsheetML);

Worksheet objWorksheet = objWorkbook.Worksheets[0];

DataTable dt = new DataTable();

dt = objWorksheet.Cells.ExportDataTableAsString(0, 0, objWorksheet.Cells.MaxRow + 1, objWorksheet.Cells.MaxColumn + 1, true);

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConnectionString))
{

bulkCopy.DestinationTableName = "XLSTest3";

//* Set up the column mappings *
bulkCopy.ColumnMappings.Clear();

bulkCopy.ColumnMappings.Add("First Name", "First Name");

bulkCopy.ColumnMappings.Add("Last Name", "Last Name");

bulkCopy.ColumnMappings.Add("Address", "Address");

bulkCopy.ColumnMappings.Add("Phone", "Phone");

bulkCopy.ColumnMappings.Add("DOB", "DOB");

//* Write from the source to the destination *
bulkCopy.WriteToServer(dt);

}

This worked fine with my test SpreadsheetML that I created myself in Excel using Save As “XML Spreadsheet”.

However, the files I will be using are SpreadsheetML files that I have no control over what is in them. They are for sure SpreadsheetML and open fine in Excel, but one test file using the code above, gives me:

“Error in opening SpreadsheetML file.”

“Invalid formula in cell @0: =‘test export’!R4C1:R4C435”

The offending XML looks like this:



<NamedRange ss:Name=“_FilterDatabase”

ss:RefersTo=“=‘test export’!R4C1:R4C435” ss:Hidden=“1”/>

...

Can’t we just ignore “formula errors”?

When I manually fix that cell, it loads the Worksheet, but then this line:

dt = objWorksheet.Cells.ExportDataTableAsString(0, 0, objWorksheet.Cells.MaxRow + 1, objWorksheet.Cells.MaxColumn + 1, true);

give me the error “ColumnName is required when it is part of a DataTable.". I assume this is a DataTable error and not really your product.

So, what is really the best way to load SpreadsheetML into SQL Server?

Thanks,

Mike =:-)

Hi Mike,

Could you post your template SpreadsheetML file here, we will check your issue soon.

Thank you.

Sure, I can post a sample. It is kind of proprietary and intellectual property. Is there a way to send it so that only Aspose can see or download it?

BTW, looking at the code I posted, it seems that it will load the entire XML file into memory (objWorksheet, then a DataTable). I'm concerned that this is inefficient for my very large files of 50MB or 500MB. Is this the best way with your product?

Thanks,<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />

Mike =:-)

Hi,

You can make this thread as private by checking “Keep this post as private” checkbox at the bottom, thus only Aspose staff and you, would be able to see your attachments.

Alternatively, you may post your template file(s) directly to us. Since your file contains confidential data, you can email to us, so we can resolve your issue asap. To email please follow these under mentioned steps,

1: click the Contact button in the Post.

2: In the drop down list options click "Send nausherwan.aslam an Email”.

3: Attach the template file and send it.

4: Once you have done it, kindly confirm us on this thread.

Thank You & Best Regards,

Ok, I emailed you the file.

Thank you.

Hi,

Thank you for considering Aspose.

We have received your template file. We will check your issue and get back to you soon.

Thank You & Best Regards,

Hi,

Thank you for considering Aspose.

Well, your provided file may be corrupt as MS Excel could not open it. So please provide us more details about how to create this file or send the file again after checking it that it can be properly opened using MS Excel.

Regarding the error of opening the file with Aspose.Cells, we find invalid named range in the following xml section.

<Worksheet ss:Name="test export">

<Names>

<NamedRange ss:Name="_FilterDatabase"

ss:RefersTo="='test export'!R4C1:R4C435" ss:Hidden="1"/>

</Names>

The max column is 255 in the SpreadML and the max column index in the formula is 434, so Aspose.Cells will throw such Exception.

Thank You & Best Regards,

Thank you for your reply.

I am able to open the file with no problem in Excel 2007. I did notice that some of my files have this as line 2:

<?mso-application progid="Excel.Sheet"?>

I’m not sure if that makes a difference for you.

As for the formula error, is there a way to ignore formula errors on .Open ?

Regarding the max column, I believe the limit has been increased in Excel 12. See this link: http://blogs.msdn.com/excel/archive/2005/09/26/474258.aspx

Does Aspose support the Office 2007 XML Schemas?

Thanks,

Mike =:-)

Hi,

Thank you for considering Aspose.

Please try the attached latest version of Aspose.Cells. We will ignore this exception in the latest fix.

Thank You & Best Regards,

Hello,

Sorry for the month of no response, we were waiting for MS on another direction with these files. Still waiting


Anyway, I tried the DLL version (4.6.0.3) that you attached here. Now, I get this error:

Message:

“Line 34: Invalid value "8.0"”

StackTrace:

at Aspose.Cells.Converter..(XmlTextReader უ, Style Զ)\r\n at Aspose.Cells.Converter..(XmlTextReader უ)\r\n at Aspose.Cells.Converter..(XmlTextReader უ)\r\n at Aspose.Cells.Converter..(XmlTextReader უ)\r\n at Aspose.Cells.Converter..Import(Stream src, Workbook excel)\r\n at Aspose.Cells.Workbook.Open(String fileName, FileFormatType type)

I’m using the same file I sent you before. Can you see if you can load it? I emailed it to you again.

Thanks,

Mike =:-)

Hi Mike,

Thanks for your feedback,

we will look into your issue soon.

Thank you.

Hi Mike,

Thank you for considering Aspose.

Well, I have tried to load the data from your SpreadsheetML to Datatable using Cells.ExportDataTable() method with the latest Aspose.Cells fix (V4.6.0.9) and all the data gets loaded properly. Please try loading and bulk coping you SpreadsheetML file by using the attached latest fix. If you face any problem, please let us know.

Thank You & Best Regards,

Thank you, it loaded ok this time. But I’m getting that same error from earlier in this thread:

“ColumnName is required when it is part of a DataTable”

Here’s my code. Do you see where I might be going wrong? Do I need to loop through the DataTable and create Column Names?

Workbook objWorkbook = new Workbook();

objWorkbook.Open("MMM_Clean.xml", FileFormatType.SpreadsheetML);

Worksheet objWorksheet = objWorkbook.Worksheets[0];

DataTable dt = new DataTable();

dt = objWorksheet.Cells.ExportDataTable(0, 0, objWorksheet.Cells.MaxRow + 1, objWorksheet.Cells.MaxColumn + 1, true);

Thanks,

Mike =:-)

Hi Mike,

Thank you for considering Aspose.

Please make sure that first row of you SpreadsheetML file contains the column headers and then update your Cells.ExportDataTable() method as under,

dt = objWorksheet.Cells.ExportDataTable(0, 0, objWorksheet.Cells.MaxDataRow + 1, objWorksheet.Cells.MaxDataColumn + 1, true);

This will only fetch the Maximum Rows & Columns having data in them, so you will not face the ColumnName issue any more.

Please do let us know if you still face any problem or have any confusion.

Thank You & Best Regards,

Hi Mike,

I think, alternatively, you may define your own columns in a datatable to fill it with the worksheet data for your need, see the following codes.

e.g..,

Workbook objWorkbook = new Workbook();
objWorkbook.Open("f:\\test\\spreadml\\MMM_Clean.xml", FileFormatType.SpreadsheetML);
Worksheet objWorksheet = objWorkbook.Worksheets[0];
DataTable dt = new DataTable();
for (int i = 1; i <= objWorksheet.Cells.MaxColumn; i++)
{
dt.Columns.Add("column" + i, typeof(string));
}

objWorksheet.Cells.ExportDataTable(dt,0, 0, objWorksheet.Cells.MaxRow + 1,true);
Thank you.

Hello,

Yes, that worked, my column headers are in Row 4, so for testing I used:

dt = objWorksheet.Cells.ExportDataTable(3, 0, objWorksheet.Cells.MaxRow + 1, objWorksheet.Cells.MaxColumn + 1, true);

Thanks,

Mike =:-)

I thought I’d update this thread with some test results. My test SpreadsheetML XML file is 730MB. To my surprise (Kudos to Aspose), the following code did not fail with a “System.OutOfMemoryException”:

Workbook objWorkbook = new Workbook();

objWorkbook.Open(“BigFile.xml”, FileFormatType.SpreadsheetML); //* Open SpreadsheetML File *

Worksheet objWorksheet = objWorkbook.Worksheets[0]; //* Get First Worksheet *

I say to my surprise because with test code to load this big file using XPathDocument with a XmlReader did fail with this error. The following code failed with a “System.OutOfMemoryException”:

XPathDocument doc = new XPathDocument(objXMLReader); //* this was just a test and has nothing to do with the Aspose code *

So, Aspose must be doing something to prevent the out of memory error in their .Open() method. However, it’s not all good news. When I try the following code to export the Workbook data to a DataTable, I do get the “System.OutOfMemoryException” on the .Cells.ExportDataTable() method:

DataTable dt = new DataTable();

dt = objWorksheet.Cells.ExportDataTable(3, 0, objWorksheet.Cells.MaxRow + 1, objWorksheet.Cells.MaxColumn + 1, true);

//* the above line fails with System.OutOfMemoryException *

The whole reason I’m trying to get it to a DataTable is so I can use the:

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConnectionString))

{

}

to bulk load this data into SQL Server.

Any experts out there with advice or tips are certainly welcome to comment.

Thanks,

Mike =:-)