Exporting 1 million records to Excel

Hi,


I have updated my previous sample example codes (VB.NET) and enhanced it to accomplish your task. Now, it will fill the huge list of data (1000000 * 3 matrix) into multiple worksheets (3 sheets), each sheet will have this long list of data. By the way, when you add multiple sheets with that kind of huge list of data, the output file would be huge sized and when you will open the output file into MS Excel, it will surely take some time and resources to open the file into it.

Sample code:
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Aspose.Cells
Imports System.Data


Module Module1


Sub Main()
'Specify your desired matrix
Dim rowsCount As Integer = 1000000
Dim colsCount As Integer = 3
Dim sheetsCount As Integer = 3

Dim workbook = New Workbook()

For k As Integer = 0 To sheetsCount - 1
Dim sheet As Worksheet = Nothing
If k = 0 Then
sheet = workbook.Worksheets(k)
sheet.Name = “test”
Else
Dim sheetIndex As Integer = workbook.Worksheets.Add()
sheet = workbook.Worksheets(sheetIndex)
sheet.Name = “test” & sheetIndex
End If

Next k


Dim ooxmlSaveOptions = New OoxmlSaveOptions()
ooxmlSaveOptions.LightCellsDataProvider = New TestDataProvider(workbook, sheetsCount, rowsCount, colsCount)

workbook.Save(“e:\test2\myoutput_testnew2.xlsx”, ooxmlSaveOptions)
End Sub
End Module


Public Class TestDataProvider
Implements LightCellsDataProvider
Private dt As DataTable
Public Sub FillTable()
dt = New DataTable()
dt.Columns.Add(“Column1”, GetType(Integer))
dt.Columns.Add(“Column2”, GetType(String))
dt.Columns.Add(“Column3”, GetType(Integer))
For i As Integer = 0 To maxRows - 1
dt.Rows.Add(i \ 10 + 1, String.Format(“Item {0}”, i), i)
Next i

End Sub

Private _row As Integer = -1
Private _column As Integer = -1

Private sheeCount As Integer
Private maxRows As Integer
Private maxColumns As Integer

Private _workbook As Workbook
Public Sub New(ByVal workbook As Workbook, ByVal sheetCount As Integer, ByVal maxRows As Integer, ByVal maxColumns As Integer)
Me._workbook = workbook
Me.maxRows = maxRows
Me.maxColumns = maxColumns

FillTable()
Me.sheeCount = sheetCount

End Sub

Public Function IsGatherString() As Boolean Implements LightCellsDataProvider.IsGatherString

Return False
End Function

Public Function NextCell() As Integer Implements LightCellsDataProvider.NextCell

_column += 1
If _column < Me.maxColumns Then
Return _column
Else
_column = -1
Return -1
End If
End Function
Public Function NextRow() As Integer Implements LightCellsDataProvider.NextRow

_row += 1
If _row < Me.maxRows Then
_column = -1
Return _row
Else
Return -1
End If
End Function

Public Sub StartCell(ByVal cell As Cell) Implements LightCellsDataProvider.StartCell

cell.PutValue(dt.Rows(_row)(_column)) '_row + _column);

End Sub

Public Sub StartRow(ByVal row As Row) Implements LightCellsDataProvider.StartRow


End Sub

Public Function StartSheet(ByVal sheetIndex As Integer) As Boolean Implements LightCellsDataProvider.StartSheet


If sheetIndex < sheeCount Then

'reset row/column index
_row = -1
_column = -1
Return True
End If

Return False

End Function

End Class



Thank you.

I agree, I have environment to open the excel once it is generated. What if i want to pass datatable for each worksheet? Currently my requirement is i have to use different datatable for each worksheet.

Hi,


"What if i want to pass datatable for each worksheet? Currently my requirement is i have to use different datatable for each worksheet."

Well, it is easy to do it. I have updated my previous code and enhanced it to accomplish your task. I have created three data tables using three methods in the class that implements LightCelllsDataProvider interface and filled data into them. Now I used StartSheet() method to call different data table for different sheet accordingly. Please refer to it and create your own sample codes accordingly.

Note: To save my time, I decrease the matrix to 100000 * 3 records per each sheet.

Sample code:

Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Aspose.Cells
Imports System.Data


Module Module1


Sub Main()
'Specify your desired matrix
Dim rowsCount As Integer = 100000
Dim colsCount As Integer = 3
Dim sheetsCount As Integer = 3


Dim workbook = New Workbook()


For k As Integer = 0 To sheetsCount - 1
Dim sheet As Worksheet = Nothing
If k = 0 Then
sheet = workbook.Worksheets(k)
sheet.Name = “test”
Else
Dim sheetIndex As Integer = workbook.Worksheets.Add()
sheet = workbook.Worksheets(sheetIndex)
sheet.Name = “test” & sheetIndex
End If

Next k


Dim ooxmlSaveOptions = New OoxmlSaveOptions()
ooxmlSaveOptions.LightCellsDataProvider = New TestDataProvider(workbook, sheetsCount, rowsCount, colsCount)

workbook.Save(“e:\test2\myoutput_testnew_2.xlsx”, ooxmlSaveOptions)
End Sub
End Module


Public Class TestDataProvider
Implements LightCellsDataProvider
Private dt As DataTable
Public Sub FillTable1()
dt = New DataTable()
dt.Columns.Clear()
dt.Columns.Add(“Column1”, GetType(Integer))
dt.Columns.Add(“Column2”, GetType(String))
dt.Columns.Add(“Column3”, GetType(Integer))
For i As Integer = 0 To maxRows - 1
dt.Rows.Add(i \ 10 + 1, String.Format(“Item {0}”, i), i)
Next i

End Sub
Public Sub FillTable2()
dt = New DataTable()
dt.Columns.Clear()
dt.Columns.Add(ColumnA”, GetType(Integer))
dt.Columns.Add(ColumnB”, GetType(String))
dt.Columns.Add(“ColumnC”, GetType(Integer))
For i As Integer = 0 To maxRows - 1
dt.Rows.Add(i \ 10 + 2, String.Format(“Column {0}”, i), i)
Next i

End Sub
Public Sub FillTable3()
dt = New DataTable()
dt.Columns.Clear()
dt.Columns.Add(“A”, GetType(Integer))
dt.Columns.Add(“B”, GetType(String))
dt.Columns.Add(“C”, GetType(Integer))
For i As Integer = 0 To maxRows - 1
dt.Rows.Add(i \ 10 + 3, String.Format(“My Item {0}”, i), i)
Next i

End Sub
Private _row As Integer = -1
Private _column As Integer = -1

Private sheeCount As Integer
Private maxRows As Integer
Private maxColumns As Integer

Private _workbook As Workbook
Public Sub New(ByVal workbook As Workbook, ByVal sheetCount As Integer, ByVal maxRows As Integer, ByVal maxColumns As Integer)
Me._workbook = workbook
Me.maxRows = maxRows
Me.maxColumns = maxColumns

Me.sheeCount = sheetCount

End Sub

Public Function IsGatherString() As Boolean Implements LightCellsDataProvider.IsGatherString

Return False
End Function

Public Function NextCell() As Integer Implements LightCellsDataProvider.NextCell

_column += 1
If _column < Me.maxColumns Then
Return _column
Else
_column = -1
Return -1
End If
End Function
Public Function NextRow() As Integer Implements LightCellsDataProvider.NextRow

_row += 1
If _row < Me.maxRows Then
_column = -1
Return _row
Else
Return -1
End If
End Function

Public Sub StartCell(ByVal cell As Cell) Implements LightCellsDataProvider.StartCell

cell.PutValue(dt.Rows(_row)(_column)) '_row + _column);

End Sub

Public Sub StartRow(ByVal row As Row) Implements LightCellsDataProvider.StartRow


End Sub

Public Function StartSheet(ByVal sheetIndex As Integer) As Boolean Implements LightCellsDataProvider.StartSheet

If sheetIndex = 0 Then
FillTable1()
'reset row/column index
_row = -1
_column = -1
Return True
ElseIf sheetIndex = 1 Then
FillTable2()
'reset row/column index
_row = -1
_column = -1
Return True
ElseIf sheetIndex = 2 Then
FillTable3()
'reset row/column index
_row = -1
_column = -1
Return True

End If







'If sheetIndex < sheeCount Then
'…

'End If

Return False

End Function

End Class


Thank you.

Hi,

Code that you have sent me will not work grt for me. Because i dont have maxrows and maxcolumns constant. They depend on the dt value. As i mentioned before each worksheet gets data from different dt and each dt has different values in it.

Please find the sample project attached.

Please respond as soon as possible.

Thanks and Regards,

Niveditha

Hi Amjad,

Let me clearly explain you my scenario once again.

1. Fetch data from db and fil the data into dataset - This is going to be big data (in millions)

2. Take a template, and add a workbook to it.

3. Fill workbook0 with simple data that is not related to dataset

4. Now process dataset with some formulae in vb.net code and the resultset will be ready to be dumped into excel.

5. Query dataset using somecondition and you will get a result set in dt:
Example : dt = ds.Table(0).copy() This will go to worksheet1
Now modify worksheet1 to add sum of the data on top of the rows
6. Query the dt to dump data into worksheet 2
Example : dtwk2 = dt.select ("colA='1'","ColB DESC") This will go to worksheet 2

Now modify worksheet1 to add sum of the data on top of the rows
7

8

So on

This is how my project is. So please let me know if this can be accomplished with your code.

Regards,

Niveditha

Hi,


We are working over your issue/query, we will get back to you soon (hopefully tomorrow).

Thanks,

Hi,

We have analysed your requirements and other aspects after looking at your mentioned points. I have discussed your matter with the developers in detail.

Well, if we look at your requirements mentioned in your 8 points. I am afraid, this cannot be accomplished using LightCells mode when you change your data/value or formulas again and again.

I am afraid, it is impossible to implement your requirements in LightCells mode. With LightCells mode, cells will be saved to the final Excel file one by one (row by row and in one row cell by cell sequentially). When one cell has been processed in StartCell(), then it has already been saved to the final file and you cannot edit it any more.

In short, if you need to change some previous cells after filling data into worksheet cells, such as you have filled data into first 10 rows and then, you modify cells in these rows, I am afraid, you cannot use LightCells API here.

‎In short, we recommend you should use normal Aspose.Cells APIs e.g. ImportDataTable() method of Aspose.Cells for .NET etc. instead of LightCells APIs to import your data into your desired worksheets. Here in normal mode, you are free to update any value or formula once imported data to your cells. If there are not much columns list in the data tables/tables, then you may use ImportDataTable() that can fetch data fine and is an efficient method too.

Thanks for your understanding!

Ok. Apart from point that says "Modify worksheet", can we do all the points from 1 to 7 using lates dll?

Hi,


As I told you in my previous reply, you cannot include or change/update any data or formula once dumped or inserted data in LightCells mode.
If you use normal mode (using Aspose.Cells APIs e.g ImportDataTable() etc.), you may accomplish all of your requirements including importing data, updating data or formulas in the sheets again and again.

Thank you.