Slow XLSX output

Hi!


I am trying to output large quantities of data in a single worksheet via Aspose.Cells. The data is mostly strings and numbers without any formatting (yet). I happen to come across two troubling facts:
  1. When outputting the values to XLS file format, the file generation is very fast - around 10sec. However, when the same amount of data is stored in XLSX format, the same amount of data is saved in around 2 minutes. Why is there such discrepancy?
  2. I have also compared two versions of Aspose.Cells: 4.8.2.0 with 7.2.2.0, but with greater amount of data (more than 65536 rows). The older version is outputting XLSX almost two times faster than the new one.
Tests were done on Windows 7 with search indexing feature turned off.

Because of the data limit of XLS, XLSX is preferable, but the time to save such file is too long.

Could you shed some light on what could be the reason for these differences and what could be done to speed things up?

Regards

Tomasz Męciński

Hi,


Thank you for using Aspose.Cells.

65536 rows of data is too much a data to store, but since you have compared the old version with the latest released version of Aspose.Cells, we would request you to provide us your sample source excel file so that we can look into it and investigate it further. This would be interesting for us as well to know about it, specially when the Aspose.Cells 4.8.2.0 is saving the data 2 times faster than the new version.

Hi,


If you simply writing large data in an Excel file (e.g XLSX), the process may demand certain amount of memory and it may take some time to complete the process.

I think you may try to use LightCells APIs if it fits your requirements. Please see the topic on using LightCells APIs although the topic is added for Aspose.Cells for Java but you may use the relevant .NET APIs of Aspose.Cells for .NET:
http://www.aspose.com/docs/display/cellsjava/Using+LightCells+API



You may simply download the test project for LightCells APIs. I have also attached the code here.

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Aspose.Cells;

namespace LightOoxmlTest
{
class Program
{
static void Main(string[] args)
{
var workbook = new Workbook();
var ooxmlSaveOptions = new OoxmlSaveOptions();
ooxmlSaveOptions.LightCellsDataProvider = new TestDataProvider(workbook);

workbook.Save("output.xlsx", ooxmlSaveOptions);
}
}

class TestDataProvider : LightCellsDataProvider
{
private int _row = -1;
private int _column = -1;

private Workbook _workbook;
public TestDataProvider(Workbook workbook)
{
_workbook = workbook;
}

#region LightCellsDataProvider Members

public bool IsGatherString()
{
return false;
}

//Just generating a dummy 1000x30 matrix.
public int NextCell()
{
++_column;
if (_column < 30)
return _column;
else
{
_column = -1;
return -1;
}
}
public int NextRow()
{
++_row;
if (_row < 1000)
{
_column = -1;
return _row;
}
else
return -1;
}

public void StartCell(Cell cell)
{
cell.PutValue(_row + _column);
if (_row == 1)
{
}
else
{
cell.Formula = "=Rand() + A2";
}
}

public void StartRow(Row row)
{
}

public bool StartSheet(int sheetIndex)
{
if (sheetIndex == 0)
{
return true;
}
else
return false;
}

#endregion
}
}


Thank you.

Did you say that Aspose.Cells does not officially support data with more than 65536 rows?

While I know XLS file format cannot store more than 65536 rows, XLSX can.

I will try to use the lightcells approach. I didn’t know this functionality is working for .NET. Thanks!

I am attaching short test code which is working at my machine around 6 and 10 minutes for the versions of library mentioned in previous post (and around 5GB of memory - the program needs to be compiled for x64 platform).

using System;
using System.Collections.Generic;
using System.IO;
using Aspose.Cells;


namespace AsposeCellTests
{
class Program
{
const int Columns = 300;
const int Rows = 150000;

static void Main(string[] args)
{
var wb = new Workbook();

var ws = wb.Worksheets[0];
var random = new Random();

for (int i = 0; i < Rows; i++)
{
for (int j = 0; j < Columns; j++)
{
if (j < 50) ws.Cells[i, j].PutValue(GetRandomString(random));
else if (j < 60) ws.Cells[i, j].PutValue((decimal)random.NextDouble());
else ws.Cells[i, j].PutValue(random.Next());
}
}

wb.Save(@“c:\file.xlsx”, FileFormatType.Excel2007Xlsx);
}


public static List Cache = new List();
public static string GetRandomString(Random random)
{
//if(Cache.Count<10000)
//{
var ret = RandomString();
Cache.Add(ret);
return ret;
//}
//return Cache[random.Next(0, 9999)];
}

private static string RandomString()
{
string path = Path.GetRandomFileName();
path = path.Replace(“.”, “”); // Remove period.
return path;
}
}
}

Hi,


I didn’t mean that Aspose.Cells doesn’t support such a large amount of data. Infact, as mentioned by Amjad, huge amount of data may consume more memory and, hence, demanding more time. Please try the LightCells API as suggested by Amjad, and let us know your feedback.

The first tries of LightCells API are very promising. Thanks!


One question: Is it possible to mix “LightCells worksheets” with “normal worksheets”? I would like to create a chart beside this huge amount of data: would this be possible?

Hi,

If you have created the chart before calling Workbook.save() and out of implementation of LightCellsDataProvider, the chart can be saved too for the generated file saved with LightCells mode.

For example, you can create the chart at “sheet1” firstly, and then fill data for Sheet2 by LightCells APIs.

But please make sure do not create charts or any other objects except cell data in the implementation of LightCellsDataProvider, otherwise unexpected result may be created.

Thank you for all your help. Everything seems to work.


One thing is missing at the moment: Can I change style to individual cells/sets of cells while using LightCellsDataProvider?
So far the program keeps throwing “Object reference not set to an instance of an object.” when using cell.SetStyle(…).

Thanks!
Hi,

Thanks for your question and considering Aspose.Cells.

Yes, you can set style to a Cell or Row object in the implementation of LightCellsDataProvider.StartRow()/StartCell(). But please do not modify any other cells/rows than the cell/row that was being processed currently.