Aspose.Cells .Net is slow converting large Html to Excel

I am trying to convert a large Html file to Excel, the Html file has 1 million rows, but the conversion process seems to be slow.

Please refer to this sample Html file:
HtmlToExcel.7z (128.6 KB)

This is my codes in C#:

static void Main(string[] args)
{
	var watch = new System.Diagnostics.Stopwatch();
	watch.Start();
	Convert("<Html file path>");
	watch.Stop();
	Console.WriteLine(watch.Elapsed);
}

private static void Convert(string file)
{
	using (var workbook = new Workbook(file, new HtmlLoadOptions()))
	{
		workbook.Worksheets[0].AutoFitColumns();
		workbook.Save("<Excel file path>"", SaveFormat.Xlsx);
	}
}

I executed above codes 3 times, and each time it spent in average of 9 minutes to complete.
Please help to have a look if

  1. This is some performance issue in Aspose.Cells for .Net?
  2. There are some process that is slow during the conversion, for example conversion of css styling?
  3. Is there any better and faster way to do the conversion for large Html file to Excel?

I am using Aspose.Cells .Net version 21.9.0, .Net Framework 4.6.2
Machine: Windows 10, Intel Core i7, 16GB RAM.

@fonglun,

I checked your HTML file a bit and it is 850MB file in size. There are 1 million records in it. When I opened the file into MS Excel, MS Excel takes long time to open the file into it. When reading and then rendering to XLSX by Aspose.Cells, it will surely consume resources (RAM, CPU etc.) and will take some time to complete the process. Moreover, you have used AutoFitColumns() operation which itself is a time consuming operation when there are huge number of rows and columns.

Could you please comment out or remove the line of code:

workbook.Worksheets[0].AutoFitColumns();

and then give it a try your scenario again if you get better performance now?

Hi,

Thanks for quick response.

I’ve tried not to use the AutoFitColumns(), it is faster but not a lot, execute it few times and the most it is faster by a minute.

I know that the file is big and there are a million rows, but we do have this big data in our application.

@fonglun,

We are afraid, there is no better way for rendering so huge HTML file with big data to XLSX efficiently
But you may try using CSV file format (instead of HTML as input file) along with LightCells APIs in light weight mode to get better performance.

Moreover, we have logged an investigation ticket with an id “CELLSNET-50692” for your issue. We will look into it soon. Once we have an update on it. we will let you know.

@fonglun

We analyzed the top functions during your html file to xlsx. It is indeed beacause of tons of data.

If the data in your source html file is simple text, you can try to disable ConvertDateTimeData and ConvertNumericData in HtmlLoadOptions to get a better performance.

Code:

HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions();
htmlLoadOptions.ConvertDateTimeData = false;
htmlLoadOptions.ConvertNumericData = false;

using (var workbook = new Workbook(file, htmlLoadOptions))
{
    ...
}

I’m afraid we cannot use CSV as an input file instead of HTML, there are some styling needed in the generated Excel file, such as text-align, color and bold just to name a few.

@fonglun,

Thanks for your feedback on using CSV file format.

As we told you there is no reliable way to cope with such a long list of data. Anyways, since we already logged an investigation ticket for your issue, so, please spare us little time to evaluate your issue in details.

If we are able to devise some other better way around, we will update you.

@fonglun

Please check my post: