Performance Issues in exporting html records to excel


#1

Hi,

I’m using Aspose.cells for .NET and while exporting html records to excel, I’m facing performance issues. The export takes considerable amount of time when lacs of records are present. Can you suggest as to how can we improve the performance specifically for html columns from the dataset?

Here is a sample code:

    var workbook = new Aspose.Cells.Workbook(@"C:\temp\Book1.xlsx");            
        Worksheet worksheet = workbook.Worksheets["SrchRslt"];            
        
        for(int i=0;i<100000;i++)
        {                
            Cell cell = worksheet.Cells["C"+(i+2).ToString()];
            cell.HtmlString = "<Font Style=\"FONT-WEIGHT: bold;FONT-STYLE: italic;TEXT-DECORATION: underline;FONT-FAMILY: Arial;FONT-SIZE: 11pt;COLOR: #ff0000;\">This is simple HTML formatted text.</Font>";                
        }            
        workbook.Save(@"C:\temp\Book1.xlsx");

#2

@pratom01,

Thanks for the sample code segment and details.

Well, the process may take time if you are inserting large list of HTML strings into cells in manual way. Also, most time will be spent for the parsing process of the given html string. I think you may consider using LightCells APIs for imputing/writing your HTML strings, it may minimize the time cost and enhance the performance. See the document on using LightCells APIs for your reference:
https://docs.aspose.com/display/cellsnet/Using+LightCells+API


#3

Thanks for the suggestion. Although, I tried the API and unfortunately, couldn’t see any significant improvement in the performance. The time taken was almost the same as in the previous case when I wasn’t using LightCells API.

Is there anything else I could do to have a better performance with the columns containing HTML content in the db?

Here is the sample code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using Aspose.Cells;
using System.Data;

namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{

        Console.WriteLine(DateTime.Now.ToString());
        var workbook = new Aspose.Cells.Workbook(@"C:\temp\Book1.xlsx");            
        Worksheet worksheet = workbook.Worksheets["SrchRslt"];          

        int rowsCount = 100000;
        int colsCount = 1;
        
        var ooxmlSaveOptions = new OoxmlSaveOptions();

        ooxmlSaveOptions.LightCellsDataProvider = new TestDataProvider(workbook, rowsCount, colsCount);

        workbook.Save(@"C:\temp\Book1.xlsx", ooxmlSaveOptions);
        Console.WriteLine(DateTime.Now.ToString());
        Console.ReadLine();
    }
}
class TestDataProvider : LightCellsDataProvider
{
    private int _row = -1;
    private int _column = -1;

    private int maxRows;
    private int maxColumns;

    private Workbook _workbook;
    public TestDataProvider(Workbook workbook, int maxRows, int maxColumns)
    {
        this._workbook = workbook;
        this.maxRows = maxRows;
        this.maxColumns = maxColumns;
    }

    #region LightCellsDataProvider Members
    public bool IsGatherString()
    {
        return false;
    }

    public int NextCell()
    {
        ++_column;
        if (_column < this.maxColumns)
            return _column;
        else
        {
            _column = -1;
            return -1;
        }
    }
    public int NextRow()
    {
        ++_row;
        if (_row < this.maxRows)
        {
            _column = -1;
            return _row;
        }
        else
            return -1;
    }

    public void StartCell(Cell cell)
    {
        cell.PutValue(_row + _column);
        if (_row == 1)
        {
        }
        else
        {
            //cell.Formula = "=Rand() + A2";
            cell.HtmlString = "<Font Style=\"FONT-WEIGHT: bold;FONT-STYLE: italic;TEXT-DECORATION: underline;FONT-FAMILY: Arial;FONT-SIZE: 11pt;COLOR: #ff0000;\">This is simple HTML formatted text.</Font>"; 
        }
    }

    public void StartRow(Row row)
    {
    }

    public bool StartSheet(int sheetIndex)
    {
        if (sheetIndex == 0)
        {
            return true;
        }
        else
            return false;
    }
    #endregion
}

}


#4

@pratom01,

Thanks for the sample code.

Seeing your sample code segment, it seems you need to set same rich text (html string) for those cells, so, we think it might be better for performance to set the html string to one cell and then copy it to other cells. This will enhance performance for sure. You should do this and do not necessarily need to use light weight mode.


#5

The html strings may not necessarily be the same always as we’re getting the values from the database. Due to privacy issues, we won’t be able to share the exact code on the forum and that is why I created a sample console application to explain the issue in detail.

Hence, there are columns in our database that contain rich text values in them and we’re trying to export the data to excel which is causing performance issues in our web application.


#6

@pratom01,

Thanks for providing further details.

Well, I am afraid, there is no better way to minimize the time or cope with it as time will surely be spent when parsing html string for each cell (as there would be long list of cells for the process). By the way, I tried your simplest code using v18.11.x and it takes less than 2.64 seconds to parse html strings for each cell in the huge range, i.e., C2: C100001. I guess the performance is not bad at all.
e.g
Sample code:

var workbook = new Aspose.Cells.Workbook();
        Worksheet worksheet = workbook.Worksheets["Sheet1"];            
        
        Stopwatch dt = new Stopwatch();
        dt.Start();
        for(int i=0;i<100000;i++)
        {                
            Cell cell = worksheet.Cells["C"+(i+2).ToString()];
            cell.HtmlString = "<Font Style=\"FONT-WEIGHT: bold;FONT-STYLE: italic;TEXT-DECORATION: underline;FONT-FAMILY: Arial;FONT-SIZE: 11pt;COLOR: #ff0000;\">This is simple HTML formatted text.</Font>";                
        }
        workbook.Save(@"e:\test2\out1.xlsx");
        dt.Stop();
        Console.WriteLine(dt.Elapsed.TotalSeconds);

If you still think it is not good performance, you may consider using Smart Markers feature for html string parsing. As your html strings comes from some data source/ database so you may import those html strings into your template file (by adding those markers) by processing the markers, see the document for your complete reference:
https://docs.aspose.com/display/cellsnet/Using+Smart+Markers
(especially check the example: https://docs.aspose.com/display/cellsnet/Using+Smart+Markers#UsingSmartMarkers-UsingHTMLpropertyofSmartMarkers). Please try it and if it makes any difference.

Hope, this helps a bit.


#7

Thanks for sharing another approach. Unfortunately, this wasn’t of much help to us also. But please let us know if there are any other improvements that may happen in near future with Aspose.Cells so that we can incorporate them in our application in order to improve the performance.


#8

@pratom01,

I am afraid there is no better way to improve the performance for such a scenario. We do not think if we can improve the performance for parsing html string much in near future. How about building an html file by the html strings for cells and then using Aspose.Cells component to load it as workbook? If this is a bit feasible to you, please try it and it might enhance the performance a bit.

Moreover, as you have some fixed format for the input values (html string), we think the best way is you implements your own logic for parsing the htmls.