Generating Excel : "LightCellsData" VS "load CSV and save as excel"

koteswaragunda · December 20, 2024, 1:51pm

Hi,

I have two implementations for generating Excel files.

The first uses LightCellsData, which internally calls a csvReader to read the CSV file data and convert it into Excel. The second loads the CSV into a workbook and saves the workbook as Excel using workbook options.

Which is better? Let us say we have a huge dataset of 20 lakh (2 million) rows. Which approach gives better performance and lower memory consumption?

Professionalize.Discourse · December 20, 2024, 1:52pm

@koteswaragunda

When comparing the two implementations for generating Excel files from CSV data, using LightCellsData versus loading the CSV into a Workbook and saving it, there are several factors to consider, especially when dealing with large datasets like 2 million rows.

LightCellsData Approach: This method is designed for handling large datasets efficiently. It processes data in a streaming manner, which means it reads and writes data in smaller chunks rather than loading the entire dataset into memory at once. This can significantly reduce memory consumption and improve performance when working with very large files.
Loading CSV into Workbook: This approach involves loading the entire CSV file into a Workbook object. While this method is straightforward and allows for easy manipulation of the data within the workbook, it may not be as memory-efficient as the LightCellsData method, especially with large datasets. The entire CSV file is loaded into memory, which can lead to high memory usage and potential performance bottlenecks.

Performance Considerations

Memory Consumption: If memory usage is a critical concern, especially with large datasets, the LightCellsData approach is likely to be more efficient. It minimizes the memory footprint by processing data in a more controlled manner.
Speed: The LightCellsData method may also offer better performance in terms of speed, as it avoids the overhead of loading the entire dataset into memory before processing.

Conclusion

For handling large datasets, especially with 2 million rows, the LightCellsData approach is generally recommended due to its efficient memory usage and potential for better performance. If you are looking for a solution that can scale well with large data, this method would be the better choice.

amjad.sahi · December 20, 2024, 2:58pm

@koteswaragunda,

LightCells gives you better performance and efficiency when you read or write large datasets. I am not sure how you are reading the CSV data and then writing it to an Excel file with the LightCells APIs, so I can't predict whether LightCells will perform better in your case. The other approach is simple and straightforward, and it can become even more efficient if you load and write the file in memory preference mode.

I think you should try both approaches separately and check which one is more efficient and performs better. Then you can choose the best one for your requirements.
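One way to run such a comparison is a small timing harness. The sketch below is only an illustration: ConversionBenchmark, measure and usedHeap are hypothetical names, the two Runnable bodies are placeholders for the actual conversions, and the heap delta is a rough indicator rather than a true peak-memory figure (a profiler or a capped -Xmx run would be more reliable).

```java
class ConversionBenchmark {
    // Outcome of one run: wall-clock time and approximate heap growth.
    static final class Result {
        final long elapsedMillis;
        final long heapDeltaBytes;

        Result(long elapsedMillis, long heapDeltaBytes) {
            this.elapsedMillis = elapsedMillis;
            this.heapDeltaBytes = heapDeltaBytes;
        }
    }

    // Heap currently in use, as reported by the JVM.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the conversion once and records elapsed time and heap change.
    static Result measure(Runnable conversion) {
        System.gc(); // only a hint to the JVM; results stay approximate
        long heapBefore = usedHeap();
        long start = System.nanoTime();
        conversion.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(elapsedMillis, usedHeap() - heapBefore);
    }

    public static void main(String[] args) {
        // Placeholders: put the real export calls inside these bodies.
        Runnable lightCellsExport = () -> { /* LightCells-based export */ };
        Runnable loadAndSaveExport = () -> { /* load CSV into workbook, then save */ };

        Result a = measure(lightCellsExport);
        Result b = measure(loadAndSaveExport);
        System.out.println("LightCells:    " + a.elapsedMillis + " ms, heap delta " + a.heapDeltaBytes);
        System.out.println("Load and save: " + b.elapsedMillis + " ms, heap delta " + b.heapDeltaBytes);
    }
}
```

Running each approach in a separate JVM process, with the same input file, keeps one run from influencing the other.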

johnson.shi · December 23, 2024, 3:15am

@koteswaragunda

When loading a workbook from a template file, LightCells improves performance and reduces memory cost by importing cell data one by one without keeping the data in the cells model. Because you need to save the workbook to an Excel file, we are afraid you have to keep all the cell data in memory, so LightCells cannot help in this situation.

However, if you can parse the CSV yourself and use the parsed cell data as the source for an implementation of LightCellsDataProvider (that is, you use LightCells to generate the Excel file), then you should get much better performance than loading the CSV into a workbook completely and then re-saving it as an Excel file.
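The idea of feeding parsed CSV rows to a pull-style provider can be sketched roughly as follows. This is not the Aspose API: RowCellSource is a hypothetical stand-in that mirrors only the nextRow()/nextCell() pull pattern mentioned in this thread, CsvRowCellSource and StreamingDemo are illustrative names, and the line splitting is deliberately naive (no quoted-field handling). The point is that only the current row is held in memory at any time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

// Hypothetical stand-in for the pull-style part of a data provider.
interface RowCellSource {
    int nextRow();          // index of the next row, or -1 when no rows remain
    int nextCell();         // index of the next cell in the row, or -1 at row end
    String currentValue();  // value of the cell last returned by nextCell()
}

// Streams a CSV line by line; only the current row is kept in memory.
class CsvRowCellSource implements RowCellSource, AutoCloseable {
    private final BufferedReader reader;
    private final String delimiterRegex;
    private String[] values = new String[0];
    private int rowIndex = -1;
    private int colIndex = -1;

    CsvRowCellSource(BufferedReader reader, char delimiter) {
        this.reader = reader;
        this.delimiterRegex = Pattern.quote(String.valueOf(delimiter));
    }

    @Override
    public int nextRow() {
        try {
            String line = reader.readLine();
            if (line == null) {
                return -1; // end of input
            }
            values = line.split(delimiterRegex, -1); // naive: ignores quoting
            colIndex = -1;
            return ++rowIndex;
        } catch (IOException e) {
            throw new UncheckedIOException(e); // surface the error instead of hiding it
        }
    }

    @Override
    public int nextCell() {
        return (colIndex + 1 < values.length) ? ++colIndex : -1;
    }

    @Override
    public String currentValue() {
        return values[colIndex];
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}

class StreamingDemo {
    // Plays the role of the writer: pulls rows and cells until the source is exhausted.
    static int countCells(RowCellSource source) {
        int cells = 0;
        while (source.nextRow() != -1) {
            while (source.nextCell() != -1) {
                cells++;
            }
        }
        return cells;
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,name\n1,alpha\n2,beta\n";
        try (CsvRowCellSource source =
                 new CsvRowCellSource(new BufferedReader(new StringReader(csv)), ',')) {
            System.out.println(countCells(source)); // 3 rows x 2 cells
        }
    }
}
```

A real LightCellsDataProvider implementation has additional callbacks that are not modelled here, so this sketch only shows the streaming read side.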

Anyway, MemoryPreference mode should be the much easier way if it fits your requirements, so please try it first.

koteswaragunda · December 23, 2024, 4:40am

Thanks all for your recommendations.

Just FYI, below is what we are doing in the LightCellsDataHelper implementation.

In the LightCellsDataHelper constructor, we get the CSVReader instance, i.e. this.reader = OutputUtils.getCSVReader(fis, delimiter);
In the overridden nextRow() method of LightCellsDataHelper, we read each row using the csvReader object, i.e. if ((line = reader.readNext()) != null) {

So, did we follow the approach correctly? Any suggestions, please? FYI, below is sample code showing how it looks:

public LightCellsDataHelper(File file, Long maxRows, Long noOfSheets, ReportProcessHeader headerObject,
char delimiter, boolean showHeaderText, boolean showFooterText) {

	try {
		this.fis = new FileInputStream(file);
		this.reader = OutputUtils.getCSVReader(fis, delimiter);
		this.noOfSheets = noOfSheets;
		this.maxRows = maxRows;
		this.reportProcessHeader = headerObject;
		this.showHeaderText = showHeaderText;
		this.showFooterText = showFooterText;
		this.fileFormat = FilenameUtils.getExtension(file.getName()).equals("xls") ? FileFormatType.EXCEL_97_TO_2003
				: FileFormatType.XLSX;
		this.columnHeader = OutputUtils.getColumnHeaders(headerObject.getColumnProperties());
		this.noOfCols = this.columnHeader.length - 1;
		this.columnIndexes = OutputUtils.getColumnIndexesPlannedForDownload(headerObject.getColumnProperties(),
				this.columnHeader);
	} catch (Exception e) {
		e.printStackTrace();
	}

}

public int nextCell() {
	if (colIndex < noOfCols) {
		colIndex++;
		return colIndex;
	}
	return -1;
}

public int nextRow() {
	colIndex = -1;
	String[] line;

	++rowIndex;
	try {
		if (rowIndex == 0) {
			if (temp > 0)
				this.sheetLastLine = this.values;
			// Getting the header.
			values = columnHeader;// columnHeaderStyles.keySet().toArray();
			this.headerRow = true;
			return rowIndex;
		}
		if (showHeaderText) {
			if (rowIndex == 1) {
				// Getting the header.
				values = columnHeader;// columnHeaderStyles.keySet().toArray();
				return rowIndex;
			} else if (rowIndex == 2 && temp > 0) {
				this.values = this.sheetLastLine;
				this.headerRow = false;
				return rowIndex;
			}
		}
		if (rowIndex == 1 && temp > 0) {
			this.values = this.sheetLastLine;
			this.headerRow = false;
			return rowIndex;
		}
		if (showFooterText && rowIndex > (fileFormat == FileFormatType.XLSX ? 1048576 - 3 : 65536 - 3)) {
			return rowIndex;
		}
		if ((line = reader.readNext()) != null) {
			line = OutputUtils.getColumnValuesPlannedForDownload(line, this.columnIndexes);
			values = line;
			this.headerRow = false;
			temp++;
			return rowIndex;
		} else {
			try {
				if (fis != null)
					fis.close();
			} catch (IOException ex) {
				ex.printStackTrace();
			}
		}
	} catch (Exception e) {
		e.printStackTrace();
	}
	return -1;
}

}
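The row limits checked in nextRow() (1,048,576 rows for XLSX and 65,536 for XLS) also determine how many sheets 20 lakh rows need. The arithmetic can be sketched as below. SheetPlanner and sheetsNeeded are hypothetical names, and reserving one row per sheet for the header is an assumption made only for this example (the code above reserves a different number when footer text is shown).

```java
class SheetPlanner {
    static final long XLSX_MAX_ROWS = 1_048_576L;
    static final long XLS_MAX_ROWS = 65_536L;

    // Sheets required when each sheet sets aside some rows for non-data content.
    static long sheetsNeeded(long dataRows, long maxRowsPerSheet, long reservedRowsPerSheet) {
        long usable = maxRowsPerSheet - reservedRowsPerSheet;
        if (usable <= 0) {
            throw new IllegalArgumentException("no usable rows left per sheet");
        }
        return (dataRows + usable - 1) / usable; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 2_000_000L; // 20 lakh data rows
        // Assumption: one header row reserved on every sheet.
        System.out.println(sheetsNeeded(rows, XLSX_MAX_ROWS, 1)); // 2
        System.out.println(sheetsNeeded(rows, XLS_MAX_ROWS, 1));  // 31
    }
}
```

So an XLSX export of this size spans at least two sheets, while an XLS export would need 31.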

I will try your suggestions and update.

John.He · December 23, 2024, 6:07am

@koteswaragunda
Thank you for your feedback. Please take your time to try the suggested solutions. Hopefully, your issue will be sorted out. Please let us know your feedback.

koteswaragunda · December 23, 2024, 7:07am

Hi, in my earlier post I provided information on how we are reading the data from the CSV in LightCellsDataHelper. Can you please check it and suggest whether that approach is fine or any changes are needed?

johnson.shi · December 23, 2024, 10:09am

@koteswaragunda

It seems your LightCellsDataHelper is an implementation of LightCellsDataProvider, and it looks fine for saving the Excel file with LightCells. If you run into any issue with it during the process, please provide us with the complete code and sample data so we can check it further for you.