We're sorry Aspose doesn't work properply without JavaScript enabled.

Free Support Forum - aspose.com

Benchmark durations to extract content from Excel files in java using Aspose Cells

Hi Team, I would like to know if there are any benchmarks made for the time taken to extract the content of excel files of various sizes in Java using Aspose cells. I tried extracting a ~50MB excel file and it took approximately 13 seconds (the file contains 500K rows and 14 columns).

And it took approximately 3 seconds to extract a 2MB file (which contains 20K rows and 13 columns).

Is this inline with the processing capacity of the Aspose cells? Or would you recommend any improvisations in the code to extract the content.

Please let me know.

Thanks
Ravi

code snippet.7z (755 Bytes)
Attaching the code snippet for reference.

@teju90,

I checked and your code looks Ok.

Since you are extracting a huge data set from the sheet(s), so it will take little time for sure.
May be you could save each sheet in the workbook to separate CSV or text format, it might be little quicker.

@teju90,

If you only need to export cell content without other objects in the Workbook, you may try LightCells and LoadFilter to improve the performance a bit(those apis mainly help for memory performance, generally better memory performance may decrease the time cost a bit too). For using LightCells, you may take this page as reference. And here is an example for you which simulates the logic in your provided code:

LoadOptions opts1 = new LoadOptions();
opts1.getLoadFilter().setLoadDataFilterOptions(LoadDataFilterOptions.CELL_DATA);
opts1.setLightCellsDataHandler(new ExportContent());
Workbook wb = new Workbook("CellsJava41811.xlsm", opts1);

    private static class ExportContent implements LightCellsDataHandler
    {
        private static final int limit = 0x01<<25;
        private final StringBuilder sb = new StringBuilder(limit);
        private int count = 0;
        public boolean startSheet(Worksheet sheet)
        {
            System.out.println("Before processing sheet[" + sheet.getIndex() + "], content length = " + count);
            String s = sheet.getName();
            count += s.length() + 1;
            if(count > limit)
            {
                return false;
            }
            sb.append(s);
            sb.append(',');
            return true;
        }
        public boolean startRow(int rowIndex) {return count < limit;}
        public boolean processRow(Row row) {return count < limit;}
        public boolean startCell(int columnIndex) {return count < limit;}
        public boolean processCell(Cell cell)
        {

            String s = cell.getStringValue(CellValueFormatStrategy.CELL_STYLE);
            count += s.length() + 1;
            if(count > limit)
            {
                return false;
            }
            sb.append(s);
            sb.append(',');
            return false;
        }
    }
1 Like

Thanks for the response @johnson.shi. Is this the streaming way of loading the data? Would this load the images (and any other non-text content) in the excel as well while reading? I am primarily concentrated on the cell content itself, but I am just curious to know how the memory footprint would be reduced by using this API.

Appreciate your help.

Thanks
Ravi

@teju90,
LightCells and LoadFilter are used to improve the performance of processing large files. Through inheritance, you can control which worksheets need to import data, and even accurately control the import of rows and cells.

1 Like

Got it. Thanks @John.He

@teju90,
You are welcome. If you have any questions, please feel free to contact us.

@teju90,
With the example code we provided, only the cells data will be loaded, without any other objects such as images, shapes, …etc.

The difference of memory cost of this way and the normal way(load the workbook completely) varies from file to file.

Firstly, if there are lots of images in the template file, then the memory which should be used by data of those images will be saved.

Secondly, in the implementation of LightCellsDataHandler.processCell(Cell), we return false which means that the cell’s data will not be kept in cells mode, so no memory will be occupied after it is processed in this method. In such way if there are large amount of cells, you will get much better memory performance.

Anyways, we think you may test and compare two ways according to your requirement and specific template file to get the actual result about performance and then choose the proper one for your business.

1 Like

Thanks for more insights @johnson.shi

@teju90,
You are welcome.