Taking lot of time to read the file Continued(Uploaded the source and making it private)

Continuing the discussion from Taking lot of time when reading file and saving to txt as '|' seperated:

        StringBuilder strb =  new StringBuilder();
        com.aspose.cells.LoadOptions opt = new LoadOptions();
        opt.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
        long st = System.currentTimeMillis();
        System.out.println(" Strat to read");
        com.aspose.cells.Workbook book = new Workbook(
                "/user/sample.xlsx",opt);
        System.out.println(" done reading  and it took"+ (System.currentTimeMillis()-st)+" :::");
        long st1 = System.currentTimeMillis();
        book.getWorksheets().forEach(sheet->{
            File f = new File("/user/" + ((Worksheet) sheet).getName() + ".csv");
            OutputStream os;
            f.setWritable(true) ;
            try {
                os = (OutputStream) new FileOutputStream(f);
                String encoding = "UTF8";
                OutputStreamWriter osw = new OutputStreamWriter(os, encoding);
                BufferedWriter bw = new BufferedWriter(osw);
                com.aspose.cells.Cells  clls = ((Worksheet) sheet).getCells();
                for(int row = 1;row<=100000;row++){
                    long st2 = System.currentTimeMillis();
                    System.out.println("Row   :"+row);
                    for(int col = 0;col<=clls.getMaxDataColumn();col++){
                        strb.append((clls.getCell(row, col)).getValue());
                        strb.append("|");
                    }
                    bw.write(strb.toString());
                    bw.newLine();
                    System.out.println("took for Row1  :" +(System.currentTimeMillis()-st2));
                }
                bw.flush();
                bw.close();
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            }


        });
        long et = System.currentTimeMillis();
        System.out.println("Time Taken ="+(et-st));

I have uploaded the sample source file here.
I have reduced the lines to 250k instead of 500k. Book1.zip (7.5 MB)

I took 3 minutes to read the workbook.
and around 4 minutes to read each line then to process and write to it in a file

@aymarzapak,

Since you need to read huge list of data from the sheet, so we recommend you to try using Light Cells APIs instead which is written for the purpose. In light weight (even driven) mode (comparing with normal mode), data would be read in much less time for sure. See the document for your reference and write your code accordingly. This will improve the performance to certain extent and data would be read quickly and efficiently.

Let us know if you still find any performance issue.

1 Like

@aymarzapak,

By the way, why don’t you directly try to convert your Excel file to text (delimited by “|”) file via Aspose.Cells APIs. I simply tried the following sample code and it works and generated the output file (with all the data from the source Excel file) in around 30 seconds only:
e.g
Sample code:

com.aspose.cells.LoadOptions opt = new LoadOptions();
        opt.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
        long st = System.currentTimeMillis();
        System.out.println(" Strat to read");
        Workbook book = new Workbook("f:\\files\\Book1.xlsx", opt);
        TxtSaveOptions opts = new TxtSaveOptions();
        opts.setSeparator('|');
        book.save("f:\\files\\out1.csv", opts);
        long et = System.currentTimeMillis();
        System.out.println("Time Taken ="+(et-st)); 

PS. Please note, seeing your code segment in the first post, most of the time is taken for reading each line of data and saving to string builder.

1 Like

The thing why I wanted to write it row by row is if in case I have to modifiy any column value then i would need to process it row by row isnt it?
Also I need a help on how to write each sheet as new file, here above, we are writing whole workbook as single file, but in fact it should be each sheet should be a single file with file name as sheet name without the headers of the sheet.

@aymarzapak,

Please note, CSV or text file format are already single sheet based (this is the capacity of the format), so it is rendered based on current active sheet only (you cannot have a csv file with multiple tabs/sheets). To generate different CSV or text files based on each sheet in the workbook, you may hide all sheets except one (in some loop) for the task. Alternatively, you may set active each sheet one by one (in some loop) to be rendered to CSV file.
e.g.
Sample code:

for (int i=0; i<book.getWorksheets().getCount(); i++)
{
book.getWorksheets().setActiveSheetIndex(i);
//......
//your code goes here.
//........
}

Hope, this helps a bit.

Thanks. This would help me a lot.
Any help on avoiding the headers would be appreciated. I couldn’t find any function or Boolean to flag it false. Else it would require me to look into rows and avoid it.

Which headers you are taking about? Please elaborate with a screenshot, we will help accordingly.

headerSC.PNG (4.5 KB)

Added Screenshot of header which I am trying to avoid

@aymarzapak,

You may simply remove the first row using the following line of code and before rendering to CSV:

book.getWorksheets().get(0).getCells().deleteRow(0);

Hope, this helps a bit.

Works like a charm. thank you so much. :heart: aspose

@aymarzapak,

You are welcome.