importCSV() throws outofmemoryerror for large CSV files

Hi,


We are using aspose to merge CSV file to XLS file in java code.
When the size of the CSV is large it is throwing OutOfMemoryError in aspose code.

This is the code snippet I tried.

try
{
Workbook wb=new Workbook(“rams3base.xlsx”);
Cells cells =wb.getWorksheets().get(“sheet1”).getCells();
final TxtLoadOptions options = new TxtLoadOptions();

cells.importCSV(“3327d2fc0a07009a3cf130a6f063ab28_Sheet1.csv”, options, 0, 1);
wb.save(“rams3base.xlsx”);
}
catch(Throwable e)
{
e.printStackTrace();
}




Not able to attach the CSV file that is causing the issue.
But the size of the CSV is more than 600 MB.
It is throwing the following exception from aspose.

java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.(Unknown Source)
at java.lang.StringBuilder.toString(Unknown Source)
at com.aspose.cells.b.a.d.n.f(Unknown Source)
at com.aspose.cells.bec.a(Unknown Source)
at com.aspose.cells.bec.b(Unknown Source)
at com.aspose.cells.bec.a(Unknown Source)
at com.aspose.cells.bec.a(Unknown Source)
at com.aspose.cells.Cells.importCSV(Unknown Source)
at com.rams.Test.main(Test.java:170)

Please suggest if there is any other way of merging large CSV s into Excel with out consuming lot of memory.

This is very critical for us.
Please suggest how to use importCSV with large CSV files.


Hi Chandra,


Thank you for contacting Aspose support.

Please try the scenario by enabling the Memory Preferences option provided by Aspose.Cells APIs. Moreover, please also give a try to the latest release of the component, that is; Aspose.Cells for Java 8.7.2.5. In case the problem persists, please share the CSV file by uploading it to some free file hosting service or your own server, as we have to perform tests in order to investigate the matter thoroughly. We will also require your environment details as follow.

  • Operating system version & architecture
  • JDK vendor, version & architecture
  • JVM arguments, if any
  • Input spreadsheet, referenced as rams3base.xlsx in your code

Thanks for the quick response.

I tried the code below as you mentioned with latest jar aspose-cells-8.4.1.jar and still getting out of memory error with the same stack trace mentioned in the beginning of this thread.
Below is the code I tried.



try
{
Workbook wb=new Workbook("rams20base.xlsx");
wb.getSettings().setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
Cells cells =wb.getWorksheets().get("sheet1").getCells();
final TxtLoadOptions options = new TxtLoadOptions();
options.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);

cells.importCSV("3327d2fc0a07009a3cf130a6f063ab28_Sheet1.csv", options, 0, 1);
wb.save("rams3base.xlsx");
}
catch(Throwable e)
{
e.printStackTrace();
}

I have shared the link for the CSV file I am trying here below.
You can try importing this CSV to any new XLSX document. You should see the issue.


https://drive.google.com/open?id=0B0EtDSlLaxD3QjY0akVBX29mU2M


Hi Chandra,


Thank you for sharing the sample CSV.

Please note, the CSV contains 889560 rows therefore the process requires a lot of memory to load such a huge amount of data. I am able to successfully import your provided CSV using the following piece of code, however, I had to change the max heap size to 2GB (-Xms512m -Xmx2g) in order to process the file without observing OutOfMemoryError.

Note: I have tested the scenario while using your current version (8.4.1) and the latest (8.8.0) on Windows 10 x64 against JDK 1.8.0_66.

Java

Workbook wb = new Workbook(dir + “book1.xlsx”);
Cells cells = wb.getWorksheets().get(“sheet1”).getCells();
cells.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
final TxtLoadOptions options = new TxtLoadOptions();
options.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
cells.importCSV(dir + “3327d2fc0a07009a3cf130a6f063ab28_Sheet1.csv”, options, 0, 1);
wb.save(dir + “output.xlsx”);

Hi,


It is working fine with 2GB memory.

But in our application , we are using this importCSV() API in a program which will be run on client side. We are using JNLP and applet for achieving this on client machine. We tried using the below memory setting in JNLP file to force 2GB of memory allocation for JVM in client machine.

Steelwedge E3 - Enterprise Enabled Excel
Steelwedge Software Inc.
<j2se version=“1.8+” max-heap-size=2048m" />

But this is causing issues in client machine . The machine are hanging once 2GB of memory allocated for this because most of them has 4GB laptops where they will run our application.

We may not be able to use this as a solution for our problem.
It may not a good idea to force clients to have some hardware requirement to run the web application.

Can you please suggest if there is any other solution to use importCSV to merge 600MB CSV with XL file. Any other way is there in aspose to copy the data from CSV to Excel file without loading entire CSV into memory.

Is there any way to merge this in batches?
Is it possible to get an API for merging the CSV to sheet1 of Excel in batches ?
Is there a way to stream data from CSV and append it to existing workheet? This way I can iterate using other API and append the data using aspose.

I request you to please prioritize this because this is now very critical in our application.

Hi Chandra,


Thank you for writing back. It is good to know that you are able to resolve the problem by increasing the Java heap size to 2GB. Regarding the concerns about process being hung, it is common in Java applications that when there is not enough memory, the code execution will become very slow. The solution is to add more memory to the Java heap size, just more than required by the process. As your client’s machines have just 4GB of RAM so you can try the scenario by setting the -Xmx to 3000m.

Regarding your concerns about splitting the CSV files, I am afraid, it will not help if you wish to import all data to one Workbook object. I have already tried this scenario by splitting the CSV to several smaller chunks and imported the data chucks on the cells while keeping the Java heap size to 1GB. In this case I have noticed OutOfMemoryError. However, if your application requirement could accommodate to have several Workbooks to hold the complete CSV data then you do not need much memory because each chunk of the CSV data will be processed by a separate Workbook object and will be disposed as soon as the data has been imported and spreadsheet has been saved to file. Please note, it is convenient to have smaller resultant spreadsheets because Excel application requires more time and memory to load larger spreadsheets. If you import all 889560 rows to one spreadsheet, the resultant file will take some time to open in Excel because its size would be around 65MB. However, if you split the CSV to 100 MB chunks, the resultant spreadsheets would be of 10MB in size and therefore loaded efficiently.

Here is the code to split the CSV to chunks of 100MB and import data to separate Workbook instances.

Java

FileReader fileReader = new FileReader(dir + “3327d2fc0a07009a3cf130a6f063ab28_Sheet1.csv”);
BufferedReader bufferedReader = new BufferedReader(fileReader);
String line="";
int fileSize = 0;
BufferedWriter fos = new BufferedWriter(new FileWriter(dir + “chunks/” + new Date().getTime() + “.csv”, true));
while((line = bufferedReader.readLine()) != null) {
if(fileSize + line.getBytes().length > 100 * 1024 * 1024){
fos.flush();
fos.close();
fos = new BufferedWriter(new FileWriter(dir + “chunks/” +new Date().getTime()+".csv", true));
fos.write(line+"\n");
fileSize = line.getBytes().length;
}else{
fos.write(line+"\n");
fileSize += line.getBytes().length;
}
}
fos.flush();
fos.close();
bufferedReader.close();

File directory = new File(dir + “chunks/”);
File[] fList = directory.listFiles();
for (File file : fList)
{
Workbook wb = new Workbook(FileFormatType.XLSX);
Cells cells = wb.getWorksheets().get(“sheet1”).getCells();
cells.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
final TxtLoadOptions options = new TxtLoadOptions();
options.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
System.out.println(file.getPath());
cells.importCSV(file.getPath(), options, 0, 1);
wb.save(file.getPath() + “.xlsx”);
wb.dispose();
}