CSV to PDF conversion

I am evaluating Aspose for converting CSV files to PDF (large CSV files with over a million rows). I created a Workbook object from the file path and then saved (converted) it to a PDF file.
The problems I have are:

  1. The table that gets generated does not have outlines (borders) for cells/columns. How can this be done?
  2. Also, some columns are getting trimmed off. How can I specify the column width?

Any help is greatly appreciated, as we are in a rush to finalize a package.

Thanks
bp

@bpaul

Thanks for using Aspose APIs.

You can use the following properties for this. Please see the following sample code and its input CSV and output PDF files for your reference.

  • PageSetup.PrintGridlines
  • Worksheet.AutoFitColumns()

Download Link:
CSV and Output Pdf.zip (13.0 KB)

C#

using Aspose.Cells;

LoadOptions opts = new LoadOptions(LoadFormat.CSV);

Workbook wb = new Workbook("Book1.csv", opts);

Worksheet ws = wb.Worksheets[0];

//We want to print grid lines
ws.PageSetup.PrintGridlines = true;

//Autofit the columns
ws.AutoFitColumns();

PdfSaveOptions pdfopts = new PdfSaveOptions();
pdfopts.GridlineType = GridlineType.Hair;

wb.Save("output.pdf", pdfopts);

Hi Shakeel,

Thanks for the response. I rewrote your code in Java as follows:

import com.aspose.cells.*;

LoadOptions opts = new LoadOptions(LoadFormat.CSV);
Workbook wb = new Workbook(srcPath + "/" + srcFileName, opts);
Worksheet ws = wb.getWorksheets().get(0);

ws.getPageSetup().setPrintGridlines(true);
ws.autoFitColumns();

PdfSaveOptions pdfopts = new PdfSaveOptions();
pdfopts.setGridlineType(GridlineType.HAIR);

wb.save(destPath + "/" + destFileName, pdfopts);

What I observe now is that the generated PDF has at most 20,165 pages (147,428 KB) and the rest is trimmed off. Can this be fixed? My CSV file can have up to 5 million rows, and currently the PDF shows only a little over a million rows.

Also, can this be done using the Aspose Cloud APIs? And can it be done in Node.js?

Thanks
bp

@bpaul

Because a worksheet can hold at most 1,048,576 rows, please divide your large CSV file into several files (e.g. 5 files) and then use the following code, which imports each file into its own worksheet.

Java

import com.aspose.cells.*;

Workbook wb = new Workbook();
WorksheetCollection sheets = wb.getWorksheets();
TxtLoadOptions loadOpt = new TxtLoadOptions(LoadFormat.CSV);

//e.g. divide the whole csv file into 5 files
int csvFileCount = 5;

for (int i = 0; i < csvFileCount; i++)
{
	Worksheet sheet = sheets.get(i);
	//e.g. the divided csv files: csvFile0, csvFile1, csvFile2, csvFile3, csvFile4
	sheet.getCells().importCSV("csvFile" + i, loadOpt, 0, 0);

	sheet.getPageSetup().setPrintGridlines(true);
	sheet.autoFitColumns();

	//Add the next sheet only if another file follows, so the workbook does not end with an empty sheet
	if (i < csvFileCount - 1)
		sheets.add();
}

PdfSaveOptions pdfopts = new PdfSaveOptions();
pdfopts.setGridlineType(GridlineType.HAIR);

wb.save("outFile.pdf", pdfopts);
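The code above assumes the large CSV has already been divided into csvFile0, csvFile1, and so on. A minimal JDK-only sketch of that step follows. The class and method names (CsvSplitter, split) are hypothetical, and the sketch assumes a UTF-8 file in which no quoted field contains a line break, because it splits purely on lines. It also does not repeat the header row in each part.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: streams a large CSV into part files of at most maxRows lines each.
// Assumption: one CSV record per line (no quoted fields containing line breaks).
class CsvSplitter {
    static List<Path> split(Path src, Path outDir, String prefix, int maxRows) throws IOException {
        List<Path> parts = new ArrayList<>();
        BufferedWriter out = null;
        int rowsInPart = 0;
        try (BufferedReader in = Files.newBufferedReader(src, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Open a new part when none is open yet or the current one is full
                if (out == null || rowsInPart == maxRows) {
                    if (out != null) out.close();
                    Path part = outDir.resolve(prefix + parts.size() + ".csv");
                    out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    parts.add(part);
                    rowsInPart = 0;
                }
                out.write(line);
                out.newLine();
                rowsInPart++;
            }
        } finally {
            // Close the last part file even if reading failed midway
            if (out != null) out.close();
        }
        return parts;
    }
}
```

For example, CsvSplitter.split(Paths.get("big.csv"), Paths.get("."), "csvFile", 1_048_576) would write csvFile0.csv, csvFile1.csv, and so on, each small enough for one worksheet. The file names passed to importCSV would then need the ".csv" extension (e.g. "csvFile" + i + ".csv").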

Yes, you can import a CSV into an Excel file and convert the Excel file to PDF in the Cloud as well. Since the Cloud API is platform independent, it will also work with Node.js.

Hi,
For a 47 MB CSV file, my Java process is running out of memory (with 3 GB set). Is this normal? Is there some way to save memory?

This is my final code, but it still has memory issues. If the CSV has more than a million rows, I split it and the first (if) branch runs; otherwise the else branch runs. One thing I noticed is that once the PDF is generated, memory usage does not go down. Is there some stream that I need to close?

Workbook wb = null;
String srcFileName = fileName.substring(0, fileName.lastIndexOf("."));
if (csvFileCount > 0) {
    wb = new Workbook();
    wb.getSettings().setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
    WorksheetCollection sheets = wb.getWorksheets();
    TxtLoadOptions loadOpt = new TxtLoadOptions(LoadFormat.CSV);
    loadOpt.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
    for (int i = 1; i <= csvFileCount; i++) {
        Worksheet sheet = sheets.get(i - 1);
        sheet.getCells().setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
        sheet.getCells().importCSV(localDir + "/" + srcFileName + i + ".csv", loadOpt, 0, 0);

        sheet.getPageSetup().setPrintGridlines(true);
        sheet.autoFitColumns();

        // Add another sheet only if another split file follows
        if (i < csvFileCount) {
            sheets.add();
        }
    }
} else {
    LoadOptions opts = new LoadOptions(LoadFormat.CSV);
    opts.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
    wb = new Workbook(localDir + "/" + fileName, opts);

    wb.getSettings().setMemorySetting(MemorySetting.MEMORY_PREFERENCE);

    Worksheet ws = wb.getWorksheets().get(0);
    ws.getPageSetup().setPrintGridlines(true);
    ws.autoFitColumns();
}
PdfSaveOptions pdfopts = new PdfSaveOptions();
pdfopts.setGridlineType(GridlineType.HAIR);

wb.save(localDir + "/" + srcFileName + ".pdf", pdfopts);
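The csvFileCount used in the code above is computed elsewhere. One way to derive it without loading the file into memory is to stream-count the lines and divide by the worksheet limit of 1,048,576 rows, rounding up. This is a JDK-only sketch: CsvRowCount and splitFileCount are hypothetical names, returning 0 for "no split needed" is an assumption chosen to mirror the `csvFileCount > 0` check, and counting lines assumes no quoted field spans several lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper for deciding whether (and into how many files) to split a CSV.
class CsvRowCount {
    // Maximum number of rows in one worksheet
    static final int MAX_SHEET_ROWS = 1_048_576;

    // Streams the file line by line, so memory use stays flat regardless of file size
    static long countRows(Path csv) throws IOException {
        try (Stream<String> lines = Files.lines(csv, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    // 0 means the file fits in one sheet; otherwise ceil(rows / maxRows) split files
    static int splitFileCount(long rows, int maxRows) {
        if (rows <= maxRows) return 0;
        return (int) ((rows + maxRows - 1) / maxRows);
    }
}
```

With 5,000,000 rows, splitFileCount returns 5, matching the five files suggested earlier; with 1,000,000 rows it returns 0, so the single-file branch runs.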

Thanks in advance
bp

@bpaul

Thanks for using Aspose APIs.

Please share your 47 MB CSV file with us. If the file is split into several files, then share the split files with us.

You can upload your files to Dropbox or Google Drive etc. and then share the download links here.

Hi Shakeel,

Thanks for the response. I was able to resolve the memory issue in the meantime; it had more to do with the Docker container not allocating enough memory to the Java process. I was able to process files of up to 200 MB. The only question I have now is that once memory utilization goes up, it does not get reclaimed. Is there anything more that needs to be done in code to reclaim the memory?

Thanks
bp

@bpaul

Please investigate whether you are facing a memory leak in Aspose.Cells. If so, let us know how to reproduce it so that we can fix it.

But if you find that Aspose.Cells is not leaking memory, then you should look into how the JVM garbage collector reclaims memory.

You can ask such a question on Stack Overflow or similar websites. Please also check this question relating to garbage collection and reclaiming memory.
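To tell a leak apart from memory the JVM is simply holding on to, one option is to log heap usage after each conversion, once every reference to the Workbook has gone out of scope. A minimal sketch using only the standard java.lang.management API follows; the HeapReport class is hypothetical. If "used" keeps climbing across conversions, something is likely still holding references; if only "committed" stays high, the JVM is probably keeping heap it already grew into rather than returning it to the operating system, which is roughly what container memory metrics reflect.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for logging JVM heap figures between conversions.
class HeapReport {
    // Current heap figures as reported by the JVM's memory MXBean
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // used = live objects plus not-yet-collected garbage; committed = heap the JVM has reserved;
    // max is reported as -1 when undefined
    static String describe(MemoryUsage u) {
        return String.format("used=%d MB committed=%d MB max=%d MB",
                u.getUsed() >> 20, u.getCommitted() >> 20, u.getMax() >> 20);
    }

    // System.gc() is only a request to the JVM, so treat the printed numbers as indicative
    static void log(String label) {
        System.gc();
        System.out.println(label + ": " + describe(heap()));
    }
}
```

Calling HeapReport.log("after file " + n) after each wb.save(...) and comparing the "used" values over several runs is usually enough to see which of the two cases applies.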

Hi,

When the CSV has columns whose data is very long, those columns are moved to separate pages.
For example, if there are columns column1, column2, column3, column4, and column5, and column4 is very long, then the first page shows only column1, column2, and column3, while column4 and column5 appear on a separate page. Can all the columns be accommodated on one page?

@bpaul

Thanks for using Aspose APIs.

Please provide your sample code and sample Excel file(s), if any, that replicate this issue. We will look into it and help you asap.

Please also try the AllColumnsInOnePagePerSheet or OnePagePerSheet option of PdfSaveOptions first (setAllColumnsInOnePagePerSheet(true) or setOnePagePerSheet(true) in Java). These properties may fix your issue.
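If those options do not help, it may be worth checking which columns are wide enough to push the rest onto extra pages. The JDK-only sketch below reports the longest value in each column; ColumnWidthScan and its methods are hypothetical, and it assumes a simple comma-separated file with no quoted commas or line breaks. The flagged column indexes could then be given a fixed width (for example with Cells.setColumnWidth) and wrapped text instead of being autofitted, though whether that keeps everything on one page still depends on the page setup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical helper: finds the longest value (in characters) per CSV column.
// Assumption: naive comma split, no quoted fields containing commas or line breaks.
class ColumnWidthScan {
    static int[] maxLengths(Path csv) throws IOException {
        List<Integer> max = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // -1 keeps trailing empty fields so column positions stay aligned
                String[] fields = line.split(",", -1);
                for (int c = 0; c < fields.length; c++) {
                    if (c == max.size()) max.add(0); // first time this column is seen
                    max.set(c, Math.max(max.get(c), fields[c].length()));
                }
            }
        }
        return max.stream().mapToInt(Integer::intValue).toArray();
    }

    // Indexes of columns whose longest value exceeds the threshold
    static int[] wideColumns(int[] lengths, int threshold) {
        return IntStream.range(0, lengths.length).filter(c -> lengths[c] > threshold).toArray();
    }
}
```

Because the file is read line by line, this also works on the multi-million-row inputs discussed above without loading them into memory.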