OutOfMemoryError When Converting to PDF a Spreadsheet with Long Distance Between Cells

l.alex · November 19, 2024, 8:11pm

Hello Community,

I am encountering an issue where converting an Excel spreadsheet to PDF causes the JVM to exhaust memory and crash. Unfortunately, this error cannot be handled programmatically, and I need a solution to prevent it from occurring.

The Spreadsheet:
The spreadsheet contains only three populated cells:

A1 with the value “1”.
The last cell of the sheet, XFD1048576, also with the value “1”.
A SUM formula in B2 that adds the values from the previous two cells.

What I’ve Tried:
I am aware of the LightCells API provided by Aspose.Cells for handling large datasets more efficiently in terms of memory. I currently use:

loadOptions.setLightCellsDataHandler() for efficient workbook loading.
ooxmlSaveOptions.setLightCellsDataProvider() for lightweight saving in Excel formats.
However, it seems that these optimizations are not available for PDF conversion using PdfSaveOptions. Specifically, I cannot do pdfSaveOptions.setLightCellsDataProvider like I can with OoxmlSaveOptions.

Note:
I am already using MemorySetting.MEMORY_PREFERENCE during the conversion process to optimize memory usage.

Issue:
The OutOfMemoryError consistently occurs at the workbook.save(outputStream, pdfSaveOptions) call.

My Code:
Below is a snippet of my current implementation:

com.aspose.cells.LoadOptions loadOptions = new com.aspose.cells.LoadOptions();
loadOptions.setLightCellsDataHandler(new LightCellsDataHandlerVisitCells());
loadOptions.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
loadOptions.setInterruptMonitor(interruptMonitor);

Workbook workbook = null;
try {
    workbook = new Workbook(inputStream, loadOptions);
    workbook.setInterruptMonitor(interruptMonitor);
} catch (CellsException cellsException) {
    // Handling the exception options
}

workbook.calculateFormula();

com.aspose.cells.PdfSaveOptions pdfSaveOptions = new com.aspose.cells.PdfSaveOptions();
pdfSaveOptions.setCompliance(PdfCompliance.PDF_A_1_B);
pdfSaveOptions.setImageResample(30, 30);

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
try {
    // Save the workbook to PDF format using the outputStream
    workbook.save(outputStream, pdfSaveOptions); // OutOfMemoryError occurs here.
} catch (CellsException cellsException) {
    // Handling the exception options
}

Question:
How can I prevent this error from occurring during PDF conversion, especially for spreadsheets with formulas referencing distant cells like this?

Thank you for your time and assistance!

Professionalize.Discourse · November 19, 2024, 8:11pm

@l.alex

To address the OutOfMemoryError you are encountering when converting a spreadsheet to PDF, especially with a large distance between populated cells, you can implement several strategies to optimize memory usage during the conversion process.

Memory Settings: You are already using MemorySetting.MEMORY_PREFERENCE, which is a good start. This setting helps optimize memory consumption while loading and processing spreadsheets. Ensure that this is set correctly in your LoadOptions.
Use of LightCells API: While you mentioned that the LightCells API is not directly applicable for PDF conversion, it is still beneficial for loading large datasets efficiently. Make sure you are utilizing the LightCellsDataHandler properly when loading the workbook, as you have done.
Optimize PDF Save Options: Although you cannot set a light cells data provider for PDF conversion, you can still adjust other options in PdfSaveOptions to reduce memory usage. For example, consider lowering the image resampling settings or adjusting the compliance level if it is not strictly necessary.
Chunk Processing: If the spreadsheet is particularly large or complex, consider processing it in smaller chunks. This might involve breaking down the workbook into smaller parts, converting them individually, and then merging the resulting PDFs if necessary.
Increase JVM Memory: If possible, increase the maximum heap size allocated to the JVM. This can be done by setting the -Xmx parameter when starting your Java application. For example, -Xmx2048m would allocate 2GB of memory.
Monitor Memory Usage: Implement monitoring to track memory usage during the conversion process. This can help identify specific points where memory spikes occur, allowing you to optimize those areas.

Here’s a revised version of your code snippet with a focus on memory optimization:

com.aspose.cells.LoadOptions loadOptions = new com.aspose.cells.LoadOptions();
loadOptions.setLightCellsDataHandler(new LightCellsDataHandlerVisitCells());
loadOptions.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
loadOptions.setInterruptMonitor(interruptMonitor);

Workbook workbook = null;
try {
    workbook = new Workbook(inputStream, loadOptions);
    workbook.setInterruptMonitor(interruptMonitor);
} catch (CellsException cellsException) {
    // Handle exception
}

workbook.calculateFormula();

com.aspose.cells.PdfSaveOptions pdfSaveOptions = new com.aspose.cells.PdfSaveOptions();
pdfSaveOptions.setCompliance(PdfCompliance.PDF_A_1_B);
pdfSaveOptions.setImageResample(30, 30); // Adjust as necessary

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
try {
    workbook.save(outputStream, pdfSaveOptions); // Monitor memory usage here
} catch (CellsException cellsException) {
    // Handle exception
}

By implementing these strategies, you should be able to mitigate the OutOfMemoryError during the PDF conversion process. If the problem persists, consider reaching out to Aspose support for further assistance tailored to your specific use case.

l.alex · November 19, 2024, 8:16pm

l.alex:

com.aspose.cells.LoadOptions loadOptions = new com.aspose.cells.LoadOptions();
loadOptions.setLightCellsDataHandler(new LightCellsDataHandlerVisitCells());
loadOptions.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
loadOptions.setInterruptMonitor(interruptMonitor);

Workbook workbook = null;
try {
    workbook = new Workbook(inputStream, loadOptions);
    workbook.setInterruptMonitor(interruptMonitor);
} catch (CellsException cellsException) {
    // Handling the exception options
}

workbook.calculateFormula();

com.aspose.cells.PdfSaveOptions pdfSaveOptions = new com.aspose.cells.PdfSaveOptions();
pdfSaveOptions.setCompliance(PdfCompliance.PDF_A_1_B);
pdfSaveOptions.setImageResample(30, 30);

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
try {
    // Save the workbook to PDF format using the outputStream
    workbook.save(outputStream, pdfSaveOptions); // OutOfMemoryError occurs here.
} catch (CellsException cellsException) {
    // Handling the exception options
}

I answer the above automated response just for reference: I already use the pdfSaveOptions.setImageResample, however in this scenario there are no images in the workbook.

Also, the suggested code is identical to mine.

Thanks.

peyton.xu · November 20, 2024, 2:15am

@l.alex ,

The core issue is that the file has too many pages. If you printview the source file in Excel, there will be about 40 million pages, and most of the pages are blank.
You can use the following code to ignore these blank pages.

Workbook wb ...

com.aspose.cells.PdfSaveOptions pdfSaveOptions = new com.aspose.cells.PdfSaveOptions();
pdfSaveOptions.setCompliance(PdfCompliance.PDF_A_1_B);
pdfSaveOptions.setPrintingPageType(PrintingPageType.IGNORE_BLANK);

wb.save ...

Also, you may limit the max page count of the generated pdf file. e.g. 10K pages.

Workbook wb ...

com.aspose.cells.PdfSaveOptions pdfSaveOptions = new com.aspose.cells.PdfSaveOptions();
pdfSaveOptions.setCompliance(PdfCompliance.PDF_A_1_B);
//limit page count to 10K
pdfSaveOptions.setPageCount(10000);

wb.save ...

l.alex · November 20, 2024, 9:22pm

Thank you, this was actually the case, lots of blank pages, and I was actually looking into restricting blank pages with PdfSaveOptions.setOutputBlankPageWhenNothingToPrint(false) . Would anyone know what is the difference between this approach and you suggested code pdfSaveOptions.setPrintingPageType(PrintingPageType.IGNORE_BLANK);?

peyton.xu · November 21, 2024, 1:11am

@l.alex ,

PdfSaveOptions.setOutputBlankPageWhenNothingToPrint is used for the case that there is no content in the source, the page count is zero. if PdfSaveOptions.OutputBlankPageWhenNothingToPrint is set to true, it will generate a pdf file with only one blank page. If PdfSaveOptions.OutputBlankPageWhenNothingToPrint is set to false, it will throw exception indicating that “There is nothing to output/print.” in this case.

pdfSaveOptions.setPrintingPageType(PrintingPageType.IGNORE_BLANK); is used for ignoring blank pages.

l.alex · November 25, 2024, 5:14pm

Thanks, very well explained!

amjad.sahi · November 25, 2024, 5:25pm

@l.alex,

You’re welcome. Please don’t hesitate to reach out if you have any additional questions or feedback.

l.alex · November 26, 2024, 5:25pm

@amjad.sahi, I greatly appreciate your support and I would appreciate it if you could have a look at my new post about triggering pdf conversion exceptions

amjad.sahi · November 26, 2024, 6:36pm

@l.alex,

Please see and follow up your other thread for replies and for your reference.