Hi,
we received an Excel document from a customer that needs a lot of memory and time for PDF creation (I aborted the process after some minutes).
I tried it with Excel itself and got the same result: it took so long that I aborted after several minutes.
The likely reason is that the document contains two sheets with more than 1 million rows and columns up to “XET”.
I also noticed that Excel itself displays the warning "The operation you are about to perform affects a large number of cells" if you try to create a PDF from the document.
My question is: Do you have any experience with when it would be worth displaying such a warning in our software, too, or any threshold / rule of thumb for how many cells may be troublesome for processes like PDF creation?
@Serraniel
If the PDF file generated from the sample file has a large number of blank pages, you can choose to disable the output of blank pages to improve efficiency. Please refer to the following example code.
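using Aspose.Cells;

// Load the source workbook.
Workbook wb = new Workbook("sample.xlsx");
PdfSaveOptions options = new PdfSaveOptions();
// Do not emit a blank page when there is nothing to print.
options.OutputBlankPageWhenNothingToPrint = false;
// Skip blank pages in the output.
options.PrintingPageType = PrintingPageType.IgnoreBlank;
// filePath is the output directory, defined elsewhere.
wb.Save(filePath + "out.pdf", options);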
Could you share your sample file so that we can better analyze and locate the issue? We will check it soon.
@Serraniel,
For your information, handling large Excel files with over a million rows and numerous columns can be highly resource-intensive, especially when converting them to PDF. In your case, it’s definitely a good idea to show a warning in your software when users attempt to convert such large datasets. Here are some general guidelines based on performance considerations; you would need to implement these checks in your own code:
- Total Number of Cells:
  - If the total number of cells (rows × columns) exceeds 10 million, consider displaying a warning.
  - For extremely large sheets (e.g., 50M+ cells), you might even prevent PDF export or suggest alternative ways to reduce data.
- Row Count Thresholds:
  - If a single sheet has more than 500,000 rows, a warning is advisable.
  - Over 1 million rows is almost guaranteed to be slow and memory-intensive.
- Column Count Thresholds:
  - If the number of columns exceeds 256 (column “IV”, the last column in Excel 2003), performance can degrade significantly.
  - Beyond 1,024 columns, PDF rendering may take excessive time.
etc.
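The thresholds above could be turned into a small pre-flight check along the following lines. This is only a minimal sketch: the `ExportRisk` enum, the `ExportGuard` class and the `Classify` method are hypothetical names made up for illustration, and the limits are simply the rule-of-thumb numbers from the list, which you will probably want to tune.

```csharp
using System;

// Hypothetical result type: what the UI should do before starting the export.
public enum ExportRisk { Ok, Warn, Block }

public static class ExportGuard
{
    // Rule-of-thumb limits from the guidelines above; starting points, not hard facts.
    const long WarnCells = 10_000_000;
    const long BlockCells = 50_000_000;
    const int WarnRows = 500_000;
    const int WarnColumns = 256;

    public static ExportRisk Classify(int rows, int columns)
    {
        // Multiply as long: 1,048,576 rows x 16,384 columns overflows int.
        long cells = (long)rows * columns;

        if (cells >= BlockCells) return ExportRisk.Block;
        if (cells > WarnCells || rows > WarnRows || columns > WarnColumns)
            return ExportRisk.Warn;
        return ExportRisk.Ok;
    }
}
```

With Aspose.Cells, the row and column counts for a sheet could be taken from `sheet.Cells.MaxDataRow + 1` and `sheet.Cells.MaxDataColumn + 1` (both properties are zero-based). Whether the data range or the full used range, including cells that only carry formatting, is the better input for the check is something to verify against your own documents.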
Based on the results of the checks above, you may let users cancel the operation or proceed at their own risk. Please note that Aspose.Cells also provides a print preview feature which you may use for this purpose, via the WorkbookPrintingPreview and SheetPrintingPreview classes. To create a print preview of the whole workbook, create an instance of the WorkbookPrintingPreview class by passing Workbook and ImageOrPrintOptions objects to the constructor. The WorkbookPrintingPreview class provides an EvaluatedPageCount property which returns the number of pages in the generated preview. Similarly, the SheetPrintingPreview class generates a print preview for a specific worksheet. See the document on print preview for your reference.
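As a rough illustration of the print preview approach, the sketch below reads the evaluated page count for the whole workbook and for each sheet before deciding whether to warn. The `MaxPagesWithoutWarning` limit and the file name are assumptions for the example, not recommended values. Note also that evaluating the preview for a very large sheet may itself take noticeable time, so it may be worth running the cheaper cell-count check first.

```csharp
using System;
using Aspose.Cells;
using Aspose.Cells.Rendering;

class PageCountCheck
{
    // Hypothetical limit; choose a value that fits your product.
    const int MaxPagesWithoutWarning = 500;

    static void Main()
    {
        Workbook wb = new Workbook("sample.xlsx");
        ImageOrPrintOptions options = new ImageOrPrintOptions();

        // Page count for the whole workbook.
        WorkbookPrintingPreview workbookPreview = new WorkbookPrintingPreview(wb, options);
        int totalPages = workbookPreview.EvaluatedPageCount;
        Console.WriteLine("Workbook pages: " + totalPages);

        // Page count per worksheet, to find out which sheet is the expensive one.
        foreach (Worksheet sheet in wb.Worksheets)
        {
            SheetPrintingPreview sheetPreview = new SheetPrintingPreview(sheet, options);
            Console.WriteLine(sheet.Name + ": " + sheetPreview.EvaluatedPageCount + " pages");
        }

        if (totalPages > MaxPagesWithoutWarning)
        {
            Console.WriteLine("Warning: the PDF export may take a long time.");
        }
    }
}
```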
Aspose.Cells also allows you to stop the conversion of a Workbook to formats such as PDF or HTML using the InterruptMonitor object when it is taking too long. The conversion process is often both CPU- and memory-intensive, so it is useful to be able to halt it when resources are limited. You can use InterruptMonitor both to stop a conversion and to stop loading a huge workbook: use the Workbook.InterruptMonitor property to stop a conversion and the LoadOptions.InterruptMonitor property to stop loading. See the document for your reference.
https://docs.aspose.com/cells/net/stop-conversion-or-loading-using-interruptmonitor-when-it-is-taking-too-long/
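A minimal sketch of how InterruptMonitor might be combined with a time budget is shown below. The 60-second budget, the file names and the use of a `System.Threading.Timer` to trigger the interrupt are assumptions for illustration; the linked document shows the officially recommended pattern.

```csharp
using System;
using System.Threading;
using Aspose.Cells;

class TimedPdfExport
{
    static void Main()
    {
        // Hypothetical time budget covering both loading and saving.
        TimeSpan budget = TimeSpan.FromSeconds(60);

        InterruptMonitor monitor = new InterruptMonitor();

        try
        {
            // One-shot timer that requests an interrupt once the budget is used up.
            using (new Timer(_ => monitor.Interrupt(), null, budget, Timeout.InfiniteTimeSpan))
            {
                // Allow the loading phase to be interrupted.
                LoadOptions loadOptions = new LoadOptions();
                loadOptions.InterruptMonitor = monitor;
                Workbook wb = new Workbook("sample.xlsx", loadOptions);

                // Allow the conversion phase to be interrupted.
                wb.InterruptMonitor = monitor;
                wb.Save("out.pdf", SaveFormat.Pdf);
            }
            Console.WriteLine("PDF created.");
        }
        catch (CellsException ex) when (ex.Code == ExceptionType.Interrupted)
        {
            // The monitor stopped the operation; tell the user why.
            Console.WriteLine("The export took too long and was stopped.");
        }
    }
}
```

In an interactive application, the same `monitor.Interrupt()` call could be wired to a Cancel button instead of a timer, which would give users the "cancel or proceed at their own risk" choice mentioned above.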
Hope this helps a bit.
Thanks for the extended feedback. I appreciate the numbers you mentioned and the other suggestions you made, and I will forward them to our product management. That’s valuable information for us to detect troublesome documents up front, avoid long rendering processes, and give the user feedback on why things may be slow.
@Serraniel,
We’re glad to hear that the suggested guidelines and details are useful for your scenario. Please don’t hesitate to reach out to us if you have any additional questions or feedback.