Continuing the discussion from Gigantic file size when saving Aspose.Cells.Workbook as PDF:
Dear support,
As requested by your colleague @amjad.sahi over in the Aspose.Cells department, I’m filing this performance issue with you. As stated in the linked topic, I don’t see this specific case as a practical example. Your colleague nonetheless wanted it filed so that you can investigate the file size and runtime of the pipeline and perhaps optimize both, regardless of the specific data involved.
The processing pipeline we’re using in this specific example (implemented entirely with Aspose.Cells & Aspose.Words) is:
- Convert an Excel document into a Word document by recreating the data table of the Excel worksheet in a Word table. The document “Anonymized_Data2_*.xlsx” would be the input document to this step, which I’ve attached purely for reference.
- Post-process the resulting Word document (by adding, depending on the exact use case, a header, a footer, and data before and after the generated Word table) and save it as docx. “result_*.docx” is the output of this step (we typically wouldn’t save it as docx explicitly, but for your convenience and to compare file sizes, I’m handing you this one as well).
- Typically we’d populate the merge fields, but I’ve skipped that step in the reproduction.
- Save the Word document as a PDF.
For your convenience, I’ve attached the original Excel document (3292 rows) as well as 2 smaller versions with 10 and 500 rows respectively, together with the respective Word documents from step 2 of the pipeline. The Excel files are included purely for reference, because you won’t be able to see the entire table in Word due to the sheer number of columns. For the smaller versions I’ve simply truncated the number of data rows, so they might be easier to handle.
Also for your reference, I’ve noted the respective file sizes, which blow up 17-fold for the 10-row example and over 60-fold for the complete data sheet (comparing the size of the intermediate Word document with the PDF generated using Aspose):
Filename | File size [KB] | Comments |
---|---|---|
Anonymized_Data2_10.xlsx | 109 | Input to step 1 |
result_10.docx | 30 | Output of step 2 |
result_10_Aspose_Words_Document_Save_asPDF.pdf | 510 | Output of step 4. Time it takes Aspose.Words to wordDocument.Save(…, SaveFormat.Pdf) the document: 3 seconds |
result_10_Word_PrintToPDF.pdf | 1,350 | Using MS Word to Print to PDF the file result_10.docx. |
result_10_Word_SaveAs_PDF.pdf | 910 | Using MS Word to Save as → PDF the file result_10.docx. |
Anonymized_Data2_500.xlsx | 413 | |
result_500.docx | 593 | |
result_Aspose_Words_Document_Save_asPDF_500.pdf | 34,535 | 99 seconds |
result_500_Word_PrintToPDF.pdf | 55,738 | |
result_500_Word_SaveAs_PDF.pdf | 65,773 | |
Anonymized_Data2_3292.xlsx | 2,080 | |
result_3292.docx | 3,661 | |
result_Aspose_Words_Document_Save_asPDF_3292.pdf | 223,782 | 692 seconds (which is on the low end over multiple tests) |
result_3292_Word_PrintToPDF.pdf | Unknown | |
result_3292_Word_SaveAs_PDF.pdf | Unknown | |
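For anyone who wants to double-check the blow-up factors, here is a minimal sketch that recomputes them from the docx and Aspose PDF sizes in the table. The class and method names (`SizeRatios`, `BlowUp`) are made up for this illustration and are not part of any Aspose API.

```csharp
using System;
using System.Globalization;

static class SizeRatios
{
    // Ratio of the PDF size to the size of the intermediate Word document.
    public static double BlowUp(double docxKb, double pdfKb)
    {
        return pdfKb / docxKb;
    }

    static void Main()
    {
        // File sizes in KB, copied from the table: { rows, docx, Aspose PDF }.
        double[][] sizes =
        {
            new double[] { 10, 30, 510 },
            new double[] { 500, 593, 34535 },
            new double[] { 3292, 3661, 223782 },
        };

        foreach (double[] s in sizes)
        {
            // Invariant culture keeps the decimal point independent of the OS locale.
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "{0} rows: {1:F1}-fold", s[0], BlowUp(s[1], s[2])));
        }
        // Prints:
        // 10 rows: 17.0-fold
        // 500 rows: 58.2-fold
        // 3292 rows: 61.1-fold
    }
}
```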
If you want to investigate the resulting file size and runtime, you can simply run the following code. Our tests were conducted on .NET Framework 4.8 using the latest Aspose.Words version, 2025.03.
// This code simply loads the docx document saved after step 2 and performs step 4 of the pipeline.
// "rows" stands in for the "*" in the file names: use 10, 500 or 3292.
using Aspose.Words;

string path = @"C:\Testdata\";
string rows = "10";
Document wordDocument = new Document(path + "result_" + rows + ".docx");
wordDocument.Save(path + "result_" + rows + "_Aspose_Words_Document_Save_asPDF.pdf", SaveFormat.Pdf);
Testdata.zip (5.6 MB)
Due to the exorbitant file sizes, I’ve not been able to attach all the mentioned PDF files, but I’m fairly certain you’ll be able to recreate them by running the above code.
Besides the fact that, for the complete document, it takes approx. 15 minutes just to generate/save the PDF, and besides its enormous size of over 220 MB, this process also grabs all the RAM it can allocate. On my box, this resulted in over 16 GB being used by the conversion process alone! Putting all the sizes into perspective again:
- original Word document: < 4 MB
- resulting PDF: ~ 224 MB
- RAM used: > 16,000 MB
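In case it helps with reproducing these numbers, the following is a minimal sketch of how save time, file sizes and memory could be collected in one run. It assumes the test documents are unpacked to C:\Testdata\. The class name, the `Time` helper and the "_benchmark" output file names are made up for this sketch. Note that `PeakWorkingSet64` reports the peak for the whole process, so the value can only grow from one document to the next.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Aspose.Words;

static class PdfSaveBenchmark
{
    // Runs the given action and returns how long it took.
    public static TimeSpan Time(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    static void Main()
    {
        string path = @"C:\Testdata\";

        // Row counts of the three test documents (result_10.docx, result_500.docx, result_3292.docx).
        foreach (string rows in new[] { "10", "500", "3292" })
        {
            string docxPath = path + "result_" + rows + ".docx";
            // Hypothetical output name, chosen only for this sketch.
            string pdfPath = path + "result_" + rows + "_benchmark.pdf";

            // Only the Save call is timed, not the loading of the docx.
            Document wordDocument = new Document(docxPath);
            TimeSpan saveTime = Time(() => wordDocument.Save(pdfPath, SaveFormat.Pdf));

            long docxKb = new FileInfo(docxPath).Length / 1024;
            long pdfKb = new FileInfo(pdfPath).Length / 1024;

            // Process-wide peak so far, not the peak of this single conversion.
            long peakMb;
            using (Process self = Process.GetCurrentProcess())
            {
                peakMb = self.PeakWorkingSet64 / (1024 * 1024);
            }

            Console.WriteLine(
                "{0} rows: docx {1} KB, pdf {2} KB, save {3:F0} s, peak working set {4} MB",
                rows, docxKb, pdfKb, saveTime.TotalSeconds, peakMb);
        }
    }
}
```

To get a per-document memory figure, the program could be started once per file instead of looping inside one process.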
Hopefully the results of your investigation can help improve the performance of your Aspose.Words component with regard to runtime, output file size and memory usage.
Kind regards.