Hi,
I am seeing slow conversions to PDF from Excel / Word for larger documents.
The conversion takes over 1 minute. This usually happens with larger files (5-20 MB), but sometimes with smaller ones as well. Is there anything we can do to lower the conversion times for larger files? The application is deployed using Docker and Kubernetes on a Linux server with 8 vCPUs and 8 GB of memory.
.NET 8.0
Aspose Cells / Aspose Words Versions
Version=“25.11.0”
Aspose Words Code snippet:
try
{
    byte[] docBytes = Convert.FromBase64String(inputBase64String); //convert base64 string to byte array
    if (inputIsCompressed) //if Compressed, then decompress bytes array
    {
        Console.WriteLine("Input is compressed. Decompressing...");
        docBytes = Utilities.DecompressByteArray(docBytes);
    }
    string base64Pdf;
    int totalPages; //, MswVersion = MsWordVersion.Word2016
    var loadOptions = new Aspose.Words.Loading.LoadOptions { WarningCallback = new DocumentLoadingWarningCallback() };
    using (var docStream = new MemoryStream(docBytes))
    {
        var doc = new Document(docStream, loadOptions); //load the document
        using (var pdfStream = new MemoryStream())
        {
            if (doc.HasRevisions)
            {
                doc.LayoutOptions.RevisionOptions.RevisionBarsColor = Aspose.Words.Layout.RevisionColor.Black;
                doc.LayoutOptions.RevisionOptions.ShowRevisionMarks = false;
            }
            //set custom font folder
            Aspose.Words.Fonts.FontSettings fontSettings = new();
            fontSettings.SetFontsFolder(fontsFolderPath, true);
            doc.FontSettings = fontSettings;
            //doc.LayoutOptions.CommentDisplayMode = CommentDisplayMode.Hide; //hide comments
            foreach (Aspose.Words.Comment comment in doc.GetChildNodes(NodeType.Comment, true)) //remove all comments from document before converting to PDF
            {
                comment.Remove();
            }
            await Task.Run(() => doc.Save(pdfStream, Aspose.Words.SaveFormat.Pdf)); //takes long time to process with larger files
            byte[] pdfBytes = pdfStream.ToArray();
            //set total pages of attachments
            var pdfDocument = new Aspose.Pdf.Document(pdfStream);
            totalPages = pdfDocument.Pages.Count;
            //isCompressed... then compress output. If using ConvertToPdfTester method, don't compress
            if (inputIsCompressed && !inputIsTester)
            {
                pdfBytes = Utilities.CompressByteArray(pdfBytes);
            }
            base64Pdf = Convert.ToBase64String(pdfBytes);
        }
    }
Aspose Cells Code snippet:
try
{
    byte[] excelBytes = Convert.FromBase64String(inputBase64String);
    if (inputIsCompressed)
    {
        Console.WriteLine("Input is compressed. Decompressing...");
        excelBytes = Utilities.DecompressByteArray(excelBytes);
    }
    string base64Pdf;
    int totalPages;
    using (var excelStream = new MemoryStream(excelBytes))
    {
        var wb = new Workbook(excelStream);
        using (var pdfStream = new MemoryStream())
        {
            var opts = new Aspose.Cells.PdfSaveOptions
            {
                Compliance = Aspose.Cells.Rendering.PdfCompliance.PdfA1a,
            };
            await Task.Run(() => wb.Save(pdfStream, opts)); // takes a long time to process with larger files
            byte[] pdfBytes = pdfStream.ToArray();
            var pdfDocument = new Aspose.Pdf.Document(pdfStream);
            totalPages = pdfDocument.Pages.Count;
            if (inputIsCompressed && !inputIsTester)
            {
                Console.WriteLine("Output will be compressed, as input was compressed");
                pdfBytes = Utilities.CompressByteArray(pdfBytes);
            }
            base64Pdf = Convert.ToBase64String(pdfBytes);
        }
    }
@prodigy234
You’re seeing conversions that take over a minute for larger Excel or Word files and would like to reduce the processing time.
To investigate the performance of Aspose.Words and Aspose.Cells in your Docker/Kubernetes environment, could you provide the following details?
- The exact Docker base image you use (e.g., mcr.microsoft.com/dotnet/aspnet:8.0-alpine, debian, etc.) and any resource limits applied to the container (CPU shares, memory caps).
- Whether a licensed version of Aspose is used and how the license file is loaded inside the container.
- For the files that exceed one minute, the approximate page count (Word) or worksheet count (Excel), and any notable content such as large images, embedded objects, or complex formulas.
With this information we can suggest configuration or code‑level adjustments to improve conversion speed.
@prodigy234
1. Please check how many pages would be printed in MS Excel.
2. If there are many empty pages, you can skip them as follows:
PdfSaveOptions options= new PdfSaveOptions();
options.PrintingPageType = PrintingPageType.IgnoreBlank;
// options.PrintingPageType = PrintingPageType.IgnoreStyle;
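To check point 1 in code rather than in MS Excel, here is a minimal sketch that counts the pages each worksheet would render to. The class and method names (`PageCounter`, `CountPages`) are made up for illustration, and the per-sheet total is only an approximation of the final PDF page count. Note that paginating a very large sheet takes time in itself.

```csharp
using System;
using Aspose.Cells;
using Aspose.Cells.Rendering;

// Hypothetical helper, not part of Aspose.Cells.
public static class PageCounter
{
    // Logs the rendered page count of every worksheet and returns the sum.
    public static int CountPages(Workbook wb, PrintingPageType pageType)
    {
        // PrintingPage controls which pages are skipped (e.g. blank ones).
        var options = new ImageOrPrintOptions { PrintingPage = pageType };
        int total = 0;
        foreach (Worksheet sheet in wb.Worksheets)
        {
            var render = new SheetRender(sheet, options);
            Console.WriteLine($"{sheet.Name}: {render.PageCount} page(s)");
            total += render.PageCount;
        }
        return total;
    }
}
```

Comparing `CountPages(wb, PrintingPageType.Default)` with `CountPages(wb, PrintingPageType.IgnoreBlank)` should show whether blank pages account for a meaningful share of the output.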
@simon.zhao
You can skip empty pages when converting an Excel workbook to PDF by setting the PrintingPageType property of PdfSaveOptions.
- PrintingPageType.IgnoreBlank – omits pages that contain no data.
- PrintingPageType.IgnoreStyle – omits pages that are blank or contain only formatting (styles).
C# example
using Aspose.Cells;
// load the workbook
Workbook wb = new Workbook(inputStream);
// configure PDF options
PdfSaveOptions pdfOpts = new PdfSaveOptions();
pdfOpts.PrintingPageType = PrintingPageType.IgnoreBlank; // or PrintingPageType.IgnoreStyle
// save as PDF
wb.Save(pdfStream, pdfOpts);
@prodigy234,
If you still experience delays with Aspose.Cells when rendering to PDF, we request you to share your template Excel file with us. Please compress the file into a zip format and upload it to a file-sharing service (e.g., Google Drive, Dropbox, etc.). Once uploaded, you can share the download link with us seamlessly here. We will investigate the issue for you.
Additionally, as you have also reported an issue with Word to PDF rendering, I have moved your thread to the Aspose.Total category. This will allow the Aspose.Words team to review your issue and provide the necessary assistance. @alexey.noskov FYI.
@prodigy234
- In your code you are initializing FontSettings in each conversion call. In this case Aspose.Words needs to scan the font sources each time. If the font sources do not change between method calls, you can configure FontSettings.DefaultInstance once in the class static constructor instead of initializing a separate FontSettings instance each time. This can slightly improve performance.
- In your code you save the document to PDF and then open the PDF with Aspose.PDF just to get the page count. This step can be omitted; you can simply use the Document.PageCount property. So you can use this:
//set total pages of attachments
totalPages = doc.PageCount;
instead of
//set total pages of attachments
var pdfDocument = new Aspose.Pdf.Document(pdfStream);
totalPages = pdfDocument.Pages.Count;
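Both suggestions together might look like the sketch below. The class name `WordToPdfConverter`, the method `ConvertToPdf`, and the `/app/fonts` path are placeholders (the original code uses `fontsFolderPath`). The revision and comment handling from the original snippet is left out for brevity.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Fonts;

// Hypothetical wrapper class; names and font path are assumptions.
public static class WordToPdfConverter
{
    private const string FontsFolderPath = "/app/fonts";

    // Runs once per process, so the font folder is configured a single time
    // instead of on every conversion call.
    static WordToPdfConverter()
    {
        FontSettings.DefaultInstance.SetFontsFolder(FontsFolderPath, true);
    }

    public static byte[] ConvertToPdf(byte[] docBytes, out int totalPages)
    {
        using var docStream = new MemoryStream(docBytes);
        // No per-document FontSettings: the document uses DefaultInstance.
        var doc = new Document(docStream);
        using var pdfStream = new MemoryStream();
        doc.Save(pdfStream, SaveFormat.Pdf);
        // Page count taken from the Words document; no Aspose.PDF reload.
        totalPages = doc.PageCount;
        return pdfStream.ToArray();
    }
}
```

Reading `doc.PageCount` after `Save` should not trigger a second layout pass, since the layout is already built at that point.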
In general, document conversion time depends on many factors, such as the document size, its complexity, the input document format, and the resources available in your environment. If possible, could you please attach the problematic input document here for testing? We will check the conversion on our side and provide you with more information.
I am using sample files from this website
Sample XLSX Files Download - Example File
They are anywhere from 5-30 MB. I am unable to share them here because the upload is rejected as too large.
We have a Kong API Gateway with the timeout set to 2 minutes (I have tried with 10 MB Excel files and they are timing out)
CPU: 8 vCPUs
Memory: 8 GB
We are using an Aspose Enterprise version. The part of the code taking long is this line
await Task.Run(() => doc.Save(pdfStream, Aspose.Words.SaveFormat.Pdf));
and the same line for Aspose Cells
“The upstream server is timing out” is the response we are receiving from the api call.
But when I investigate its health in our dashboard it seems fine
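To confirm that the time really goes into the Save call and not into decoding, decompression or loading, each stage can be timed separately. A minimal sketch using only the .NET base library follows; `StageTimer` and `Measure` are names invented here. Keep in mind that wrapping Save in `await Task.Run(...)` only moves the work to a thread-pool thread and does not make the conversion itself faster.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for timing individual conversion stages.
public static class StageTimer
{
    // Runs the action, logs how long it took, and returns the elapsed time.
    public static TimeSpan Measure(string stage, Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        Console.WriteLine($"{stage}: {sw.ElapsedMilliseconds} ms");
        return sw.Elapsed;
    }
}

// Possible usage inside the conversion method (variables from the snippets above):
//   Workbook wb = null;
//   StageTimer.Measure("load", () => wb = new Workbook(excelStream));
//   StageTimer.Measure("save", () => wb.Save(pdfStream, opts));
```

Logging these numbers per request makes it easier to see whether a slow conversion is bound by loading, rendering, or the post-processing steps.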
@prodigy234,
I downloaded an Excel file (“11mb.xlsx”) from the website you mentioned. After testing the provided sample code using Aspose.Cells, I noticed that it does take some time, as you indicated, to save the Excel file as a PDF.
Sample code:
byte[] excelBytes = File.ReadAllBytes("e:\\test2\\11mb.xlsx");
using (var excelStream = new MemoryStream(excelBytes))
{
    var wb = new Workbook(excelStream);
    using (var pdfStream = new MemoryStream())
    {
        var opts = new Aspose.Cells.PdfSaveOptions
        {
            Compliance = Aspose.Cells.Rendering.PdfCompliance.PdfA1a,
        };
        await Task.Run(() => wb.Save(pdfStream, opts));
        byte[] pdfBytes = pdfStream.ToArray();
    }
}
I tried using the file path for saving the PDF instead of streams, but it still takes longer to save the file. A thorough evaluation of this issue is necessary. We may consider logging an appropriate ticket to investigate and resolve the matter. Once we have any updates, we will be sure to inform you here.
@prodigy234,
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): CELLSNET-59831 - Conversion to PDF takes more time with large files
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
@prodigy234 ,
Taking the “11mb.xlsx” Excel file as an example: it has 390001 rows and 8 columns filled with text, so converting it to PDF will take time. The PdfCompliance.PdfA1a option also makes the output include PDF document structure/tags, which takes additional time.
You can use SystemTimeInterruptMonitor to cancel the conversion before the timeout is reached.
//workbook
var wb = ...
SystemTimeInterruptMonitor monitor = new SystemTimeInterruptMonitor(false);
wb.InterruptMonitor = monitor;
//time limit 50s
monitor.StartMonitor(50000);
//save to pdf
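
A fuller sketch of this interrupt pattern is shown below, assuming the interruption surfaces as a CellsException with code ExceptionType.Interrupted. The `TimedExcelToPdf` class and `TryConvert` method are illustrative names only. The sketch uses default PdfSaveOptions (no PdfA1a), in line with the note above that PdfA1a adds time; keep the compliance setting if you require it.

```csharp
#nullable enable
using System;
using System.IO;
using Aspose.Cells;

// Hypothetical wrapper around the interrupt-monitor pattern.
public static class TimedExcelToPdf
{
    // Returns the PDF bytes, or null if the time limit was reached first.
    public static byte[]? TryConvert(Stream excelStream, int limitMs)
    {
        // Loading happens before the monitor starts, so it is not covered by the limit.
        var wb = new Workbook(excelStream);

        // false = interrupt by throwing an exception.
        var monitor = new SystemTimeInterruptMonitor(false);
        wb.InterruptMonitor = monitor;
        monitor.StartMonitor(limitMs);

        try
        {
            using var pdfStream = new MemoryStream();
            wb.Save(pdfStream, new PdfSaveOptions());
            return pdfStream.ToArray();
        }
        catch (CellsException ex) when (ex.Code == ExceptionType.Interrupted)
        {
            Console.WriteLine($"Conversion interrupted after {limitMs} ms.");
            return null;
        }
    }
}
```

With a limit below the gateway timeout (for example 50000 ms against the 2 minute Kong limit), the service can return its own error response instead of letting the upstream call time out.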