Hello,
I have been investigating slow performance when re-saving an existing PDF with Aspose.PDF for .NET 20.7 and have found strange behavior.
I have a PDF with 3 pages. If I split each page into a separate file (using Acrobat), loading and re-saving each file (with no changes) with Aspose.PDF takes the following times:
Page 1: 1 second
Page 2: 43 seconds
Page 3: 41 seconds
Total: 85 seconds.
However, if I load and re-save the same 3 pages combined in a single PDF, it takes 324 seconds to complete, about 4 minutes longer than the 85-second total for the separate files.
The cost seems to grow much faster than linearly with the page count, possibly exponentially. A 30-page PDF takes over an hour to process, but if it is split into individual pages, each page processes much faster.
Sample code and files follow:
using System;
using System.Diagnostics;

public static class ResaveTest
{
    public static void SavePDFs()
    {
        // Re-save each page individually.
        SavePdf("page 1.pdf", "page 1 out.pdf");
        SavePdf("page 2.pdf", "page 2 out.pdf");
        SavePdf("page 3.pdf", "page 3 out.pdf");
        // Now re-save the same pages as one combined PDF.
        SavePdf("pages 1-3.pdf", "pages 1-3 out.pdf");
    }

    // Loads a PDF and saves it unchanged, printing the elapsed time
    // (load + save) in milliseconds.
    public static void SavePdf(string inputName, string outputName)
    {
        var timer = Stopwatch.StartNew();
        using (var pdf = new Aspose.Pdf.Document(inputName))
        {
            pdf.Save(outputName);
            Console.WriteLine(inputName + " save time: " + timer.ElapsedMilliseconds);
        }
    }
}
Output (times in milliseconds):
page 1.pdf save time: 731
page 2.pdf save time: 43012
page 3.pdf save time: 41231
pages 1-3.pdf save time: 324714
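To put a number on the gap, here is a small sketch that only does the arithmetic on the four timings printed above. The class and member names are my own, chosen for illustration, and it does not use Aspose.PDF at all.

```csharp
using System;
using System.Linq;

public static class SaveTimeComparison
{
    // Measured times in milliseconds, copied from the output above.
    public static readonly long[] IndividualMs = { 731, 43012, 41231 };
    public const long CombinedMs = 324714;

    // Total time for the three single-page files.
    public static long TotalIndividualMs() => IndividualMs.Sum();

    // How many times slower the combined file is than the separate files together.
    public static double SlowdownFactor() => (double)CombinedMs / TotalIndividualMs();

    public static void Main()
    {
        Console.WriteLine("Individual total (ms): " + TotalIndividualMs());
        Console.WriteLine("Combined (ms): " + CombinedMs);
        Console.WriteLine("Slowdown factor: " + SlowdownFactor().ToString("F2"));
    }
}
```

The individual files add up to 84974 ms, so the combined file is roughly 3.8 times slower for the same three pages.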
Files: Individual pages.zip (6.1 MB) Combined pages.zip (5.7 MB)
Is there some reason that re-saving the combined pages takes much longer than re-saving the same pages individually? Can it please be fixed so the performance for saving the combined pages is similar to saving all the individual pages?
This also raises the question of why re-saving with no changes takes so long. Could the library detect that no changes have been made and, in that case, shorten the save (e.g. by just writing out the original PDF data)?
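To illustrate the behavior I mean, here is a rough sketch of a pass-through done on the caller's side. It rests on assumptions: the `modified` flag is hypothetical and would have to be tracked by my own code, since I do not know of a change-tracking API in Aspose.PDF, and `resave` is a placeholder for the normal load-and-save routine. The names `PassThroughSave` and `SaveOrCopy` are made up for this example.

```csharp
using System;
using System.IO;

public static class PassThroughSave
{
    // Produces outputName from inputName.
    // If the caller reports no modifications, the original bytes are copied
    // and the expensive re-save is skipped. Otherwise the supplied resave
    // routine (placeholder for a full load and save) is invoked.
    // Returns true when the file was copied, false when it was re-saved.
    public static bool SaveOrCopy(
        string inputName,
        string outputName,
        bool modified,                  // hypothetical flag tracked by the caller
        Action<string, string> resave)  // placeholder for the real save routine
    {
        if (!modified)
        {
            File.Copy(inputName, outputName, overwrite: true);
            return true;
        }

        resave(inputName, outputName);
        return false;
    }
}
```

With the SavePdf method from my sample, the call would look like `PassThroughSave.SaveOrCopy("pages 1-3.pdf", "pages 1-3 out.pdf", modified: false, resave: SavePdf);`. This only works when I already know that nothing changed, which is why support inside the library would be preferable.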
Thanks