Combined pages taking much longer to save than individual pages

ast3 · July 28, 2020, 12:24am

Hello,

I have been investigating slow performance when re-saving an existing PDF with Aspose.PDF for .NET 20.7 and have found strange behavior.

I have a PDF with 3 pages. If I split each page into a separate file (using Acrobat), then load and re-save each file (with no changes) with Aspose.PDF, the times per page are:

Page 1: 1 second
Page 2: 43 seconds
Page 3: 41 seconds

Total: 85 seconds.

However, if I load and re-save the same 3 pages that have been combined in a single PDF, it takes 324 seconds to complete - about 4 additional minutes.

The slowdown appears to grow faster than linearly (possibly exponentially) with the page count. A 30-page PDF takes over an hour to process, but when it is split into individual pages, each page processes much faster.

Sample code & files as follows:

public static void SavePDFs()
{
    // Re-save each page individually
    SavePdf("page 1.pdf", "page 1 out.pdf");
    SavePdf("page 2.pdf", "page 2 out.pdf");
    SavePdf("page 3.pdf", "page 3 out.pdf");

    // Now re-save the pages in combined PDF.
    SavePdf("pages 1-3.pdf", "pages 1-3 out.pdf");
}

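public static void SavePdf(string inputName, string outputName)
{
    // Time the load and the save together.
    var timer = System.Diagnostics.Stopwatch.StartNew();

    // Document is IDisposable, so release it once the save has completed.
    using (var pdf = new Aspose.Pdf.Document(inputName))
    {
        pdf.Save(outputName);
        timer.Stop();
    }

    // Elapsed time is reported in milliseconds.
    Console.WriteLine(inputName + " save time: " + timer.ElapsedMilliseconds);
}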

Output is:

page 1.pdf save time: 731
page 2.pdf save time: 43012
page 3.pdf save time: 41231
pages 1-3.pdf save time: 324714
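
To put the logged times in perspective, here is a minimal sketch that computes the overhead from the four values printed above. The class and method names are made up for illustration; only the millisecond figures come from the output.

```csharp
using System;
using System.Linq;

public static class SaveTimeSummary
{
    // Sum of the individual save times, in milliseconds.
    public static long Total(params long[] individualMs) => individualMs.Sum();

    // Extra time the combined save needs beyond the individual saves together.
    public static long ExtraMs(long combinedMs, params long[] individualMs)
        => combinedMs - Total(individualMs);

    // How many times slower the combined save is than the individual saves together.
    public static double Slowdown(long combinedMs, params long[] individualMs)
        => (double)combinedMs / Total(individualMs);

    // Prints the three derived figures for the times logged above.
    public static void Print()
    {
        long[] individual = { 731, 43012, 41231 }; // pages 1, 2, 3 (ms)
        long combined = 324714;                    // pages 1-3 (ms)

        Console.WriteLine("individual total (ms): " + Total(individual));
        Console.WriteLine("extra time (ms): " + ExtraMs(combined, individual));
        Console.WriteLine("slowdown factor: " + Slowdown(combined, individual));
    }
}
```

With these inputs the individual saves total 84974 ms, so the combined save spends 239740 ms more and is roughly 3.8 times slower than saving the three pages separately.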

Files: Individual pages.zip (6.1 MB) Combined pages.zip (5.7 MB)

Is there some reason that re-saving the combined pages takes much longer than re-saving the same pages individually? Can it please be fixed so the performance for saving the combined pages is similar to saving all the individual pages?

This also raises the question of why re-saving (with no changes) takes so long in the first place. Could the library detect that no changes have been made and, in that case, shorten the save (e.g. by just writing out the original PDF data)?
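
As a rough illustration of the idea, the calling code could take this shortcut itself when it already knows the document is untouched. This is only a sketch under stated assumptions: `ResaveHelper`, `SaveOrCopy`, and the `modified` flag are hypothetical and not part of Aspose.PDF, and the caller is assumed to track for itself whether it changed anything.

```csharp
using System;
using System.IO;

public static class ResaveHelper
{
    // Hypothetical helper, not an Aspose.PDF API. The caller says whether it
    // modified the document and passes the real (slow) save as a delegate.
    public static void SaveOrCopy(string inputName, string outputName,
                                  bool modified, Action<string, string> fullSave)
    {
        if (!modified)
        {
            // No changes were made: write the original bytes out unchanged.
            File.Copy(inputName, outputName, overwrite: true);
            return;
        }

        // Changes were made: fall back to the full load-and-save path.
        fullSave(inputName, outputName);
    }
}
```

For example, `ResaveHelper.SaveOrCopy("pages 1-3.pdf", "pages 1-3 out.pdf", false, SavePdf);` would copy the file without invoking `SavePdf` from the sample above. Note that a plain copy skips any rewriting the library performs on save, so the output is byte-identical to the input rather than a re-saved file.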

Thanks

asad.ali · July 28, 2020, 4:19pm

@ast3

We were able to reproduce this behavior of the API in our environment and have logged an issue with the ticket ID PDFNET-48592 in our issue management system. We will investigate the case in detail and keep you posted on the status of the ticket. Please allow us some time.

We are sorry for the inconvenience.