Splitting & merging a PDF causes a massive increase in file size

Just wanted to let you all know about an Aspose.PDF for .NET issue we encountered.

Reproduction steps:

  1. Load a 1,000-page PDF.
  2. Split it into 1,000 one-page PDFs. Save these files to disk or to blob storage.
  3. Load the 1,000 one-page PDFs and merge them into a single 1,000-page PDF.
  4. (Optional) Call document.Optimize(). (This reduces the file size by a trivial amount.)
  5. Save the resulting 1,000 page PDF.
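
For reference, the steps above could be sketched roughly as follows. This is an illustrative sketch, not the code from the original report: the class name, the temp folder, and the file names are made up, and it writes the one-page files to local disk rather than blob storage. It assumes the Aspose.PDF for .NET package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Aspose.Pdf;

class SplitMergeRepro
{
    static void Main(string[] args)
    {
        // Hypothetical inputs: path to the original PDF and a scratch folder.
        string inputPath = args[0];
        string workDir = Path.Combine(Path.GetTempPath(), "split-merge-repro");
        Directory.CreateDirectory(workDir);

        // Steps 1-2: load the source PDF and save each page as its own file.
        int pageCount;
        using (var source = new Document(inputPath))
        {
            pageCount = source.Pages.Count;
            for (int i = 1; i <= pageCount; i++) // Aspose page indexes start at 1
            {
                using (var single = new Document())
                {
                    single.Pages.Add(source.Pages[i]);
                    single.Save(Path.Combine(workDir, $"page-{i}.pdf"));
                }
            }
        }

        // Step 3: load the one-page files and merge them into one document.
        string mergedPath = Path.Combine(workDir, "merged.pdf");
        var parts = new List<Document>();
        try
        {
            using (var merged = new Document())
            {
                for (int i = 1; i <= pageCount; i++)
                {
                    // Keep each part open until the merged file has been saved.
                    var part = new Document(Path.Combine(workDir, $"page-{i}.pdf"));
                    parts.Add(part);
                    merged.Pages.Add(part.Pages);
                }

                merged.Optimize();       // Step 4 (optional)
                merged.Save(mergedPath); // Step 5
            }
        }
        finally
        {
            foreach (var part in parts) part.Dispose();
        }

        // Compare the sizes of the original and the merged file.
        Console.WriteLine($"Original: {new FileInfo(inputPath).Length:N0} bytes");
        Console.WriteLine($"Merged:   {new FileInfo(mergedPath).Length:N0} bytes");
    }
}
```

Running it against a multi-page PDF and comparing the two printed sizes should show whether the increase reproduces for that file.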

The file size of the final PDF will be WAY bigger than the original PDF even though they contain the same content.

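To give you a rough idea of the size increase, I performed these steps on the first 2,000 pages of a 10,000-page PDF. The original PDF (10,000 pages) was 37 MB. The merged PDF (2,000 pages) was 1,250 MB. That’s roughly a 34x increase, and that’s before taking into account that the merged PDF contains only 20% of the pages. Per page, that works out to about 3.7 KB in the original versus about 625 KB in the merged file, or roughly 170x, assuming the pages are similar in size.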

Let me know if a minimal reproduction would be helpful to you.

BTW, putting one of the giant merged PDFs through the iLovePDF optimizer reduces the file size by a huge amount.

@sam.magura

Could you please share your merged 1,000-page file so that we can check its contents and file size as part of our investigation?

Sure. Here is an example using a PDF that I can share publicly. The file size increase was only 7x in this case — it depends on the PDF used.

Original file, 2.4 MB
Merged file, 17.2 MB

@sam.magura

A ticket with ID PDFNET-50855 has been created in our issue tracking system to further investigate the issue on our end. This thread has been linked to the ticket so that you will be notified once the issue is fixed.