Very large PDF merge

Hi.

I am evaluating a few PDF libraries to see if they can handle a number of our requirements.

The first requirement is to concatenate a large number of PDFs into a single PDF, perhaps 200,000 PDF documents with a million+ pages in total with the final PDF being 30GB+ (this is for a print run).

I tried your pdfEditor.Concatenate() from the sample code pages but had to stop the test when the memory exceeded 20GB.

Do you have an option for creating the PDF to a file stream so everything isn’t in memory? I don’t mean create the PDF and then write to a file stream, I can see that can be done, I mean write to a file stream as the document is being generated so memory usage is kept minimal.

For comparison, one of the other libraries we are evaluating (iText) created a 36GB file containing 1.2 million pages while keeping memory use to about 1GB, so I just need to know whether your library can achieve similar performance before I investigate further.

If so, can you point me to some example code that uses a file stream for output? I will then request an evaluation licence so I can try it.

Thanks in advance, Dev.

@devenex

Thanks for evaluating our API and sharing your feedback.

Aspose.PDF for .NET loads the document into the system's main memory. To reduce memory consumption, we have introduced a DOM model in which all processing is managed at the document object level. Please try this approach to merge your PDF files.

Regarding your requirement to write the PDF directly to a file stream, we need to investigate it further, and once its feasibility is confirmed, we will surely implement it. In the meantime, please share your environment details with us: installed memory, application type, .NET Framework version, and OS name and version. We will then proceed to assist you accordingly.

It looks like good support, @asad.ali, but I would like to know more about it. Also, can I edit a PDF using this approach for merging PDF files?

I had a look at the alternative merge method, but it still creates the destination PDF in memory, so it is not something we can implement.

My dev environment is Win10 1909 x64 with 16GB of memory and .NET 4.8. The prod environment will be similar but will use Windows Server 2019.

@devenex

Thanks for your feedback.

Aspose.PDF for .NET also offers incremental saving of documents to minimize memory consumption. Please try the following code snippet in your environment. If you still face memory-related issues, please let us know and we will assist you further.

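using System.IO;
using Aspose.Pdf;

// Open the source with read/write access so changes can be saved back to the same stream.
using (FileStream fs = new FileStream("source.pdf", FileMode.Open, FileAccess.ReadWrite))
using (Document doc = new Document(fs))
{
    // do stuff
    doc.Save(); // no arguments: saves incrementally to the stream the document was opened from
}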