Merge PDF files into single PDF document using Aspose.PDF for .NET

timg · July 3, 2007, 2:21pm

I have many single-page PDF documents stored in a database as BLOBs (approximately 30000 documents). I would like to retrieve these single page documents and combine them into several large PDFs with about 5000 pages in each document.

What method would you recommend for doing this? Should I use PdfFileEditor.Append or Concatenate? I've looked at some of the examples, but I do not have a "source" document to start with. I simply want to combine a few thousand single-page documents into one large document.

Thanks for your help.

AdeelTaseer · July 3, 2007, 2:57pm

Hi,

Thank you for considering Aspose.

I think Concatenate is the better choice performance-wise. Please refer to:

https://docs.aspose.com/display/pdfnet/Concatenate+PDF+Files

If you need more help, please do let us know.

Thanks.

Adeel Ahmad
Support Developer
Aspose Changsha Team

markshiffer · July 25, 2007, 10:43am

I have a requirement similar to the original poster, but I keep getting out of memory exceptions when the call to concatenate is made.

I am trying to test by concatenating the attached 1-page PDF 1000 times. I actually have much larger requirements, but I was trying to get a grip on the process and memory usage. Is there a more efficient way to accomplish this with Aspose that will not explode the memory? I figured that with the use of streams Aspose would be able to accomplish this task without loading all 1000 pages into memory at once. Any guidance would be appreciated. Thanks.

Code that I am using:

// Needs: using System; using System.Collections.Generic; using System.IO; using Aspose.Pdf.Kit;
Aspose.Pdf.Kit.License license = new Aspose.Pdf.Kit.License();
license.SetLicense("Aspose.Total.lic");

List<Stream> streams = new List<Stream>();
List<string> tempFiles = new List<string>();
int count = Int32.Parse(repeatTextBox.Text);

for (int i = 0; i < count; i++)
{
    // Make a numbered copy of the test PDF
    tempFiles.Add(i.ToString() + ".pdf");
    File.Copy(stitchFileTextBox.Text, i.ToString() + ".pdf", true);

    //Create input stream objects holding the PDF files to be concatenated
    // DocumentData holds the path to the stitchable stream thanks to InitializeDocuments call
    FileStream stitchStream = new FileStream(i.ToString() + ".pdf", FileMode.Open);
    streams.Add(stitchStream);

    progressLabel.Text = string.Format("Copied file {0} of {1}", i + 1, count);
    this.Update();
}

try
{
    if (streams.Count > 0)
    {
        //Create output stream object that would contain the final PDF file
        FileStream stitchedStream = new FileStream(@"c:\stitchedDocument.pdf", FileMode.Create);

        //Instantiate PdfFileEditor object
        PdfFileEditor pdfEditor = new PdfFileEditor();

        //Call Concatenate method of PdfFileEditor object to concatenate all input streams
        //into a single output stream
        pdfEditor.Concatenate(streams.ToArray(), stitchedStream);

        //Finally close the output stream
        stitchedStream.Close();
    }

    // Close every input stream once the concatenation has finished
    int closeCount = 1;
    foreach (Stream fileStream in streams)
    {
        fileStream.Close();
        progressLabel.Text = string.Format("Closed file {0} of {1}", closeCount, count);
        this.Update();
        closeCount++;
    }
}
finally
{
    // delete temp files
    foreach (string file in tempFiles)
    {
        File.Delete(file);
    }
}

timg · July 25, 2007, 11:15am

Mark,

I had exactly the same problem, which was what prompted me to post the original message. My initial objective was not to make several 5000-page documents, but one 30000-page document. I just sort of gave up on that and conceded that I could live with a few 5000-pagers. I think it may be possible to achieve the single large document, though, if you have some time to keep playing with it (which I didn't).

The only improvement (regarding the memory issue) that I found was to first concatenate a smaller number of the files into a few large files, then concatenate those large files into an even larger file (and you could do several iterations of this).
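A minimal sketch of that multi-pass idea, for anyone who wants to try it. The class name `BatchMerge`, the method `MergeInBatches`, and the `concatenate` delegate are hypothetical names invented for this illustration; the delegate stands in for whatever merge call you use, so nothing here is Aspose-specific.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BatchMerge
{
    // Merges 'inputs' into 'output' in passes of at most 'batchSize' files.
    // 'concatenate' is a placeholder (assumption) for the real merge call.
    public static void MergeInBatches(
        IList<string> inputs, string output, int batchSize,
        Action<string[], string> concatenate)
    {
        if (inputs.Count == 0) throw new ArgumentException("No input files.");
        if (batchSize < 2) throw new ArgumentException("batchSize must be at least 2.");

        List<string> current = new List<string>(inputs);
        List<string> temps = new List<string>();
        try
        {
            // Keep reducing until one final call can handle what is left.
            while (current.Count > batchSize)
            {
                List<string> next = new List<string>();
                for (int i = 0; i < current.Count; i += batchSize)
                {
                    int n = Math.Min(batchSize, current.Count - i);
                    string temp = Path.GetTempFileName();
                    temps.Add(temp);
                    concatenate(current.GetRange(i, n).ToArray(), temp);
                    next.Add(temp);
                }
                current = next;
            }
            concatenate(current.ToArray(), output);
        }
        finally
        {
            // Intermediate files are only removed at the very end.
            foreach (string temp in temps) File.Delete(temp);
        }
    }
}
```

Assuming the string overload of Concatenate mentioned in this thread, a call might look like `BatchMerge.MergeInBatches(files, @"c:\stitchedDocument.pdf", 25, (src, dest) => editor.Concatenate(src, dest));` where `editor` is a `PdfFileEditor`. This only changes how many files each call sees; the final pass still has to handle the full output size.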

It doesn't make much sense to me, but it at least worked better. It was slow, and I think there is a threshold at which it will still give an out of memory error (not sure what that threshold is). But at least this got me a little further along.

Let me know if you discover any other helpful hints.

timg · July 25, 2007, 11:27am

Forgot to mention: the other thing that should help with your "Out of Memory" issues is to use the overload of the Concatenate function that is

Concatenate(string[] inputFiles, string outputFile)

instead of

Concatenate(Stream[] inputStreams, Stream outputStream).

From your code above (if I'm reading it correctly), it looks like you could use your tempFiles List and just do

Concatenate(tempFiles.ToArray(), @"c:\stitchedDocument.pdf")

That way, you just have a list of strings loaded into memory instead of a list of PDF streams. Hope that helps...

markshiffer · July 26, 2007, 1:05pm

Any Aspose input on this issue?

Thanks.

AdeelTaseer · July 26, 2007, 4:14pm

Hi,

Well, I can share my experience. I have used Concatenate to merge hundreds of files into one PDF file within seconds. I didn't use streams; I passed the file names as strings instead, and that worked for me. It is still difficult to say which one is better, but I think the performance is better if you pass file names to the Concatenate method, as Tim mentioned in his earlier post.

Thanks.

markshiffer · July 30, 2007, 7:18am

Unfortunately, neither method provides satisfactory results. The memory footprint is too large to be used. My test scenario has a single-page 65KB document that, when concatenated 1000 times, should produce a roughly 65MB document, but Aspose explodes the memory to over 1.2 GB and past the process memory limit when performing the concatenation.

It is quite possible that in my production environment I will have much larger requirements for the concatenation.

Can you try the example that I posted earlier to see if there is an existing way around this problem, and if not, open an official bug for this?

Thanks.

forever · July 30, 2007, 9:02am

Hi,

I am afraid we can’t improve this in the short term. You can try saving temporary files to disk and concatenating a few files at a time. But this may also fail when concatenating several large files at the end. Sorry for the inconvenience.

gfrey · December 13, 2007, 1:21pm

We just ditched ActivePDF.Toolkit for Aspose.Pdf.Kit because of some issues we were having with ActivePDF, such as 64-bit compatibility. But now we are unable to merge 200 PDFs into one PDF without getting an OutOfMemoryException. We were able to do it with ActivePDF. Here is the error thrown:

System.OutOfMemoryException: Exception of type ‘System.OutOfMemoryException’ was thrown. at Aspose.Pdf.Kit.PdfFileEditor.Concatenate(String[] inputFiles, String outputFile)

Has anything been done to fix Concatenate? It doesn’t seem that much to ask to merge 221 PDFs ranging in size from 500KB to 700KB into one PDF???

Thanks!
-Almost Happy Customer

gfrey · December 13, 2007, 5:36pm

So much for the theory of concatenating into subsets and then into a larger file. Like I stated before, I have 221 PDFs to merge, so I broke it down into 8 temp PDFs each containing 25 of the original PDFs, plus a last PDF containing the remaining 21 original PDFs, for a total of 9 temp PDFs. Those were produced without a hitch. Then I tried to merge the 9 temp PDFs (each is around 14MB) and it bombs with the System.OutOfMemoryException. Man, this is a huge bummer!!! Any ideas or workarounds???
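For reference, the batch split described above can be sketched as follows. `Batching` and `Split` are hypothetical names made up for this illustration, not part of any library; the sketch only partitions a list of file names and does no merging itself.

```csharp
using System;
using System.Collections.Generic;

static class Batching
{
    // Splits 'items' into consecutive groups of at most 'size' elements.
    public static List<List<T>> Split<T>(IList<T> items, int size)
    {
        if (size < 1) throw new ArgumentOutOfRangeException("size");

        List<List<T>> batches = new List<List<T>>();
        for (int i = 0; i < items.Count; i += size)
        {
            List<T> batch = new List<T>();
            int end = Math.Min(i + size, items.Count);
            for (int j = i; j < end; j++) batch.Add(items[j]);
            batches.Add(batch);
        }
        return batches;
    }
}
```

With 221 file names and a size of 25, this yields 9 batches: 8 of 25 files and a final one of 21. Each batch would then be passed to a separate merge call.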

AdeelTaseer · December 13, 2007, 7:56pm

Hi,

Have you checked with the latest version of Aspose.Pdf.Kit? Could you please send us one of your PDFs? I will try to reproduce this error by making replicas of it.

Thanks.

markshiffer · December 14, 2007, 6:56am

We’ve been dealing with this problem for the past 6 months without much progress. See my posts from back in July. With each new release I check the release notes and hope that it has been addressed, but unfortunately it has not. If you come to a resolution on this, please post here so that we can try to implement it. This one is really causing us headaches.

forever · December 14, 2007, 7:26pm

The large Pdf support issue is logged as PDFKITNET-4041 but we have not resolved it. I have set the priority to high and we will try to resolve this issue ASAP. I will notify you in this post when we make progress. Sorry for the inconvenience and thanks for your patience.

gfrey · December 17, 2007, 2:53pm

I had to purchase DynamicPDF Merger to resolve my merging issues. It is much faster than Concatenate (which is probably due to the memory issues it has) and it was able to handle our 220 pdfs into one 120MB pdf no problem. I hope Aspose fixes Concatenate but I won’t require a fix for a year since I had to purchase this software with 1yr support to resolve our production Concatenate issue.

Felix.Liu · January 18, 2008, 2:03am

Hi,

We have made some progress on the issue PDFKITNET-4041. Please try the attachment. Note that only the PdfFileEditor.Concatenate methods that take string parameters are optimized in this Beta, and that we recommend keeping the amount of data within 200M.

I should say it's a real challenge for us to improve the performance of merging big files without losing data such as bookmarks and AcroForms. Anyway, we developers will continue researching and improving this function, as well as other performance issues in our product.

Best regards,

belliv · February 13, 2008, 3:35am

Hi,

We have had the same problems here for many weeks ...

It works great with the version you provided in the attachment. I tried to merge 19 PDFs of 20MB each, and it took less than one minute on our server. I also tried to merge 2 PDFs of 200MB each and it worked the same way.

One question about this version: does it not require any license file? I forgot to copy aspose.custom.lic and it still worked. Is there any limitation? Also, is this DLL stable enough to use on a production site, or do we have to wait for an official release? (And when is it planned to be released?)

Regards,

Felix.Liu · February 13, 2008, 5:36am

Hi,

Very happy to know that the Beta works well on your server.

We sent the Beta for testing purposes, and your feedback is greatly appreciated. The stable version will be released later this month.

Best regards,

fsaatlas · December 21, 2008, 11:44pm

Did this fix ever make it into one of the releases? We’re currently using 3.1.0.0 and we’re having problems with huge memory consumption as well…

Please advise. Thx

Felix.Liu · December 22, 2008, 12:44am

Hi,

Please download the latest release (v3.3.0.0) from our downloads page. We have further optimized the performance of the PdfFileEditor class in this release. Please try it.

Thanks,