Merging pages from multiple source documents

JonKnight · June 26, 2013, 6:41am

Hi,

I’m currently evaluating Aspose.Pdf and would appreciate your help working out the code I need for a specific scenario. One of our requirements is a document assembly application, so I need to pull specific pages from a number of source PDF files and combine them into a new destination PDF file.

To illustrate this, imagine that there are 10 PDF files, and I want to pull page 1 out of each of them and create a new PDF containing all of the page 1s. (It could be 20 files rather than 10, the point being that the number of files is determined at runtime, and that the processing needs to be in a loop, so I can’t keep all of the source files open until the destination file has been written out.)

I’ve tried opening the source files in turn, adding the page in question to the destination file, and then saving, but by the time the destination file is saved, the pages I added from the source files have already been disposed.

Is there some way of taking a copy of a Page object (rather than referencing the instance from an open document) that I’ve missed? I’ve tried searching through the code samples, but they all seem to be for a simpler scenario, with a fixed number of source files.

Thanks.

codewarior · July 3, 2013, 3:06am

JonKnight:
I’m currently evaluating Aspose.Pdf, and would appreciate your help in working out the code I need to satisfy a specific scenario. A requirement we have is for a document assembly application. I therefore need to pull specific pages from a number of source PDF files, and combine them into a new destination PDF file.

Hi Jon,

Thanks for using our products.

Please follow the instructions in the “How to Concatenate PDF Files” article.

JonKnight:
To illustrate this, imagine that there are 10 PDF files, and I want to pull page 1 out of each of them and create a new PDF containing all of the page 1s. (It could be 20 files rather than 10, the point being that the number of files is determined at runtime, and that the processing needs to be in a loop, so I can’t keep all of the source files open until the destination file has been written out.)

You could load all of the PDF files from a particular directory and take the pages you need from each file while iterating through the documents. You may also want to look at the “Concatenating all Pdf files in Particular folder” article.
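A minimal sketch of the folder-based approach, assuming the goal from the original question (page 1 of every PDF in a folder). `SourceFolder` and `FirstPageOfEachPdf` are hypothetical names, and the code only builds the list of (file, page) pairs with `System.IO`; it makes no Aspose calls, so each file can still be opened, used, and closed one at a time afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: builds the runtime list of pages to merge.
public static class SourceFolder
{
    // Returns one (file, page) entry per *.pdf file in the folder, always asking for page 1,
    // sorted by file name so the merged output has a predictable page order.
    public static List<(string FileName, int PageNumber)> FirstPageOfEachPdf(string directory)
    {
        return Directory.GetFiles(directory, "*.pdf")
            .OrderBy(path => Path.GetFileName(path), StringComparer.OrdinalIgnoreCase)
            .Select(path => (FileName: path, PageNumber: 1))
            .ToList();
    }
}
```

Because the number of entries is only known after the folder has been scanned, the merge loop can be driven entirely by this list rather than by a fixed set of files.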

JonKnight:
Is there some way of taking a copy of a Page object (rather than referencing the instance from an open document) that I’ve missed?

To get a page reference, you first need to open the source PDF file. I’m afraid Aspose.Pdf for .NET does not currently support getting a Page instance without opening/loading the source PDF file.

If you have any further questions, please feel free to contact us.

PS: Using the Document class to concatenate/manipulate PDF files is the recommended approach.

JonKnight · August 8, 2013, 6:32am

Thanks for the information.

For anyone trying to do something similar, I found that the Aspose.Pdf.Facades.PdfFileEditor worked best for me. Here's a code snippet:

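    // Requires: using System.Collections.Generic; using System.IO;
    // One MemoryStream per extracted page (the generic type argument was missing).
    List<MemoryStream> pages = new List<MemoryStream>();
    try
    {
        foreach (PdfPageItem item in NewPdfPages)
        {
            // Copy a single page from the source file into memory, then close the file.
            Aspose.Pdf.Facades.PdfFileEditor inputEditor = new Aspose.Pdf.Facades.PdfFileEditor();
            MemoryStream pageStream = new MemoryStream();
            using (FileStream input = new FileStream(item.PdfFileName, FileMode.Open, FileAccess.Read))
            {
                inputEditor.Extract(input, new int[] { item.PageNumber }, pageStream);
            }
            pageStream.Position = 0; // rewind the stream before it is read back
            pages.Add(pageStream);
        }

        // Concatenate all extracted pages into the destination file.
        // Note: FileMode.CreateNew throws if outputFileName already exists.
        Aspose.Pdf.Facades.PdfFileEditor outputEditor = new Aspose.Pdf.Facades.PdfFileEditor();
        using (FileStream outputStream = new FileStream(outputFileName, FileMode.CreateNew))
        {
            outputEditor.Concatenate(pages.ToArray(), outputStream);
        }
    }
    finally
    {
        // Release the in-memory page copies even if extraction or concatenation fails.
        foreach (MemoryStream item in pages)
        {
            item.Dispose();
        }
    }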

Note that PdfPageItem is my own class; it contains the 'PageNumber' and 'PdfFileName' properties referenced in the code above.
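For readers who want the snippet to compile, here is a minimal, hypothetical PdfPageItem. Only the `PdfFileName` and `PageNumber` property names come from the post; the constructor, the validation, and the assumption that page numbers are 1-based (as Aspose.Pdf page numbering is) are illustrative additions.

```csharp
using System;

// Hypothetical reconstruction of the PdfPageItem class mentioned above.
// Only the two property names are taken from the post.
public class PdfPageItem
{
    public string PdfFileName { get; }
    public int PageNumber { get; }   // assumed 1-based

    public PdfPageItem(string pdfFileName, int pageNumber)
    {
        if (string.IsNullOrEmpty(pdfFileName))
            throw new ArgumentException("A source file name is required.", nameof(pdfFileName));
        if (pageNumber < 1)
            throw new ArgumentOutOfRangeException(nameof(pageNumber), "Page numbers start at 1.");

        PdfFileName = pdfFileName;
        PageNumber = pageNumber;
    }
}
```

With this in place, NewPdfPages can be any List<PdfPageItem> built at runtime, for example one item per source file with a page number of 1.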

This basically creates a list of memory streams, one for each page taken from the source file(s), and then uses the PdfFileEditor Concatenate method to write them to a file. If you need to change the PDF version, the only way I found was to open the newly created file in an Aspose.Pdf.Document, use the 'Validate' method to set the PDF version, and then save it again.

If anyone has an easier way to achieve the result of the above code, please let me know!

codewarior · August 13, 2013, 8:34am

Hi Jon,

Thanks for sharing the feedback.

Yes, you are correct. To upgrade or downgrade the PDF file version, use the Validate(…) method of the Aspose.Pdf.Document class. Also, to concatenate PDF files, I recommend trying the Document class, which performs better than PdfFileEditor. For further information, please visit “Concatenate PDF Files”.
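A minimal sketch of the same page-picking loop using the Document class, assuming the `Document(string)` constructor, the 1-based `Pages[...]` indexer, `Pages.Add(Page)`, and `Save(string)`. `DocumentMerger`, `MergePages`, and the tuple input are hypothetical names. Because pages taken from an already-disposed source could not be saved (as described at the start of the thread), this sketch keeps every source Document open until `Save` returns. That is the trade-off Jon wanted to avoid, so for very large file counts the MemoryStream approach above may still be the better fit.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;

// Hypothetical helper: copies one page from each source into a new document.
public static class DocumentMerger
{
    public static void MergePages(IEnumerable<(string FileName, int PageNumber)> items, string outputFileName)
    {
        // Sources stay open until the destination has been saved.
        var sources = new List<Document>();
        try
        {
            using (var destination = new Document())
            {
                foreach (var item in items)
                {
                    var source = new Document(item.FileName);
                    sources.Add(source);
                    // Page numbers in Document.Pages are 1-based.
                    destination.Pages.Add(source.Pages[item.PageNumber]);
                }
                destination.Save(outputFileName);
            }
        }
        finally
        {
            // Dispose the sources only after the save has completed (or failed).
            foreach (var source in sources)
            {
                source.Dispose();
            }
        }
    }
}
```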

If you have any further questions, please feel free to contact us.