Merging multiple pdf Document objects into a single Document object

williamfa · March 2, 2022, 6:59pm

I’m using C# and retrieving multiple PDF files from an API as byte arrays.

If I start with multiple byte arrays, how do I combine them into a single PDF document using Aspose?

I’ve tried many, many variations of code to try to combine pages from multiple Document objects but that works poorly. Either the result has no pages or I get a closed stream error. It seems they’ll work if I save to a file first and then combine the files but that’s not a reasonable solution.

Any assistance would be appreciated. Apparently, I really don’t understand how to use the Document objects when there is more than one at a time being used in the same scope. If I save a Document object it disposes of the streams which are needed to use the contents but if I don’t then there will be no contents.

Edit: I think this will work:

// documents is a Dictionary<string, byte[]>; each entry holds the contents of a single PDF file.

Document pdfDocument = null;
using ( var pdfResult = new Document() )
{
    foreach ( var document in documents )
    {
        using ( var pdfStream = new MemoryStream( document.Value ) )
        {
            // Reuse one variable for every source document; nothing is saved mid-merge.
            pdfDocument = new Document( pdfStream );
            pdfResult.Pages.Add( pdfDocument.Pages );
        }
    }
    // Dispose of the last source document only after all pages have been added.
    pdfDocument?.Dispose();
    if ( pdfResult.Pages.Count == 0 )
    {
        results.Error = "Failed to merge pdf contents";
    }
    else
    {
        using ( var pdfStream = new MemoryStream() )
        {
            // Save the merged document to memory and return it as a byte array.
            pdfResult.Save( pdfStream, SaveFormat.Pdf );
            results.Bytes = pdfStream.ToArray();
        }
    }
}

asad.ali · March 2, 2022, 8:47pm

@williamfa

Have you tried the approach given in the below documentation article?

Concatenate Array of PDF Files Using Streams

Please try passing an array of streams containing the PDF files to obtain a single output PDF, and if you still notice any issues, please let us know.

williamfa · March 3, 2022, 1:23pm

Hi Asad, I did see that article, but using a method with a fixed number of streams isn’t practical in this case. There could be dozens of files to merge together as part of a document packet attachment. The code I suggested does work for me where I put the byte arrays in a dictionary/list and then handle the streams and disposal like I do above.

That said, working with multiple Document objects simultaneously didn’t work particularly well. It seems they share common stream objects and saving/finalizing one Document affects others which are already instantiated. In the above example, I reused the same document variable, pdfDocument, without ever saving/finalizing it until I was done merging and that was basically my workaround for these problems.

asad.ali · March 3, 2022, 8:26pm

@williamfa

You are right about document loading and how it accesses the common stream. However, this is the expected behavior of the API and we cannot mark it as a bug. The API follows the DOM (Document Object Model) approach, in which the whole document is loaded into memory and remains there until the Dispose() or Save() method is called. Also, if the Document is using resources from other Document objects, disposing of those documents will cut off the main Document object's access to those resources.

At save time, the API processes every initialized and used resource in memory and maps everything into a single PDF document.

You can still use the concatenation method from the article by initializing a stream array of the same size as the dictionary in which you are collecting the bytes of all the PDF documents. That way you would not need to adopt the workaround that you discovered. Please feel free to let us know in case you have further concerns.
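
As a rough illustration of that suggestion, the sketch below builds one MemoryStream per dictionary entry and hands the array to PdfFileEditor.Concatenate from Aspose.Pdf.Facades. The class name PdfMergeSketch and the method name Merge are hypothetical, as is the choice to return null on failure; treat it as a starting point under those assumptions rather than a definitive implementation.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Aspose.Pdf.Facades;

public static class PdfMergeSketch
{
    // Hypothetical helper: merges every PDF in the dictionary into one byte array.
    public static byte[] Merge( IDictionary<string, byte[]> documents )
    {
        // One input stream per dictionary entry, so the array size follows the data.
        Stream[] inputs = documents.Values
            .Select( bytes => (Stream) new MemoryStream( bytes ) )
            .ToArray();
        try
        {
            using ( var output = new MemoryStream() )
            {
                var editor = new PdfFileEditor();
                // Concatenate returns false when the merge fails.
                if ( !editor.Concatenate( inputs, output ) )
                {
                    return null; // assumption: the caller treats null as "merge failed"
                }
                return output.ToArray();
            }
        }
        finally
        {
            // Release the input streams only after concatenation has finished.
            foreach ( var input in inputs )
            {
                input.Dispose();
            }
        }
    }
}
```

Called as `results.Bytes = PdfMergeSketch.Merge( documents );`, this handles any number of files, since the array is sized from the dictionary at run time.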

williamfa · March 3, 2022, 8:44pm

Ok, I’ll try the concatenation of an array of MemoryStream objects and see how that works, thanks!

asad.ali · March 3, 2022, 11:40pm

@williamfa

Sure, please take your time to test the case and feel free to let us know in case you face any issues.