Limiting memory usage when concatenating large PDFs

I am trying to concatenate two large PDF documents using Aspose.PDF for Java (23.1) like this:

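import com.aspose.pdf.Document;
import com.aspose.pdf.Page;

Document doc1 = new Document("D:/input1.pdf");
Document doc2 = new Document("D:/input2.pdf");
// Append every page of the second document to the first one
for (final Page p : doc2.getPages()) {
    doc1.getPages().add(p);
}
doc2.close();
// Write the concatenated result
doc1.save("D:/output.pdf");
doc1.close();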

Each of the documents is around 600 MB in size and has 8000 pages, each page containing a large image and a small amount of header/footer text. I can provide the actual files if required.
The whole operation runs in a process limited to 512 MB of memory, which leads to an OutOfMemoryError (OOM) when I try it.
Is there a way to perform this operation (simple concatenation, no further modification of the content) with minimal memory consumption, e.g. by using temp files?

@uballach

Larger PDF files require more memory to process. If you have memory limitations, you can try splitting the PDF documents into multiple smaller documents of 10-20 pages each. Once they are all saved to disk (e.g. in a temp directory), loop through the files in that directory and call doc.getPages().add(sourceDoc.getPages()) for each one. If the issue still persists, please let us know.
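With the split-and-loop approach above, the order in which the temp files are read matters: `File.listFiles()` makes no guarantee about ordering, so the chunk file names should sort into page order. Below is a minimal, JDK-only sketch of planning the chunks. The class `ChunkPlan`, the `chunk_%05d.pdf` naming scheme and the chunk size are hypothetical choices for illustration; the actual splitting and the `doc.getPages().add(sourceDoc.getPages())` calls through Aspose.PDF are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Aspose.PDF): plans how to split an N-page
// document into chunk files of at most chunkSize pages each.
final class ChunkPlan {

    // One chunk: a 1-based, inclusive page range and the temp file name for it.
    static final class Chunk {
        final int firstPage;
        final int lastPage;
        final String fileName;

        Chunk(int firstPage, int lastPage, String fileName) {
            this.firstPage = firstPage;
            this.lastPage = lastPage;
            this.fileName = fileName;
        }
    }

    static List<Chunk> plan(int totalPages, int chunkSize) {
        if (totalPages < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("need totalPages >= 0 and chunkSize > 0");
        }
        List<Chunk> chunks = new ArrayList<>();
        int index = 0;
        for (int first = 1; first <= totalPages; first += chunkSize) {
            int last = Math.min(first + chunkSize - 1, totalPages);
            // Zero-padding keeps "chunk_00002.pdf" sorting before "chunk_00010.pdf",
            // so sorting the file names alphabetically restores page order.
            String name = String.format(Locale.ROOT, "chunk_%05d.pdf", index++);
            chunks.add(new Chunk(first, last, name));
        }
        return chunks;
    }
}
```

For the 8000-page files in this thread, `ChunkPlan.plan(8000, 20)` yields 400 chunks named `chunk_00000.pdf` to `chunk_00399.pdf`; after sorting the directory listing by name, each chunk would be opened and appended in turn.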

Thank you for the reply!
Unfortunately, that did not help. I tried splitting it into documents of 20, 10, or 1 page each and adding them in a loop. In all three cases, the process runs out of memory at around page 5000.
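A rough calculation with the figures reported in this thread hints at why the chunk size made no difference. The sketch below only divides the reported numbers; `HeapEstimate` is a hypothetical name, "600 MB" is treated as 600 MiB, and JVM and library overhead are ignored, so the heap figure is only an upper bound.

```java
// Back-of-envelope estimate using only the figures reported in this thread.
final class HeapEstimate {
    static final long MIB = 1024L * 1024L;

    // Average number of bytes per page for a given total (integer division).
    static long bytesPerPage(long totalBytes, long pages) {
        if (pages <= 0) throw new IllegalArgumentException("pages must be > 0");
        return totalBytes / pages;
    }

    // Heap consumed per appended page if a 512 MiB heap runs out at ~5000 pages
    // (overhead is ignored, so this is an upper bound).
    static long heapPerPage() {
        return bytesPerPage(512 * MIB, 5000);   // 107374 bytes, about 105 KiB
    }

    // Average on-disk size of one input page, treating "600 MB" as 600 MiB.
    static long inputPerPage() {
        return bytesPerPage(600 * MIB, 8000);   // 78643 bytes, about 77 KiB
    }
}
```

Per appended page, the heap consumed is on the same order as the page's on-disk size. That is consistent with, though it does not prove, the destination document keeping each appended page's content in memory until it is saved, in which case splitting the source documents into smaller chunks would not lower the peak.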

@uballach

Can you please share your sample files and the complete code snippet that you have tried so far? We will open an investigation ticket in our issue tracking system and share the ID with you.

@asad.ali

This is a sample one-page document similar to the pages in the actual files:
1page.pdf (77.6 KB)

I can reproduce the issue with that file using the following Java code:

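import com.aspose.pdf.Document;

Document doc1 = new Document("D:/1page.pdf");
// Append the one-page sample 10,000 more times to build a large output document
for (int i = 0; i < 10000; i++) {
  Document doc2 = new Document("D:/1page.pdf");
  doc1.getPages().add(doc2.getPages());
  doc2.close();
}
doc1.save("D:/aspose-out.pdf");
doc1.close();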

Note that the JVM is launched with the -Xmx512m switch.
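To see how memory grows in the reproduction loop under -Xmx512m, one option is to log heap usage every few hundred iterations. A minimal JDK-only sketch follows; `HeapReport` is a hypothetical helper, not part of Aspose.PDF.

```java
import java.util.Locale;

// Hypothetical diagnostic helper: reports how much of the maximum heap is in use.
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    // Formats a used/max pair, e.g. "used 256 MiB of 512 MiB (50%)".
    static String format(long usedBytes, long maxBytes) {
        long percent = maxBytes > 0 ? usedBytes * 100 / maxBytes : 0;
        return String.format(Locale.ROOT, "used %d MiB of %d MiB (%d%%)",
                usedBytes / MIB, maxBytes / MIB, percent);
    }

    // Snapshot of the current JVM heap. maxMemory() roughly follows -Xmx,
    // though some collectors report a slightly lower value.
    static String current() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return format(used, rt.maxMemory());
    }
}
```

Inside the loop, something like `if (i % 500 == 0) System.out.println(i + ": " + HeapReport.current());` would print one line every 500 pages. Used heap also counts garbage that has not been collected yet, so the numbers are noisy; a steady climb that never drops back would point to the merged document retaining per-page data.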

@uballach

An investigation ticket, PDFJAVA-42650, has been logged in our issue tracking system for further analysis of this scenario. We will look into its details and keep you posted on the status of its resolution. Please be patient and give us some time.


Hi, have you been able to look into the issue in the meantime?

@uballach

We are afraid we have not yet been able to resolve the previously logged ticket due to other issues ahead of it in the queue. As soon as we make progress toward resolving it, we will inform you in this forum thread. Please give us some time.

We are sorry for the inconvenience.