OOM when splitting a PDF into single pages and saving them

David_Matin · January 20, 2025, 9:27am

I use the following code to split a PDF into single pages. At page 66 the progress stalls, and monitoring shows the node's memory use climbing above 18 GB. About 5 minutes later an OOM is reported. The node's memory limit is 30 GB.

document = ap.Document(file_path)
for page_count, pdfPage in enumerate(document.pages):
    new_document = ap.Document()
    new_document.pages.add(pdfPage)
    logger.info("PDF split, page {}", page_count + 1)
    new_document.save(f"{result}_output/{file_name}_page{page_count + 1}.pdf")

Version: aspose-pdf==24.10.0
TBSA-金字塔原理：思考、寫作、解決問題的邏輯方法-2011.4.6.pdf (2.3 MB)

Professionalize.Discourse · January 20, 2025, 9:27am

@David_Matin

It seems that you are encountering an Out of Memory (OOM) issue while trying to split a PDF document into individual pages using Aspose.PDF. This can happen when handling large documents or when the memory usage exceeds the limits set for your environment.

To address this issue, consider the following suggestions:

Process Pages in Batches: Instead of loading the entire document into memory at once, you can process the pages in smaller batches. This will help reduce memory consumption.
Dispose of Unused Objects: Ensure that you are properly disposing of any Document objects that are no longer needed. This can help free up memory.
Use a Stream for Saving: Instead of saving each page to a file directly, consider using a FileStream to write the output. This can sometimes help manage memory better.

Here is an example of how you might implement these suggestions in C#:

using System.IO;
using Aspose.Pdf;

namespace Aspose.Pdf.Examples.CSharp.AsposePDF.Pages
{
    public class SplitToPages
    {
        public static void Run()
        {
            string dataDir = "path_to_your_pdf_directory/";
            // Dispose the source document too once every page has been written.
            using (Document pdfDocument = new Document(dataDir + "your_large_pdf.pdf"))
            {
                // Page indexes in Aspose.PDF start at 1.
                for (int pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
                {
                    using (Document newDocument = new Document())
                    {
                        newDocument.Pages.Add(pdfDocument.Pages[pageCount]);
                        using (FileStream fs = new FileStream(dataDir + $"page_{pageCount}.pdf", FileMode.Create))
                        {
                            newDocument.Save(fs);
                        }
                    }
                }
            }
        }
    }
}

In this code, each page is processed and saved in a using block, which ensures that resources are released promptly. Adjust the dataDir and file names as necessary for your environment.
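
Since the original question uses Python, here is a rough sketch of the batching idea in that language. It runs each batch of pages in a fresh process, so whatever memory a batch accumulates is returned to the operating system when that process exits. The Aspose calls are only the ones already shown in the question (ap.Document, document.pages, pages.add, save); the function names split_batch, run_in_fresh_process and split_in_batches, the out_prefix parameter and the batch size of 20 are assumptions made for illustration. Whether this actually avoids the reported memory growth has not been verified.

```python
import multiprocessing as mp


def split_batch(file_path, out_prefix, start, size):
    """Save pages start .. start+size-1 (0-based) as single-page PDFs.

    Returns the number of pages written; fewer than `size` means the
    end of the document was reached.
    """
    # Imported here so the library is only loaded inside the worker process.
    import aspose.pdf as ap

    written = 0
    document = ap.Document(file_path)
    for index, pdf_page in enumerate(document.pages):
        if index < start:
            continue
        if index >= start + size:
            break
        new_document = ap.Document()
        new_document.pages.add(pdf_page)
        new_document.save(f"{out_prefix}_page{index + 1}.pdf")
        written += 1
    return written


def run_in_fresh_process(file_path, out_prefix, start, size):
    """Run split_batch in a new process and wait for its result."""
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=1) as pool:
        return pool.apply(split_batch, (file_path, out_prefix, start, size))


def split_in_batches(file_path, out_prefix, batch_size=20,
                     run_batch=run_in_fresh_process):
    """Process the document batch by batch; return the total pages written."""
    total, start = 0, 0
    while True:
        written = run_batch(file_path, out_prefix, start, batch_size)
        total += written
        if written < batch_size:  # a short batch means we hit the last page
            return total
        start += batch_size
```

Because the "spawn" start method re-imports the main module, the call to split_in_batches should sit under an `if __name__ == "__main__":` guard. The trade-off is that every batch re-opens the source PDF, so a larger batch size means less overhead but a higher peak per process.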

If the problem persists, you may want to check the memory limits of your environment and consider increasing them if possible. Additionally, ensure that you are using the latest version of Aspose.PDF, as performance improvements and bug fixes are regularly released.

If you continue to experience issues, please provide more details about the size of the PDF and the environment you are working in, as this information can help in diagnosing the problem further.

David_Matin · February 5, 2025, 3:31am

@asad.ali

asad.ali · February 5, 2025, 2:17pm

@David_Matin

We have tested using Aspose.PDF for Java 24.12 and could not reproduce the issue in our environment. Please try the latest available version. If the issue still persists, please share your environment details and a screenshot of the memory consumption with us, and we will proceed accordingly.

David_Matin · February 7, 2025, 1:25am

@asad.ali I am using Aspose.PDF for Python via .NET in K8s. Version: 24.12.0, OS: Ubuntu.

asad.ali · February 7, 2025, 8:15am

@David_Matin

Can you please share the complete stack trace information and error message for our reference? We will log an investigation ticket and share the ID with you.

David_Matin · February 10, 2025, 1:22am

@asad.ali Unfortunately, no exception is raised. I've set the memory limit for containers in k8s to 10G; once usage goes beyond that, k8s automatically restarts the container.
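
When the container is restarted without an exception, one way to still capture some evidence is to log memory use after every page, so the last line before the restart shows how far the split got and how close the process was to the limit. Below is a minimal sketch using only the Python standard library. It assumes Linux (ru_maxrss is reported in KiB there), that the library runs inside the Python process, and that the limit is exposed at the usual cgroup paths; the function names are hypothetical.

```python
import resource
from pathlib import Path

# Usual locations of the container memory limit: cgroup v2 first, then v1.
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def container_limit_bytes(paths=CGROUP_LIMIT_FILES):
    """Return the cgroup memory limit in bytes, or None if none is found."""
    for path in paths:
        try:
            text = Path(path).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next location
        if text.isdigit():
            return int(text)
        # cgroup v2 contains the word "max" when no limit is set
    return None


def peak_rss_bytes():
    """Peak resident set size of the current process, in bytes (Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024


def memory_report(label, limit=None):
    """Format one log line with the peak RSS and, if known, the limit."""
    peak = peak_rss_bytes()
    line = f"{label}: peak RSS {peak / 2**20:.0f} MiB"
    if limit:
        line += f" ({100 * peak / limit:.0f}% of {limit / 2**20:.0f} MiB limit)"
    return line
```

In the split loop this could be called right after each save, for example `logger.info(memory_report(f"page {page_count + 1}", container_limit_bytes()))`. Note that this reports the peak, not the current value, so it never goes down; it shows at which page the growth happens rather than whether memory is later released.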

asad.ali · February 10, 2025, 1:52pm

@David_Matin

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFPYTHON-343

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.