OutOfMemoryException When Generating Previews for Large PDF

Hi Aspose Support,

We are encountering an OutOfMemoryException when using Aspose.PDF to generate a preview for a PDF document with around 416 pages.

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Collections.Generic.List`1.set_Capacity(Int32 value)
   at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)
   at System.Collections.Generic.List`1.Add(T item)
   ...
   at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
   at Aspose.Pdf.Page.Accept(TextAbsorber visitor)
   at Aspose.Pdf.PageCollection.Accept(TextAbsorber visitor)

Could you please clarify:

  • What are the recommended or required system memory specifications (e.g., RAM per page, heap size, etc.) for generating previews of large PDF files using Aspose.PDF?
  • Are there any known limitations or best practices for handling high-page-count documents when rendering previews?

Thanks in advance for your help.

@jdon

Cause

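The OutOfMemoryException you’re encountering when generating previews for a large PDF document is typically caused by the process running out of memory while handling the file. In your stack trace, the exception is thrown while a List grows inside TextAbsorber.Visit, which is reached through PageCollection.Accept. This means a single TextAbsorber is collecting text from every page of the document at once, and for a document of this size that can exceed the memory available to the process.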

Solution

To address this issue, consider the following recommendations:

  1. Increase System Memory: Ensure that your system has enough RAM. A general guideline is to have at least 1 GB of RAM for every 100 pages of the PDF, but this can vary based on the complexity of the content.

  2. Optimize Memory Usage: Utilize memory optimization settings provided by Aspose.PDF. Adjust memory settings to reduce the memory footprint when processing large documents.

  3. Process in Chunks: If possible, process the PDF in smaller segments rather than loading the entire document at once. This approach can help manage memory usage more effectively.

  4. Use 64-bit Environment: If you are currently running a 32-bit application, switch to a 64-bit environment, as it can handle larger memory allocations.
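
To see which case item 4 applies to, the running process can report its own bitness. This is a minimal sketch using only standard .NET APIs. The class name ProcessBitness is a placeholder for illustration and is not part of Aspose.PDF.

```csharp
using System;

public static class ProcessBitness
{
    public static void Main()
    {
        // True when the operating system itself is 64-bit.
        Console.WriteLine($"64-bit OS:      {Environment.Is64BitOperatingSystem}");

        // True only when this process runs as 64-bit. A 32-bit process on a
        // 64-bit OS still has a 32-bit address space.
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");

        // 4 bytes in a 32-bit process, 8 bytes in a 64-bit process.
        Console.WriteLine($"Pointer size:   {IntPtr.Size} bytes");
    }
}
```

If the second line prints False, the build or hosting configuration (for example the project's platform target) is worth checking before adding more RAM.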

Code

Here’s a sample code snippet that demonstrates how to set memory preferences when loading a PDF document:

using Aspose.Pdf;
using Aspose.Pdf.Text;

// Load the PDF document
Document pdfDocument = new Document("path/to/your/document.pdf");

// Set memory optimization settings
pdfDocument.MemorySetting = MemorySetting.MemoryPreference;

// Generate preview or perform operations
TextAbsorber textAbsorber = new TextAbsorber();
pdfDocument.Pages.Accept(textAbsorber);

Next Steps

  1. Review your system’s memory specifications and consider upgrading if necessary.
  2. Implement the memory optimization settings in your code as shown above.
  3. If the issue persists, break down the PDF processing into smaller tasks or segments.

Please let me know if you need further assistance or clarification!

Thank you for the suggestion and detailed explanation.

After reviewing the official documentation, I found that MemorySetting.MemoryPreference is not available in Aspose.PDF; it is part of Aspose.Cells instead:
  • Public API Changes in Aspose.Cells 8.0.1 | Documentation
  • Class Document | Aspose.PDF for .NET API Reference


Original Strategy:


Updated Strategy:

  • Switched to chunked text extraction:
    • Each page is processed individually with a new TextAbsorber instance.
    • This prevents high memory usage from accumulating content across pages.
    • Also added OptimizeResources() for improved memory efficiency during processing.

This approach has noticeably reduced peak memory usage when generating previews and performing language detection on large PDFs.
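
For anyone reading this thread later, the updated strategy could look roughly like the sketch below. It is not the exact code from our project. The class PdfPreviewText, the method Extract, and the maxChars cut-off are hypothetical names added for illustration, and the early exit assumes that a preview only needs the first part of the text.

```csharp
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

public static class PdfPreviewText
{
    // Hypothetical helper: returns at most maxChars characters of text,
    // reading the document one page at a time.
    public static string Extract(string path, int maxChars)
    {
        var preview = new StringBuilder();

        using (var pdfDocument = new Document(path))
        {
            // Optional step mentioned above: optimize the document's resources.
            pdfDocument.OptimizeResources();

            // Page indexes in Aspose.PDF start at 1.
            for (int pageNumber = 1; pageNumber <= pdfDocument.Pages.Count; pageNumber++)
            {
                // A fresh absorber per page, so no single absorber holds
                // the text of the whole document.
                var absorber = new TextAbsorber();
                pdfDocument.Pages[pageNumber].Accept(absorber);
                preview.Append(absorber.Text);

                // Assumption: stop as soon as enough text has been collected.
                if (preview.Length >= maxChars)
                {
                    break;
                }
            }
        }

        return preview.Length > maxChars
            ? preview.ToString(0, maxChars)
            : preview.ToString();
    }
}
```

A call such as PdfPreviewText.Extract("path/to/your/document.pdf", 5000) would then return the first 5000 characters at most, without visiting the remaining pages.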

The fix is now under testing—no further input needed at this point. Thanks again!

@jdon

It's good to know that you were able to sort out the issue. Please keep using the API, and feel free to create a new topic if you run into any further issues.