PdfFileEditor.Extract memory issues?

I am getting OutOfMemory exceptions when performing a series of operations on large PDF files. Looking more closely, it seems that the memory consumed when the Extract method is called is not released after the extract finishes. Multiple calls to Extract and to various other PdfFileEditor methods are causing OOM exceptions.

Is there any way to force the memory to be released?

Hi Brian,

Are you using Aspose.Pdf.Kit for .NET? If so, please try Aspose.Pdf.Kit 3.4. If the problem persists, please share the PDF file with us so we can test it at our end and suggest a resolution.

We're sorry for the inconvenience.

Regards,

Yes, I'm using .NET. I wasn't using the released 3.4 version, but I tried that DLL this morning and still had no luck.

The order of events is:

1) Extract is called. Memory spikes.
2) Insert is called.
3) Delete is called.
4) Extract is called. Memory goes higher.
5) Insert is called. OutOfMemory Exception.
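To see exactly how much memory each step in this sequence leaves behind, one option is to record memory after every call. The sketch below assumes nothing about Pdf.Kit itself: each step is passed in as a named Action, so the Extract, Insert and Delete calls are whatever you already run. MemoryProbe and RunSteps are hypothetical names. GC.GetTotalMemory reports only the managed heap, while Process.PrivateMemorySize64 also counts native allocations.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class MemoryProbe
{
    // Runs each named step in order and records memory right after it returns.
    // The returned list holds one managed-heap reading (bytes) per step.
    public static List<long> RunSteps(IList<KeyValuePair<string, Action>> steps)
    {
        var managedReadings = new List<long>();
        foreach (KeyValuePair<string, Action> step in steps)
        {
            step.Value();

            // false: do not force a collection, so the reading includes
            // whatever the step left behind that the GC has not reclaimed yet.
            long managed = GC.GetTotalMemory(false);

            // Private bytes include native allocations, which the managed
            // reading above does not see.
            long privateBytes;
            using (Process current = Process.GetCurrentProcess())
            {
                privateBytes = current.PrivateMemorySize64;
            }

            Console.WriteLine("{0,-8} managed: {1,10:N0} KB   private: {2,10:N0} KB",
                step.Key, managed / 1024, privateBytes / 1024);
            managedReadings.Add(managed);
        }
        return managedReadings;
    }
}
```

Wrap your own Extract, Insert and Delete calls in lambdas and pass them in the order listed above. If the managed reading stays fairly flat while private bytes keep climbing, that would suggest native memory rather than uncollected .NET objects.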

Memory usage is slightly different with the newer version of Pdf.Kit, but it still does not release large amounts of memory after a call to any of the methods above. I tried uploading a sample PDF twice and the upload failed both times. The file is 85 MB, 100 pages long, and consists entirely of images.

Hi Brian,

Can you please try calling the following FlushMemory method after each Extract call? I hope it will help reduce the excessive memory consumption.

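using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class MemoryManagement
{
    // SetProcessWorkingSetSize takes SIZE_T sizes; IntPtr has the matching width
    // in both 32-bit and 64-bit processes (int does not on 64-bit).
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool SetProcessWorkingSetSize(IntPtr process, IntPtr minimumWorkingSetSize, IntPtr maximumWorkingSetSize);

    public static void FlushMemory()
    {
        // Collect unreachable objects, let pending finalizers run,
        // then collect again to reclaim objects released by those finalizers.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        if (Environment.OSVersion.Platform == PlatformID.Win32NT)
        {
            // Passing -1 for both sizes asks Windows to trim the working set.
            // This lowers the figure Task Manager shows but does not release
            // committed memory; any real reduction comes from the GC calls above.
            using (Process current = Process.GetCurrentProcess())
            {
                SetProcessWorkingSetSize(current.Handle, new IntPtr(-1), new IntPtr(-1));
            }
        }
    }
}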

If problem persists, please do let us know.

Regards,

Hello, I know this is not the problem I reported, but it seems related to the SplitToBulks problem I reported recently (PDFKITNET-8129).

I tested the Extract call together with FlushMemory on a 102 MB PDF file.

Without the FlushMemory call, I quickly get an OutOfMemoryException, with the memory footprint easily reaching 1 GB.

With the FlushMemory call, my test app's footprint rises to about 540 MB and then quickly drops to 15 MB on each iteration of the Extract loop, so the FlushMemory call helps a lot.

Unfortunately, SplitToBulks gives me no way to insert a FlushMemory call during its processing.
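One possible workaround is to do the bulk splitting yourself: compute consecutive page ranges and call Extract once per range, flushing memory between calls. The sketch below is an assumption, not how SplitToBulks works internally. ChunkedSplit, PageRanges and Run are hypothetical names, and the extract and flush operations are passed in as delegates so the page-range logic stays independent of any Pdf.Kit version.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkedSplit
{
    // Splits pages 1..pageCount into consecutive ranges of at most chunkSize pages.
    public static List<Tuple<int, int>> PageRanges(int pageCount, int chunkSize)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException("pageCount");
        if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize");

        var ranges = new List<Tuple<int, int>>();
        for (int start = 1; start <= pageCount; start += chunkSize)
        {
            int end = Math.Min(start + chunkSize - 1, pageCount);
            ranges.Add(Tuple.Create(start, end));
        }
        return ranges;
    }

    // Calls extractRange(start, end) for each chunk and flush() after each one,
    // which is the hook SplitToBulks does not expose.
    public static void Run(int pageCount, int chunkSize, Action<int, int> extractRange, Action flush)
    {
        foreach (Tuple<int, int> range in PageRanges(pageCount, chunkSize))
        {
            extractRange(range.Item1, range.Item2);
            flush();
        }
    }
}
```

For example, with a PdfFileEditor instance you might pass `(s, e) => editor.Extract(input, s, e, string.Format("part_{0}-{1}.pdf", s, e))` as extractRange and `MemoryManagement.FlushMemory` as flush. This assumes the Extract(inputFile, startPage, endPage, outputFile) overload, so please check it against the Pdf.Kit version you have. A smaller chunkSize means more Extract calls but a lower peak per call.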

Is there any news on issue PDFKITNET-8129?

Hi,

The issue with the ID PDFKITNET-8129 is resolved and a hotfix can be downloaded from this link.

Regards,

I tried v3.4.1.4 of Pdf.Kit and it had no effect.

Hi Brian,

I'm afraid this excessive memory consumption issue cannot be completely resolved in a short time. Our team will keep looking into it, and you'll be updated once it is resolved.

Regards,

Brian, I'm not an expert like the Aspose team, but I've noticed that image size affects the performance of the SDK.

For example, a typical 8.5 x 11 in page at 200 dpi is about 1700 px wide x 2200 px tall.

Once I added (using Adobe Acrobat Pro) an 11 x 17 in image page (~3500 px x 2200 px) to a test PDF with 12 or 13 letter-size pages, I had memory problems handling that large page in my code.
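The arithmetic behind this observation: a page image that takes under 1 MB as compressed data in the file (Brian's 85 MB, 100-page file averages about 850 KB per page) grows to many megabytes once decoded to raw pixels. The sketch below is a minimal calculation that assumes 24-bit RGB (3 bytes per pixel) and a full decode of the page image. Nothing in this thread confirms that Pdf.Kit actually decodes images during Extract. PageBitmap and DecodedBytes are hypothetical names.

```csharp
using System;

public static class PageBitmap
{
    // Uncompressed size in bytes of a page image at the given resolution.
    // bytesPerPixel: 3 for 24-bit RGB, 4 for 32-bit ARGB, 1 for 8-bit grayscale.
    public static long DecodedBytes(double widthInches, double heightInches, int dpi, int bytesPerPixel)
    {
        long widthPx = (long)Math.Round(widthInches * dpi);
        long heightPx = (long)Math.Round(heightInches * dpi);
        return widthPx * heightPx * bytesPerPixel;
    }
}
```

At 200 dpi and 3 bytes per pixel, a letter page (1700 x 2200 px) comes to 11,220,000 bytes (about 10.7 MB), and an 11 x 17 page (2200 x 3400 px) comes to 22,440,000 bytes (about 21.4 MB), twice as much.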

Is there any update on the memory issues here? This is becoming increasingly problematic.

Hi,

We are working on the issue and will try to improve the page extraction and splitting features. We will let you know in this thread once their performance has been improved.

Thanks,

Hi,

We have a problem with the memory usage of the Extract method of PdfFileEditor. In version 4.3.0 we noticed improvements in the memory usage of the Concatenate method, but no improvement in the Extract method.

Below are some test results from the different versions:


Concatenate: 70 PDF pages of about 15 MB each (1025 MB in total) work in both 4.0.0 and 4.3.0. With more pages, 4.0.0 runs out of memory, while 4.3.0 works up to at least 100 pages (1500 MB in total).

Extract: both 4.0.0 and 4.3.0 run out of memory when splitting more than 70 pages (1025 MB), so 70 pages is the maximum that can be split with these versions.

What is the best way to split large PDF files? Is it possible to improve the memory usage of the Extract method?

Sincerely,

Wouter


Hi Wouter,

I see that you’re having problems with the Extract, Split and Concatenate features of the PdfFileEditor class. Can you please share the problematic PDF files, along with details of which files cause problems with which feature? Our team keeps working to improve memory utilization and performance, but the situation can differ for particular types of files, so we’ll need to test the issue using your particular scenario. We’ll update you with the results as soon as possible.

We’re sorry for the inconvenience.
Regards,



Hi,


It is the Extract method of the PdfFileEditor class that causes the memory problem. I can’t share the PDF files because they contain sensitive information, but their size and page counts are in the post above.

Kind regards,

Wouter

Hi Wouter,

As I mentioned, our team keeps working to improve performance and memory utilization; however, it is very important for us to test the issue using your particular scenario so that we can focus on it. If you can share a sample PDF file, with the confidential information removed, that reproduces the issue at our end, it will help us diagnose and fix the problem.

We’re sorry for the inconvenience and appreciate your cooperation.
Regards,