Aspose.PDF memory leak issues while merging multiple pdf files and adding page number, logo?

PrathapSV · October 17, 2022, 11:23am

Hi,

We are trying to replace our existing QuickPDF library with Aspose.PDF.

We are evaluating Aspose.PDF (22.6.0.0) and it looks promising so far in our testing, but we have just found a major memory leak issue. We have reproduced this issue and attached the sample project herewith: AsposePDF_Sample_MemoryIssue.zip (50.9 KB).
Please review.

Please use the link below to download all the PDF files used for the merge operation. You can also download the .mp4 video in which we recorded the issue.
https://drive.google.com/drive/folders/1_Ltdq66tTjPcaUxiX0PgZ2eK6Gv9y2Sg?usp=sharing

Please review and provide us with a fix, or let us know if we are doing something incorrectly, as this will be a huge issue and could become a showstopper for us in going with Aspose.PDF.

Earlier we were seeing more than 950 MB of memory not being released. We then used the FreeMemory() functions, after which we saw some improvement: the memory leak was reduced from 950 MB to 350 MB. But 350 MB is still a huge memory issue for us, since in the sample we are merging just 500 files, and in production it could go up to 3000 files. So, it will create a huge issue for us.

Note:

The packages folder has been removed to reduce file size.

Steps to reproduce:

1. Please download the “Aspose PDF Memory Issue.mp4” from the link and watch the video to understand the issue.
2. Download and copy all the files and folders of “PDF.zip” from the link and place them in your file system.
3. Run the project and then map the path to this folder in the textbox shown in the form.
4. Then click on the “Generate PDF Book” button.
5. Track the memory jump until the merge completes.

If it helps, here is a breakdown of our workflow to create a PDF book.

1. Create a temporary PDF and then merge all the PDF report files into this temporary PDF file.
2. Add the page number and logo to this temporary PDF file.
3. Copy this temporary PDF file into one more new PDF file.
4. Set the OptimizationOptions to reduce the file size.
5. Dispose of all the objects.

Please review and let us know.

Thanks,
Prathap

asad.ali · October 17, 2022, 6:07pm

@PrathapSV

While testing the scenario with version 22.9 of the API, we noticed that not all memory was released once the program finished the concatenation operation. Therefore, an issue has been logged in our issue tracking system as PDFNET-52765. We will look further into its details and keep you posted on the status of its rectification. Please be patient and spare us some time.

We are sorry for the inconvenience.

PrathapSV · October 18, 2022, 5:33am

Thank you for the update.
We are planning to release this implementation in a couple of weeks. Can you please provide an ETA on this, so that we can plan our release accordingly?
Also, can you please review the code and let us know whether we are following proper practices for disposing of Aspose.PDF objects?

Regards,
Prathap

asad.ali · October 18, 2022, 3:37pm

@PrathapSV

We are afraid that we cannot share any ETA at the moment. The ticket has just been logged under the free support model, where issues are resolved on a first come, first served basis. The resolution time of the issue depends on the number of issues logged prior to it. Moreover, we will surely take a deep look into the code snippet and share our feedback with you in case we find that some improvements can be made. Please spare us some time.

PrathapSV · October 20, 2022, 5:21am

Thank you for the update.

We have a couple of questions.

1. Is there a way to raise the priority of this issue? For example, if we buy the license, could your development team look into this issue ASAP? Also, I see a paid support option; how much would it cost, or does it come with the license?
2. We tried using garbage collector functions, as shown below, to free up memory after creating the book. It’s working so far, and we were able to reduce the memory from 500 MB to 135 MB.
GC.Collect();
GC.WaitForPendingFinalizers();

In the sample, we call these two lines of code to free up resources in three places:

1. Before we set the “CurrentPdfPage” object, which is used later to add the logo and text.
2. After completing all the operations on the document object.
3. At the end of the book creation process.

You can use the same lines of code (GC functions) to clean up resources in the functions below and see for yourself.

1. BookPDFManagerAspose => private void DisposeAsposePDFPageObj(Aspose.Pdf.Page asposePDFPageObj)
2. BookPDFManagerAspose => private void DisposeAsposePDFDocumentObj(Document asposePDFDocumentObj)
3. BookPublisherAsposePDF => public void MergeFilesAndSetPageNumbersAndLogo()

But we are not sure about this approach, as we need to call these GC functions multiple times: once for each page where we draw the text/logo image, and once for each document instance that we create.
(e.g. a 500-page book => 500 times, plus approximately 10 document instances => 10 times, so overall GC.Collect() and GC.WaitForPendingFinalizers() will be called about 510 times for this use case)

So, we really need your team’s insights regarding this approach of calling GC functions. Would it have any adverse effects (impact overall application health, increase processing time, etc.), or is it safe to do so?
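
For illustration only, here is a minimal sketch of one alternative to forcing a collection after every page: collect once per batch of items and once at the end. It uses only System APIs. The name ProcessWithBatchedGc, the batch size of 50, and the stand-in workload are hypothetical; they are not part of the sample project or of Aspose.PDF, and whether batching is acceptable depends on how much memory each page holds between collections.

```csharp
using System;
using System.Diagnostics;

public static class GcBatching
{
    // Runs `work` once per item and forces a full collection only after every
    // `batchSize` items, plus once after the last item if it did not close a batch.
    // Returns the number of forced collections performed.
    public static int ProcessWithBatchedGc(int itemCount, int batchSize, Action<int> work)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int forced = 0;
        for (int i = 0; i < itemCount; i++)
        {
            work(i);

            bool endOfBatch = (i + 1) % batchSize == 0;
            bool lastItem = i == itemCount - 1;
            if (endOfBatch || lastItem)
            {
                GC.Collect();
                GC.WaitForPendingFinalizers();
                forced++;
            }
        }
        return forced;
    }

    public static void Main()
    {
        // Stand-in workload (hypothetical): allocate a buffer per "page".
        Action<int> work = i => { var buffer = new byte[64 * 1024]; buffer[0] = 1; };

        var sw = Stopwatch.StartNew();
        int forced = ProcessWithBatchedGc(510, 50, work);
        sw.Stop();

        Console.WriteLine($"forced collections: {forced}, elapsed: {sw.ElapsedMilliseconds} ms");
    }
}
```

Timing the same loop with a batch size of 1 and with a larger batch size gives a rough idea of what the extra forced collections cost in a given environment.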

Please advise.

Thanks,
Prathap

asad.ali · October 20, 2022, 3:00pm

@PrathapSV

Yes, we do offer paid support option and it can be used to prioritize an issue or ticket that is a blocker at your side. Issues in paid support have the highest priority. However, it does not come with the license subscription. You need to subscribe to paid support separately. Further pricing information can be obtained by creating a post in our Purchase forum.

Regarding your other concerns about memory consumption and the memory leak, we first need to dig further into why this memory leak is occurring and why it is even necessary to use garbage collection. We have recorded every detail along with the ticket and will surely investigate the ticket from this perspective and share our feedback with you once the ticket is resolved. Please spare us some time.

We are sorry for the inconvenience.

PrathapSV · October 21, 2022, 12:31pm

Thanks for the update.
We have observed something about the memory jumps in our testing.

If we re-run the book creation (merging the PDF files, adding the logo and page numbers, etc.), we see the memory increase only on the first run. From the second run onward, we don’t see memory stacking up, even if we run the same process multiple times. Because of this, we suspect that whatever memory is being leaked does not accumulate across runs, so in a way it is acting as cache memory. Let us know if our assumption is correct. If not, could you please explain this behavior?

Also, how much memory leak (size in MB) do you see at your end?

Regards,
Prathap

asad.ali · October 21, 2022, 8:43pm

@PrathapSV

We observed that 230 MB of memory remained occupied at our end. However, your observations are correct. The performance of the API is best measured on subsequent runs and in release mode. During the first run, the API loads necessary resources, such as system fonts, into memory. This is also why the first run takes longer than subsequent runs: on later runs, the API accesses resources that are already loaded in memory.
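
As a rough way to check the first-run versus later-run behavior described above, the sketch below records the managed heap size after each of several runs. It is an assumption-laden illustration: MeasureAfterEachRun and the cached-buffer workload are hypothetical stand-ins, not Aspose.PDF code, and GC.GetTotalMemory reports only the managed heap, so it will not match the process memory shown in Task Manager.

```csharp
using System;
using System.Collections.Generic;

public static class RetainedMemoryProbe
{
    // Runs `work` the given number of times and, after each run, records the
    // managed heap size following a forced full collection.
    public static long[] MeasureAfterEachRun(int runs, Action work)
    {
        var samples = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            work();
            samples[i] = GC.GetTotalMemory(forceFullCollection: true);
        }
        return samples;
    }

    // Hypothetical stand-in for resources loaded once and then reused.
    private static readonly List<byte[]> Cache = new List<byte[]>();

    public static void Main()
    {
        Action work = () =>
        {
            // First run fills the cache; later runs reuse it.
            if (Cache.Count == 0) Cache.Add(new byte[8 * 1024 * 1024]);

            // Per-run scratch data that becomes garbage when the run ends.
            var scratch = new byte[1024 * 1024];
            scratch[0] = 1;
        };

        long baseline = GC.GetTotalMemory(forceFullCollection: true);
        long[] samples = MeasureAfterEachRun(3, work);

        for (int i = 0; i < samples.Length; i++)
        {
            long retainedMb = (samples[i] - baseline) / (1024 * 1024);
            Console.WriteLine($"run {i + 1}: about {retainedMb} MB retained over baseline");
        }
    }
}
```

If the retained figure rises on the first run and then stays flat, that pattern is consistent with one-time resource loading rather than a leak that grows with every run.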

PrathapSV · October 25, 2022, 6:12am

Thank you for the update.

Will be waiting for your team to get back on this issue in detail.

We spoke to our management about the “Paid Support” option to raise the priority of this issue. They are okay with purchasing the Aspose.PDF license along with paid support, but since there is an existing issue, they don’t want to buy it right now. Their concern is that if the issue is not resolved, or takes too long, we would be wasting money, as we can’t release Aspose.PDF into production with a memory leak defect.

Also, we are an existing Aspose customer and have been using Aspose.Cells and Aspose.Slides for the past 2-3 years; you can check this under the name “Investment Metrics” in your customer list. Please do your best to resolve this issue as soon as you can, as it is the only issue holding us back from releasing into production so far.

Thanks,
Prathap

asad.ali · October 25, 2022, 4:14pm

@PrathapSV

Sure, your concerns have been recorded and will surely be honored during ticket investigation. We will surely inform you as soon as we have some updates about ticket resolution.

clydeu · November 29, 2022, 2:17am

We are also experiencing a similar problem when using Aspose.Pdf.Document. In the image I have attached, you can see the difference in the number of objects, the heap size, and the object types. As you can see, a lot of Aspose PDF objects are being left around in memory.

This memory profile was captured after implementing the workaround from the sample project (https://forum.aspose.com/uploads/default/69007) with the following code changes:

Aspose.Pdf.Document.FreeMemory()
GC.Collect()

Thanks,
Clyde

Reporting service memory leak.png (310.1 KB)

PrathapSV · November 29, 2022, 4:13am

Thank you for the update. But it is missing a lot of things, and we are not able to understand the solution either.

We are also experiencing similar problem to this when using Aspose.Pdf.Document. In the image I have attached,

We don’t see any attachments.

This memory profile was captured after implementing the solution suggested in this thread with the following code changes:

We don’t see any links to any thread.

We didn’t understand the solution here. Can you please elaborate a bit more? Are you suggesting that we should use the functions below to resolve this issue? If yes, can you please provide more details (where to use them, how many times to call them, etc.)? Could you please attach the sample project in which you implemented this solution?

Aspose.Pdf.Document.FreeMemory()

GC.Collect()

asad.ali · November 29, 2022, 8:45am

@PrathapSV @clydeu

We request that you both try version 22.11 of the API, as a lot of memory improvements have been made in this version. Please share your feedback after testing the case with this version.

PrathapSV · November 29, 2022, 1:59pm

Thank you for the update.

Unfortunately, we don’t see much improvement after trying the same test case with the latest version, 22.11, as you suggested. Below is the screenshot FYI.
Screenshot (188).png (221.5 KB)

We merged 1600 PDFs and observed that the sample project’s process consumes around 1.2 GB of memory after the merge is done, which is huge for us. As we said earlier, when we run the same merge operation a second time, the memory does not grow by the same amount again (around 1.3 GB after the second merge operation). But we are concerned that memory use grows in proportion to the input size. So, if our clients merge more than 5000 PDFs, we will get into trouble with an out-of-memory exception. We think you need to find a way to release all of this memory at the end of the merge operation/book creation.

Please review and let us know.

Regards,
Prathap

asad.ali · November 29, 2022, 9:06pm

@PrathapSV

Thanks for sharing your feedback. We have recorded your response and updated the ticket information accordingly. We will further investigate this behavior of the API and let you know as soon as we make some improvements. Please spare us some time.

clydeu · November 29, 2022, 11:06pm

I have now attached the image to my original message.

I was referring to the workaround that you implemented in your sample project.

clydeu · November 29, 2022, 11:19pm

@asad.ali our license covers up to version 2022.01. Do you have a temporary license I can use for testing version 22.11?

PrathapSV · November 30, 2022, 6:17am

Here you go. It will expire by 2022-Dec-03.
Aspose.PDF.Product.Family.zip (803 Bytes)

asad.ali · November 30, 2022, 5:10pm

@clydeu

You can obtain a temporary license by visiting the attached link and if you face any issues, you can contact our Sales team in Purchase forum.

clydeu · November 30, 2022, 11:40pm

@PrathapSV the file is private.