High Memory usage of TextFragmentAbsorber when setting license

jkschanke · June 11, 2018, 4:56pm

Hi,

I’m using the TextFragmentAbsorber to replace text in a PDF document and I’m seeing some very odd behavior in relation to memory usage. I have a 13 MB PDF file with 1 page of text (used to verify the replace) and 64 pages of images. The specific file doesn’t really matter, but it was one where this behavior was obvious. I am on the latest Aspose.PDF version (18.6).

When I use the TextFragmentAbsorber without setting a license file it runs quickly and uses less than 100MB of memory. When I set a license file, it runs slower and uses 1GB of memory. The memory steadily rises when calling pdfDocument.Pages.Accept(textFragmentAbsorber).

I uploaded a .zip with a console app that can reproduce this to my Google Drive: https://drive.google.com/open?id=1ACvqGPUS6JWeNo_pq-owSpQwfILyXhgb

Repro steps:

1. Download the zip and open the .sln.
2. Restore the NuGet packages.
3. Run it once without setting a license file. Notice that the replace worked and that memory usage never went above 100MB.
4. Add a license file and uncomment the SetLicense lines.
5. Run it and notice that memory climbed to 1GB.

I have looked through the forums and API documentation to see if there was something I was missing but wasn’t able to find anything.

Thank you for any help you can offer,

-Josh

Farhan.Raza · June 11, 2018, 8:01pm

@jkschanke

Thank you for contacting support.

We have tested with the data you shared and were able to reproduce the behavior you described. However, this does not appear to be a bug in the API. When no license is set, the API applies several evaluation limitations; for example, only 4 elements of any collection can be processed. That means only 4 of the 64 pages are loaded, and the same applies to the other collections involved. When the license is applied, the whole Document Object Model (DOM) of the Aspose.PDF API is loaded into memory, and because your PDF file contains a lot of images, about 1 GB of memory is consumed while working with this document.

We hope this will clarify any ambiguity. Please feel free to contact us if you need any further assistance.

jkschanke · June 11, 2018, 8:19pm

Alright, I guess that makes sense, even if 1GB seems really high for a 13MB PDF.

Would you expect the memory used to be released as soon as the PDF document is disposed? If I manually call pdfDocument.Dispose, the memory still stays around 1GB. Using the Visual Studio profiler I can see that garbage collection has run, but the process is still sitting at 1GB of memory usage and not changing. It seems to me that there’s a memory leak relating to the TextFragmentAbsorber.
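
To narrow down where the 1GB is held, it may help to compare the managed heap size with the process's private bytes after a forced collection. The following is a minimal sketch using only standard .NET calls; MemoryProbe and MemoryReading are hypothetical helper names for illustration and are not part of Aspose.PDF or the repro project.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper type (not part of Aspose.PDF): one snapshot of memory usage.
public sealed class MemoryReading
{
    public long ManagedBytes { get; }   // managed heap size after a full GC
    public long PrivateBytes { get; }   // private memory committed to the process

    public MemoryReading(long managedBytes, long privateBytes)
    {
        ManagedBytes = managedBytes;
        PrivateBytes = privateBytes;
    }

    public override string ToString()
    {
        const double MB = 1024.0 * 1024.0;
        return string.Format("managed: {0:F1} MB, private: {1:F1} MB",
            ManagedBytes / MB, PrivateBytes / MB);
    }
}

// Hypothetical helper (not part of Aspose.PDF): takes a snapshot after forcing a full GC.
public static class MemoryProbe
{
    public static MemoryReading Take()
    {
        // Collect, let finalizers run, then collect again so that objects
        // released by finalizers are reclaimed as well.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long managed = GC.GetTotalMemory(true);

        using (Process current = Process.GetCurrentProcess())
        {
            return new MemoryReading(managed, current.PrivateMemorySize64);
        }
    }
}
```

Calling Console.WriteLine(MemoryProbe.Take()) before loading the document, after pdfDocument.Pages.Accept(textFragmentAbsorber), and again after pdfDocument.Dispose gives three readings to compare. If the managed figure drops after Dispose while the private figure stays near 1GB, the memory is probably being retained by the runtime or by native allocations rather than by live managed objects; if the managed figure also stays high, something is likely still referencing the document's objects. This is only a rough diagnostic, not a confirmation of a leak either way.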

Farhan.Raza · June 12, 2018, 6:26am

@jkschanke

Thank you for elaborating further.

We have logged an investigation ticket with ID PDFNET-44864 in our issue management system. The ticket ID has been linked with this thread and we will notify you as soon as some significant progress is made in this regard.

We are sorry for the inconvenience.