Exponential memory growth

dmauler1 · August 9, 2024, 7:06pm

Hello,

We’re seeing a strange issue when performing a match and replace on text in one particular PDF. We run hundreds of PDFs a day through this code, but earlier this week we hit a PDF that exhausts all heap space, even with the heap set to 58GB and the PDF itself only around 80MB. We’ve manually reviewed the PDF in Adobe Acrobat Pro and nothing stands out as off with this file.

I’ve tried chasing this issue with VisualVM, and from what I can see the memory exhaustion is caused by calls to Page.accept() and TextFragmentAbsorber.visit(). This all happens in a loop that should be freeing resources for each page once it has been scanned.
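For anyone trying to narrow down which page triggers the growth, here is a minimal sketch that logs heap usage after each loop iteration using only the standard Java Runtime API. The class name HeapProbe and the processPage callback are hypothetical stand-ins for the real per-page search and replace; nothing here is Aspose.PDF API. Note that the reported figure includes garbage that has not been collected yet, so it is only a rough indicator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

class HeapProbe {
    /** Heap currently in use, in bytes (total minus free, as reported by the JVM). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Runs processPage for pages 1..pageCount and records the heap in use
     * after each page. processPage is a hypothetical stand-in for the real
     * per-page work (search, replace, cleanup).
     */
    static List<Long> profile(int pageCount, IntConsumer processPage) {
        List<Long> usedAfterEachPage = new ArrayList<>();
        for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
            long before = usedHeapBytes();
            processPage.accept(pageNumber);
            long after = usedHeapBytes();
            usedAfterEachPage.add(after);
            // Values are approximate: uncollected garbage is counted as used.
            System.out.printf("page %d: used heap %d MB (delta %+d MB)%n",
                    pageNumber, after >> 20, (after - before) >> 20);
        }
        return usedAfterEachPage;
    }

    public static void main(String[] args) {
        // Dummy workload that allocates 1 MB per page, for illustration only.
        profile(3, pageNumber -> {
            byte[] scratch = new byte[1 << 20];
            scratch[0] = 1;
        });
    }
}
```

A page whose delta is far larger than its neighbours is a reasonable candidate for closer inspection in a profiler.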

I’m including with this ticket a source file that just includes the called methods. If more is needed please let me know.

High-level code overview

LiveTextHandler.handle() ← entry point
    Page page : pagesList ← for loop
        LiveTextHandler.searchAndReplace()
            LiveTextHandler.getSearchResults(pageNumber, searchString)
                TextFragmentAbsorber
                Page.accept(textFragmentAbsorber)
        HandlerHelpers.closeAndCleanupPage(page) ← helper method to clean up page to free resources
            page.close()
            page.freeMemory()
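The loop shape above can be sketched in a self-contained way as follows. This is only an illustration of running the cleanup step in a finally block so it executes even when the search throws; PageResource, replaceText, and release are hypothetical stand-ins for the page, the search and replace call, and closeAndCleanupPage, and are not Aspose.PDF types or the actual code in the attached sample.

```java
import java.util.Arrays;
import java.util.List;

class PageLoopSketch {
    /** Hypothetical stand-in for a PDF page; not an Aspose.PDF type. */
    interface PageResource {
        /** Replaces matches on this page and returns how many were replaced. */
        int replaceText(String search, String replacement);

        /** Releases whatever the page holds (stand-in for closeAndCleanupPage). */
        void release();
    }

    /** Processes every page, releasing each one even if its search fails. */
    static int replaceOnAllPages(List<? extends PageResource> pages,
                                 String search, String replacement) {
        int total = 0;
        for (PageResource page : pages) {
            try {
                total += page.replaceText(search, replacement);
            } finally {
                // Runs on success and on exception, so no page is left held.
                page.release();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Dummy page that always reports one replacement.
        PageResource dummy = new PageResource() {
            public int replaceText(String search, String replacement) { return 1; }
            public void release() { System.out.println("released"); }
        };
        System.out.println(replaceOnAllPages(Arrays.asList(dummy, dummy), "old", "new"));
    }
}
```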

Library versions
aspose.barcode: 24.6
aspose.pdf: 23.1

aspose-visualvm.png (277.4 KB)

sample.java.zip (2.4 KB)

Thank you for your time.

dmauler1 · August 9, 2024, 7:30pm

I’m trying to upload the pdf but it’s too large, is there another way to upload it?

asad.ali · August 9, 2024, 9:58pm

@dmauler1

Instead of a Java file, please share a minimal code sample along with details of the text that you are trying to find and replace using Aspose.PDF. You can also upload your file to Dropbox or Google Drive and share the link with us.

PS: Please also try version 24.7 of the API before sharing the requested information.

dmauler1 · August 12, 2024, 9:44pm

Thank you for taking the time to respond. It looks like the issue has something to do with a graphic embedded at the top of each page. Each graphic object was made up of thousands of vector-based pattern objects. Converting it to a raster image solves the problem.

asad.ali · August 13, 2024, 3:21am

@dmauler1

It is nice to know that you were able to sort your issue out. Please feel free to create a new topic in case you need any kind of assistance.