Out of memory with TextFragmentAbsorber (.NET)

schesneau · September 20, 2017, 10:37am

Hi,

I have a PDF with 4500 pages and a collection of around 100 words. I need to search for each word (without regex) on each PDF page and, if it is found, create a link on the word (using a LinkAnnotation and the TextFragment rectangle).

I have tried several approaches. The latest one is:

A) Open the PDF.
B) Loop over the 4500 pages.
C) For each page, loop over the 100 words.
D) For each word, call Accept with a TextFragmentAbsorber on the current page.
E) If the absorber returns TextFragments, create a LinkAnnotation.
F) Close the PDF after the page loop.
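The loop in steps A-F can be sketched as follows. This is a minimal, library-neutral sketch, not Aspose.PDF code: `find_fragments` and `add_link` are hypothetical callbacks standing in for the TextFragmentAbsorber search and the LinkAnnotation creation, and the rectangle tuple layout is an assumption. It only shows the control flow, which for 4500 pages and about 100 words means roughly 450,000 separate absorber passes.

```python
from typing import Callable, List, Sequence, Tuple

# A rectangle as (llx, lly, urx, ury); this representation is an assumption.
Rect = Tuple[float, float, float, float]


def link_words(
    page_count: int,
    words: Sequence[str],
    find_fragments: Callable[[int, str], List[Rect]],  # hypothetical: absorber search
    add_link: Callable[[int, Rect, str], None],        # hypothetical: LinkAnnotation
) -> int:
    """Steps B-E: search every word on every page and link each hit.

    Returns the number of links created.
    """
    created = 0
    for page_no in range(1, page_count + 1):            # B) pages are 1-based, as in Aspose.PDF
        for word in words:                               # C) every word on this page
            for rect in find_fragments(page_no, word):   # D) one search per (page, word)
                add_link(page_no, rect, word)            # E) link each found fragment
                created += 1
    return created


if __name__ == "__main__":
    # Fake two-page "document": page number -> {word: [rectangles]}.
    hits = {1: {"alpha": [(10, 10, 50, 20)]},
            2: {"alpha": [(0, 0, 5, 5)], "beta": [(1, 1, 2, 2)]}}
    links = []
    n = link_words(
        2,
        ["alpha", "beta"],
        lambda p, w: hits.get(p, {}).get(w, []),
        lambda p, r, w: links.append((p, w)),
    )
    print(n, links)  # 3 [(1, 'alpha'), (2, 'alpha'), (2, 'beta')]
```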

The only way I have found to make this work is to close my PDF every 1000 pages. That releases the memory, but it is very slow.
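The workaround of closing the PDF every 1000 pages amounts to splitting the page range into fixed-size chunks and reopening the document for each one. A minimal sketch, assuming hypothetical `open_doc`, `process_range`, and `close_doc` callbacks (in .NET, "close" would presumably mean saving and disposing the Document); the chunk size of 1000 is simply the value from the post, not a recommendation.

```python
from typing import Callable, Iterator, Tuple, TypeVar

Doc = TypeVar("Doc")


def chunk_ranges(page_count: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, 1-based (first, last) page ranges of at most chunk_size pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for first in range(1, page_count + 1, chunk_size):
        yield first, min(first + chunk_size - 1, page_count)


def process_in_chunks(
    page_count: int,
    chunk_size: int,
    open_doc: Callable[[], Doc],                     # hypothetical: open (or reopen) the PDF
    process_range: Callable[[Doc, int, int], None],  # hypothetical: steps B-E on a page range
    close_doc: Callable[[Doc], None],                # hypothetical: save and dispose the PDF
) -> None:
    for first, last in chunk_ranges(page_count, chunk_size):
        doc = open_doc()
        try:
            process_range(doc, first, last)
        finally:
            # Close after every chunk, even on error; per the post, this is what frees memory.
            close_doc(doc)
```

For this document, `list(chunk_ranges(4500, 1000))` gives five ranges, from `(1, 1000)` to `(4001, 4500)`, so the file is opened and closed five times.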

It seems there is no way to free memory after applying a TextFragmentAbsorber to the current page or to the whole file, and after a few iterations my application crashes with an out-of-memory error.

Is there a way to release memory while processing?

schesneau · September 20, 2017, 1:19pm

I tried another approach and got the same “System.OutOfMemoryException”:

A) Open the PDF.
B) Loop over the 100 words.
C) For each word, call Accept with a TextFragmentAbsorber on all PDF pages. This runs out of memory on the very first word.

asad.ali · September 20, 2017, 5:53pm

@schesneau

Thanks for contacting support.

Would you please share your sample PDF document, along with the code snippet you are executing at your end? We will test the scenario in our environment and address it accordingly. Please note that our forum supports uploads of up to 3.0 MB. If your document is larger, you may upload it to a public file-sharing service (e.g. Dropbox, Google Drive) and share the link here.