Question regarding unreleased memory even after saving and closing the document when using the Accept(TextFragmentAbsorber visitor) function

dunghnguyen · November 21, 2025, 4:44am

Hi,
I’m experiencing a memory issue with the latest version of Aspose.PDF (v25.11) in C#.
Background:
When I call Aspose.PDF’s Accept(TextFragmentAbsorber visitor) method, memory is not released and usage keeps increasing. The problem is even more pronounced with Hebrew-language documents.
Below is my sample code:

    int numberProcessing = 5;
    // Verbatim string: a single backslash is enough after the @ prefix
    string inputFileName = @"D:\PDF32000_2008.pdf";
  
    for(int i = 0; i < numberProcessing; i++)
    {
        string outputFileName = $"D:\\PDF32000_2008_sanitized_{i}.pdf";
        using(Stream fileStream = new FileStream(inputFileName, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite))
        {
            Document pdfDocument = new Document(fileStream);
            PageCollection pages = pdfDocument.Pages;
            var absorber = new TextFragmentAbsorber(
@"(((file://|file:///)(?:(?:[\w\-_\/\|\w\-\.,@?^=%&:\/~\+#\.\…]+)+)+)|((http://|ftp://|https://|www.|HTTP://|HTTPS://|FTP://|WWW.)([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:\/~\+#\.\(\)]*[\w\-\@?^=%&\/~\+#\.\…\)])?)|([a-zA-Z0-9._-]+@[a-zA-Z0-9-]*(?:\.[a-zA-Z0-9-]+)+))")
            {
                // Enable regular expression search
                TextSearchOptions = new TextSearchOptions(true)
            };
            foreach(Page page in pages)
            {
                // Accept the absorber for the page
                page.Accept(absorber);
                foreach(TextFragment textFragment in absorber.TextFragments)
                {
                    // To do
                }
            }
            pdfDocument.Save(outputFileName);
        }
        int debug = 1;
        // Stop here and observe memory consumption. It remains unreleased and keeps rising as you proceed with other files.
    }

Here is my sample file for this test:
PDF32000_2008.zip (8.3 MB)

I have tried various ways to reduce memory usage, such as calling Page.FreeMemory(), Page.Dispose(), and GC.Collect(), but none of them made a difference.
Please help me check this issue and advise me on how to avoid it for now.
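
For anyone reproducing this, here is a minimal sketch of how the growth could be measured at the breakpoint instead of being read off the debugger. `MemoryProbe` and its member names are hypothetical and not part of Aspose.PDF; it only uses standard .NET calls (`GC.Collect`, `GC.GetTotalMemory`, `Process.PrivateMemorySize64`). It assumes that comparing readings taken after a forced full collection is a fair approximation of memory that is still being held.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not an Aspose.PDF API): takes memory readings
// after a forced full garbage collection so that only memory that is
// still reachable (or unmanaged) shows up in the numbers.
public static class MemoryProbe
{
    // Returns the managed heap size and the process private bytes.
    public static (long ManagedBytes, long PrivateBytes) Take()
    {
        // Collect, let finalizers run, then collect what they released.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long managed = GC.GetTotalMemory(true);

        long privateBytes;
        using (Process current = Process.GetCurrentProcess())
        {
            // Private bytes also cover unmanaged allocations,
            // which the managed heap figure does not include.
            privateBytes = current.PrivateMemorySize64;
        }

        return (managed, privateBytes);
    }

    // Formats the difference between two readings in KiB.
    public static string Describe(
        (long ManagedBytes, long PrivateBytes) before,
        (long ManagedBytes, long PrivateBytes) after)
    {
        long managedKiB = (after.ManagedBytes - before.ManagedBytes) / 1024;
        long privateKiB = (after.PrivateBytes - before.PrivateBytes) / 1024;
        return $"managed {managedKiB:+#;-#;0} KiB, private {privateKiB:+#;-#;0} KiB";
    }
}
```

A baseline reading would be taken with `MemoryProbe.Take()` before the loop and another at the `int debug = 1;` line of each iteration, with `MemoryProbe.Describe(baseline, current)` written to the console. If the managed figure stays flat while private bytes keep rising, that would point to unmanaged or cached memory rather than live managed objects.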

ilyazhuykov · November 21, 2025, 6:59am

@dunghnguyen
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-61302

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Thank you for the information you provided. I investigated the issue and it does indeed seem to be present. I have added a task for the development team to investigate and fix it.
One small suggestion to improve your code: when using Document, it’s recommended to wrap it in a using statement, as follows:

    using (var pdfDocument = new Document(fileStream))

Unfortunately, it doesn’t help much in your case.
You’ll be notified in case of any update regarding this issue.

dunghnguyen · November 21, 2025, 9:20am

Thank you for acknowledging the issue.
We’ll wait for a fix, as it seems there is currently no workaround available.