Excessive memory use or OutOfMemoryException when extracting text from large PDFs

While extracting text from large PDF files (around 18,000 pages and 22 MB), over 3 GB of memory is used or an OutOfMemoryException is thrown.

My code looks like this:

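// Requires: using System.Collections.Generic; using System.Linq; using Aspose.Pdf.Text;
// "pdf" is an already opened Aspose.Pdf.Document.
var options = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.MemorySaving);
// Match {{tag}} placeholders; "true" turns on regular-expression search.
var fragmentAbsorber = new TextFragmentAbsorber(@"\{\{(.*?)\}\}", new TextSearchOptions(true));
fragmentAbsorber.ExtractionOptions = options;
var foundTags = new List<string>();
foreach (var page in pdf.Pages)
{
    page.Accept(fragmentAbsorber);
    var matches = fragmentAbsorber.TextFragments;
    // Record the first tag on the page, skipping duplicates.
    var tag = matches.FirstOrDefault()?.Text;
    if (tag != null && !foundTags.Contains(tag))
        foundTags.Add(tag);
    // Release per-page state on every iteration, including pages without a match
    // (an early "continue" here would skip the cleanup for those pages).
    fragmentAbsorber.Reset();
    page.FreeMemory();
    matches.Clear();
}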

@kwright

Thank you for contacting support.

Would you please share the PDF document via Google Drive, Dropbox, etc. so that we can try to reproduce and investigate the issue in our environment? Before sharing the requested data, please make sure that you are using Aspose.PDF for .NET 19.8.

Good morning, thank you for your quick response on this issue.
The actual PDF that triggers the issue contains sensitive data, so I have created a similar sample PDF with which the problem still occurs. I have also verified that I am using Aspose.PDF for .NET 19.8. The link to this file is:

https://diversifiedco-my.sharepoint.com/:b:/g/personal/kwright_divcodata_com/EVux_CSIRsFPlDWO59FG9a0BDjv2N8Ue0LX-HuGQjeBIKA?e=sC2bIs

I look forward to your response.

@kwright

Thank you for sharing the requested data.

We tested with the data you shared and were able to reproduce the issue in our environment. A ticket with ID PDFNET-46901 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked to this thread, so you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.