Hi,
I’m experiencing a memory issue with the latest version of Aspose.PDF (v25.11) in C#.
Background:
When I call Aspose.PDF’s Accept(TextFragmentAbsorber visitor) method, memory is not released and usage keeps increasing with every file processed. The problem is even more pronounced with Hebrew-language documents.
Below is my sample code:
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

int numberProcessing = 5;
string inputFileName = @"D:\PDF32000_2008.pdf";
for (int i = 0; i < numberProcessing; i++)
{
    string outputFileName = $"D:\\PDF32000_2008_sanitized_{i}.pdf";
    using (Stream fileStream = new FileStream(inputFileName, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite))
    {
        Document pdfDocument = new Document(fileStream);
        PageCollection pages = pdfDocument.Pages;
        var absorber = new TextFragmentAbsorber(
            @"(((file://|file:///)(?:(?:[\w\-_\/\|\w\-\.,@?^=%&:\/~\+#\.\…]+)+)+)|((http://|ftp://|https://|www.|HTTP://|HTTPS://|FTP://|WWW.)([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:\/~\+#\.\(\)]*[\w\-\@?^=%&\/~\+#\.\…\)])?)|([a-zA-Z0-9._-]+@[a-zA-Z0-9-]*(?:\.[a-zA-Z0-9-]+)+))")
        {
            // Enable regular expression search
            TextSearchOptions = new TextSearchOptions(true)
        };
        foreach (Page page in pages)
        {
            // Accept the absorber for the page
            page.Accept(absorber);
            foreach (TextFragment textFragment in absorber.TextFragments)
            {
                // To do
            }
        }
        pdfDocument.Save(outputFileName);
    }
    // Breakpoint anchor: stop here and observe memory consumption.
    // It remains unreleased and keeps rising as you proceed with other files.
    int debug = 1;
}
Here is my sample file for this test:
PDF32000_2008.zip (8.3 MB)
I have tried various ways to reduce memory usage, such as calling Page.FreeMemory(), Page.Dispose(), and GC.Collect(), but none of them made a difference.
Please help me look into this issue and advise me on a way to work around it for now.
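To make the growth measurable rather than reading it off the debugger, a helper along these lines could be called at the breakpoint after each iteration. This is only a sketch: `MemoryProbe` and `ManagedHeapMb` are hypothetical names, not part of Aspose.PDF, and the helper reports the managed heap only, so any native allocations would not show up in the number.

```csharp
using System;

static class MemoryProbe
{
    // Hypothetical helper (not an Aspose.PDF API).
    // Forces a full garbage collection, waits for finalizers to run,
    // and returns the size of the managed heap in megabytes.
    public static double ManagedHeapMb()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        long bytes = GC.GetTotalMemory(forceFullCollection: true);
        return bytes / (1024.0 * 1024.0);
    }
}
```

Calling `Console.WriteLine($"Iteration {i}: {MemoryProbe.ManagedHeapMb():F1} MB");` at the end of each loop pass would show whether the managed heap still grows after a forced collection, which helps separate a real leak from memory that simply has not been collected yet.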