Hi,
We are trying to read text from a large PDF and are observing high memory consumption, which eventually causes the system to run out of memory.
We use the code below to extract the text page by page:
using System.Collections.Generic;
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

public void PerformanceTest()
{
    // Load the license from a stream and dispose the stream afterwards.
    using (var fileLicense = File.OpenRead("D:\\AsposeTotalNET.lic"))
    {
        License license = new License();
        license.SetLicense(fileLicense);
    }

    // Holds the extracted text of every page for the whole run.
    List<string> pageTextList = new List<string>();

    using (var file = File.OpenRead("D:\\large 30k.pdf"))
    using (Document doc = new Document(file))
    {
        foreach (Page page in doc.Pages)
        {
            // A fresh absorber per page, extracting in Raw formatting mode.
            TextAbsorber textAbsorber = new TextAbsorber();
            textAbsorber.ExtractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw);
            page.Accept(textAbsorber);
            pageTextList.Add(textAbsorber.Text);
        }
    }
}
Aspose memory issue.png (26.4 KB)
The memory profiler shows about 4.6 GB of memory consumed while reading the text from all the pages, and the figure varies from run to run.
Please find the sample document link
We have tried versions 23.12.0 and the latest, 24.6.0, and observed the same behavior in both.