Out of memory when calling Accept with TextFragmentAbsorber

Hi

I am evaluating Aspose.PDF. I installed the latest version from NuGet (20.8.0) and activated a temporary license.
I have to migrate a project that currently uses an old version of 3Heights extract (from 2011-2012). Basically, I have to read all PDF text objects, even in documents with many pages (10000+), and extract whole text objects with all their properties (position, font, text and so on).

string pathExe = Path.GetDirectoryName(System.Reflection.Assembly.GetEntryAssembly().Location);
// Path.Combine avoids the invalid "\A" escape sequence in the interpolated string
FileStream streamLicense = new FileStream(Path.Combine(pathExe, "Aspose.Pdf.lic"), FileMode.Open);
License license = new Aspose.Pdf.License();
license.SetLicense(streamLicense);

Aspose.Pdf.Document document = new Aspose.Pdf.Document(filePdf);
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
document.Pages.Accept(textFragmentAbsorber);

I get an “Out of memory” error on the Accept call.

I’ve also tried processing one page at a time, with the same result:

Aspose.Pdf.Document document = new Aspose.Pdf.Document(filePdf);
for (int pagina = 1; pagina <= document.Pages.Count; pagina++)
{
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
    document.Pages[pagina].Accept(textFragmentAbsorber);
}

Any tips?

@g.beda

Please make sure to set the Debug/Release configuration to x64 when processing larger files. If the issue still persists, please share your sample PDF with us so that we can test the scenario in our environment and address it accordingly.
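A quick way to confirm that the x64 setting actually took effect is a startup check along these lines. This is only a minimal sketch using the standard .NET `Environment.Is64BitProcess` and `IntPtr.Size` members; the class name `BitnessCheck` is hypothetical and is not part of Aspose.PDF.

```csharp
using System;

// Hypothetical helper (not an Aspose.PDF API): reports whether the
// current process is running as 32-bit or 64-bit.
public static class BitnessCheck
{
    // Returns a short description of the process bitness.
    public static string Describe()
    {
        return Environment.Is64BitProcess ? "64-bit process" : "32-bit process";
    }

    // Pointer size in bytes: 8 in a 64-bit process, 4 in a 32-bit one.
    public static int PointerSize()
    {
        return IntPtr.Size;
    }
}
```

Calling `Console.WriteLine(BitnessCheck.Describe());` before opening the document shows which mode the application is really running in. A 32-bit process has a much smaller address space, so very large PDFs are more likely to run out of memory there.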

Please take a look using the file I’ve sent you by P.M. For the same PDF (about 10k pages), 3Heights extract took about a minute and a half to set every page in a loop and read all text objects (with no memory leaks).

In case it is related, I reported a very similar problem last year, which has been confirmed, but there is still no progress:

@g.beda

We have checked your message and there was no link to the file. Would you please send it again?

@ast3

We regret that the issue is not resolved yet. Please note that performance-related issues are complex in nature and need a significant amount of time to resolve. We will inform you as soon as we have news about the ETA of the fix. We highly appreciate your understanding in this matter.

We are sorry for the inconvenience.

@asad.ali

I sent you a new P.M.

@g.beda

We were able to reproduce the issue in our environment using the sample application you provided. We have therefore registered it as PDFNET-48726 in our issue management system. We will look further into its details and keep you informed about the status of its resolution. Please be patient and allow us some time.

We are sorry for your inconvenience.