I am evaluating Aspose.PDF. I installed the latest version from NuGet (20.8.0) and activated the temporary license.
I have to migrate a project that currently uses an old version of 3Heights extract (from 2011-2012). Basically, I have to read all PDF text objects, even in documents with many pages (10000+), and extract each text object with all its properties (position, font, text and so on).
string pathExe = Path.GetDirectoryName(System.Reflection.Assembly.GetEntryAssembly().Location);
FileStream streamLicense = new FileStream(Path.Combine(pathExe, "Aspose.Pdf.lic"), FileMode.Open);
License license = new Aspose.Pdf.License();
license.SetLicense(streamLicense);
Aspose.Pdf.Document document = new Aspose.Pdf.Document(filePdf);
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
document.Pages.Accept(textFragmentAbsorber);
I got an “Out of memory” error on the Accept call.
I also tried processing one page at a time, with the same result:
Aspose.Pdf.Document document = new Aspose.Pdf.Document(filePdf);
for (int pagina = 1; pagina <= document.Pages.Count; pagina++)
{
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
    document.Pages[pagina].Accept(textFragmentAbsorber);
}
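For reference, here is a minimal sketch of the per-page loop that also reads the properties mentioned above (position, font, text) from each fragment. It assumes the PDF path is passed as the first command-line argument (a hypothetical setup, not part of the original project), and it assumes that disposing the Document and calling Page.FreeMemory() after each page helps release cached data. It is not confirmed to avoid the out-of-memory error described in this thread.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ExtractTextFragments
{
    static void Main(string[] args)
    {
        // Assumption: the PDF path is given on the command line (hypothetical).
        string filePdf = args[0];

        // Dispose the document when done so its resources are released.
        using (Document document = new Document(filePdf))
        {
            // Aspose.PDF page indexes start at 1.
            for (int pagina = 1; pagina <= document.Pages.Count; pagina++)
            {
                Page page = document.Pages[pagina];

                // Use a fresh absorber per page so fragments do not accumulate.
                TextFragmentAbsorber absorber = new TextFragmentAbsorber();
                page.Accept(absorber);

                foreach (TextFragment fragment in absorber.TextFragments)
                {
                    // Page number, position, font name, font size and text.
                    Console.WriteLine(
                        $"{pagina}\t{fragment.Position.XIndent:F2}\t{fragment.Position.YIndent:F2}\t" +
                        $"{fragment.TextState.Font.FontName}\t{fragment.TextState.FontSize}\t{fragment.Text}");
                }

                // Assumption: this drops data cached for the page; whether it
                // is enough for a 10000+ page file is untested here.
                page.FreeMemory();
            }
        }
    }
}
```

Writing each fragment out immediately, instead of keeping all fragments in a list, keeps the application's own memory use independent of the page count.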
Please make sure to set the Debug/Release configuration to x64 when processing larger files. If the issue still persists, please share your sample PDF with us so that we can test the scenario in our environment and address it accordingly.
Please take a look using the file I sent you by private message. For the same PDF (about 10k pages), 3Heights extract took about a minute and a half to set every page in a loop and read all text objects (with no memory leaks).
We regret that the issue is not resolved yet. Please note that performance-related issues are complex in nature and need a significant amount of time to resolve. We will inform you as soon as we have news about an ETA for the fix. We highly appreciate your understanding in this matter.
We were able to reproduce the issue in our environment using the sample application you provided. Therefore, we have registered it as PDFNET-48726 in our issue management system. We will look further into the details and keep you informed of the status of the fix. Please be patient and allow us some time.