Hi,
Extracting text from PDF files uses too much memory. My code looks like this:
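// Requires a reference to the Aspose.Pdf assembly.
Aspose.Pdf.Text.TextAbsorber textAbsorber = new Aspose.Pdf.Text.TextAbsorber();
string text;
using (Aspose.Pdf.Document doc = new Aspose.Pdf.Document(fileName))
{
    // Visit every page and collect its text.
    doc.Pages.Accept(textAbsorber);
    text = textAbsorber.Text;
}
// The using block disposes doc, so no explicit doc.Dispose() call is needed.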
I’ll extract the text of each page separately, so I will call doc.Pages[x].Accept(textAbsorber) for every page, and because TextAbsorber is not disposable I’ll need to call GC.Collect() every time.
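The per-page approach described above could be sketched roughly as follows. This is only an illustration, not a confirmed fix for the memory usage: the class and method names (PageTextExtractor, ExtractPerPage) are hypothetical, and it assumes that creating a new TextAbsorber for each page keeps each result limited to that page's text.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper for illustration; not part of Aspose.Pdf.
public static class PageTextExtractor
{
    public static List<string> ExtractPerPage(string fileName)
    {
        var pageTexts = new List<string>();

        // The using block disposes the document when extraction is done.
        using (var doc = new Document(fileName))
        {
            // Aspose.Pdf page indices start at 1, not 0.
            for (int i = 1; i <= doc.Pages.Count; i++)
            {
                // Assumption: a fresh absorber per page, so the text of
                // earlier pages is not carried into later results.
                var absorber = new TextAbsorber();
                doc.Pages[i].Accept(absorber);
                pageTexts.Add(absorber.Text);
            }
        }

        return pageTexts;
    }
}
```

Each absorber goes out of scope at the end of its loop iteration, so it becomes eligible for garbage collection without any per-object cleanup call. Whether that is enough to avoid calling GC.Collect() for large files is exactly the open question here.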
Do you have any solution/suggestion for my case?
Some files throw an OutOfMemoryException. You may download one example file from the following link: http://www.filedropper.com/mukayeseraporu
Hi,
Is there any good news about these issues? We will process many big documents in parallel. Memory usage and performance are very critical for us.
We need to decide which tool(s) we will use in the project within a few days. I’ve tried some other PDF tools that gave much better memory and performance figures, but Aspose has many advantages, such as processing many types of documents, good support, etc.
I’m very close to finishing my project (I have written a lot of code using your tool), so please help me overcome these issues. Could you give me an approximate time frame for resolving them? Maybe my code is wrong, or there is a better way of extracting text.
Thank you,
Best regards.
Hi Huseyin,
Thanks for your feedback. I am afraid your reported issue is still pending investigation, as we only recently noticed it. However, we have passed your concerns to our development team and raised the issue priority as well. We will notify you as soon as we make significant progress towards resolving it.
We are sorry for the inconvenience caused.
Best Regards,
The issues you have found earlier (filed as PDFNEWNET-38240) have been fixed in Aspose.Pdf for .NET 11.6.0.