System.OutOfMemoryException while extracting text from pdf

huseyincandan · February 17, 2015, 1:22am

Hi,

Extracting text from PDF files uses too much memory. My code looks like this:

Aspose.Pdf.Text.TextAbsorber textAbsorber = new Aspose.Pdf.Text.TextAbsorber();
string text;
// The using block disposes the document, so no separate doc.Dispose() call is needed.
using (Aspose.Pdf.Document doc = new Aspose.Pdf.Document(fileName))
{
    doc.Pages.Accept(textAbsorber);
    text = textAbsorber.Text;
}

I’ll extract the text of each page separately, so I will call doc.Pages[x].Accept(textAbsorber) for every page, and because TextAbsorber is not disposable I’ll need to call GC.Collect() every time.
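For illustration, the per-page approach described above might look like the following minimal sketch. The class and method names (PageTextExtractor, ExtractPages) are hypothetical and not part of Aspose.Pdf; only Document, Pages, Accept, and TextAbsorber.Text come from the code above. It creates a fresh TextAbsorber for each page so that no absorber outlives the page it processed. This is an assumption about what may help, not a confirmed fix, and it does not by itself guarantee lower memory use.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper for illustration only; not part of the Aspose.Pdf API.
static class PageTextExtractor
{
    // Lazily yields the text of each page, one page at a time.
    public static IEnumerable<string> ExtractPages(string fileName)
    {
        // The document is disposed when enumeration finishes
        // or when the enumerator itself is disposed.
        using (Document doc = new Document(fileName))
        {
            // Page indexes in Aspose.Pdf start at 1, not 0.
            for (int i = 1; i <= doc.Pages.Count; i++)
            {
                // A new absorber per page, so each one becomes
                // eligible for garbage collection after its page is read.
                TextAbsorber absorber = new TextAbsorber();
                doc.Pages[i].Accept(absorber);
                yield return absorber.Text;
            }
        }
    }
}
```

A caller could then consume the pages with foreach (string pageText in PageTextExtractor.ExtractPages(fileName)) { ... } and handle each page's text before the next page is read.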

Do you have any solution/suggestion for my case?

Some files throw an OutOfMemoryException. You can download an example file from the following link: http://www.filedropper.com/mukayeseraporu

tilal.ahmad · February 17, 2015, 9:58pm

Hi Huseyin,

huseyincandan:

I'll extract the text of each page separately, so I will call doc.Pages[x].Accept(textAbsorber) for every page, and because TextAbsorber is not disposable I'll need to call GC.Collect() every time.

Do you have any solution/suggestion for my case?

Thanks for your inquiry. We have already noticed the resource-usage issue with TextAbsorber and logged ticket PDFNEWNET-35329 in our issue tracking system to fix it. We have linked your report to that ticket and will update you as soon as it is resolved.

huseyincandan:

Some files throw an OutOfMemoryException. You can download an example file from the following link: http://www.filedropper.com/mukayeseraporu

Thanks for sharing a sample document. We tested the scenario and reproduced the OutOfMemoryException, so we have logged ticket PDFNEWNET-38240 in our issue tracking system for further investigation and resolution. We will keep you updated on the progress.

We are sorry for the inconvenience caused.

Best Regards,

huseyincandan · February 26, 2015, 10:41am

Hi,

Is there any good news about these issues? We will process many large documents in parallel, so memory usage and performance are critical for us.

We need to decide which tool(s) we will use in the project within a few days. I’ve tried some PDF tools that gave much better results (by a huge margin), but Aspose has many advantages, such as support for many document types, good support, etc.

I’m very close to finishing my project (I have written a lot of code using your tool), so please help me overcome these issues. Could you give me an approximate time frame for resolving them? Maybe my code is wrong, or there is a better way of extracting text.

Thank you,

Best regards.

tilal.ahmad · February 26, 2015, 12:00pm

Hi Huseyin,

Thanks for your feedback. I am afraid the issue you reported is still pending investigation, as we noticed it only recently. However, we have passed your concerns to our development team and raised the issue priority as well. We will notify you as soon as we make significant progress towards a resolution.

We are sorry for the inconvenience caused.

Best Regards,

aspose.notifier · May 7, 2016, 3:19pm

The issues you have found earlier (filed as PDFNEWNET-38240) have been fixed in Aspose.Pdf for .NET 11.6.0.

This message was posted using Notification2Forum from Downloads module by Aspose Notifier.