"System.OutOfMemoryException" when extracting text from Large PDF

Dear Team,


I’m evaluating your component against my requirements.

When extracting the content of a PDF file that has more than 1800 pages (PDF size: 40 MB), the system throws "System.OutOfMemoryException".

Below is the code snippet I used to extract text from the PDF file.

Document pdfDocument = new Document("input.pdf");

TextAbsorber textAbsorber = new TextAbsorber();

pdfDocument.Pages.Accept(textAbsorber);

string extractedText = textAbsorber.Text;


Please advise if anything is wrong here.

Update on 2nd July 2013:

The same "System.OutOfMemoryException" error occurs when trying to extract text from a PDF file with 366 pages (PDF size: 5 MB) using multiple threads (at least 3). I have attached the file for your reference.

for (int i = 0; i < 3; i++)
{
    new Thread(new ParameterizedThreadStart(ReadPDFContentUsingAspose)).Start(filepath);
}

Please let me know if there are any restrictions on the PDF size, the number of pages it can extract, and the number of files it can process at a time.

Dear Aspose Team,


Your response to the above query will be greatly appreciated.

Hi Senthil,

I’m very sorry for the delay in replying. Usually all queries are answered within 24 hours; however, this one slipped through somehow. A support team member will get in touch with you shortly with an appropriate answer to your query.

We’re very sorry for the inconvenience.
Regards,

Hi Senthil,


Thanks for using our products and sorry for the delayed response.

I have tested the scenario using the following code snippet with Aspose.Pdf for .NET 8.2.0 in a Visual Studio 2010 application running on Windows 7 (x64), and I was unable to reproduce any problem. For your reference, I have also attached the resultant PDF generated at my end. Can you please try the latest release version and, in case the problem still persists, share some details regarding your working environment? We are really sorry for the delay and inconvenience.

Hi Senthil,

Adding more to my previous comments, please share the source 40MB file so that we can test the scenario at our end.

The same “System.OutOfMemoryException” error occurs when trying to extract text from a PDF file with 366 pages (PDF size: 5 MB) using multiple threads (at least 3). I have attached the file for your reference.
for (int i = 0; i < 3; i++)
{
    new Thread(new ParameterizedThreadStart(ReadPDFContentUsingAspose)).Start(filepath);
}

Aspose.Pdf for .NET is supported in multi-threaded environments, but currently you cannot access a single PDF file from multiple threads.
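
To illustrate the point above, here is a minimal sketch (not an official Aspose sample) that starts one thread per distinct file, so that no PDF is opened by two threads at once. The class PerFileExtraction, its Run method, and the extractText delegate are hypothetical names introduced only for this illustration. The Aspose calls shown in the trailing comment are the ones from the snippet posted earlier in this thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class PerFileExtraction
{
    // Runs extractText once per distinct path, each on its own thread, and
    // returns the extracted text keyed by path. No path is given to two threads.
    public static IDictionary<string, string> Run(
        IEnumerable<string> filePaths, Func<string, string> extractText)
    {
        var results = new ConcurrentDictionary<string, string>();
        var threads = new List<Thread>();

        // The HashSet drops duplicate paths so a file is never processed twice.
        foreach (string path in new HashSet<string>(filePaths))
        {
            string localPath = path; // separate copy captured by the closure
            var thread = new Thread(() => results[localPath] = extractText(localPath));
            threads.Add(thread);
            thread.Start();
        }

        // Wait for every extraction to finish before returning.
        foreach (Thread thread in threads)
        {
            thread.Join();
        }

        return results;
    }
}

// Possible usage with the extraction code from the original post
// (requires a reference to the Aspose.Pdf assembly):
//
// var texts = PerFileExtraction.Run(
//     new[] { "first.pdf", "second.pdf", "third.pdf" },
//     path =>
//     {
//         Document pdfDocument = new Document(path);
//         TextAbsorber textAbsorber = new TextAbsorber();
//         pdfDocument.Pages.Accept(textAbsorber);
//         return textAbsorber.Text;
//     });
```

Note that this sketch does not handle exceptions thrown inside a worker thread, and running several large extractions at once still multiplies memory usage, so it does not by itself address the OutOfMemoryException.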

Please let me know if there are any restrictions on the PDF size, the number of pages it can extract, and the number of files it can process at a time.

There are no specific restrictions or limitations regarding the source/input PDF file. However, if you are using the component in trial mode, you will encounter limitations in some features (for example, only 4 attachments, annotations, form fields, etc. can be manipulated in trial mode). Therefore, to remove such restrictions, you may consider requesting a 30-day temporary license to test our component without any limitations. For further details, please visit Get a temporary license.

Hi Nayyer,


Thanks for the details.

Currently I’m using Aspose.Pdf for .NET 8.1.0. I will download the latest version and check the issue again.

As you suggested, I will try loading a different file in each thread instead of using the same file in multiple threads.

Hi Nayyer,


I’m still getting an out-of-memory exception when extracting the text from the 40 MB file. I tried to upload the file here, but a timeout error was thrown.

Please let me know how to upload the file.

Note:
I used “Wrox.Professional.CSharp.4.and.NET.4.Mar.2010.pdf” for testing purposes.

I have shared a Dropbox link that contains the 40 MB file.

Hi there,

Sorry for the inconvenience faced. While using the latest version of Aspose.Pdf for .NET 8.2.0, I’ve managed to reproduce this issue on my side and logged the issue in our bug tracking system as PDFNEWNET-35525 for further investigation and resolution. I’ve also linked your request to this issue and you will be notified via this thread as soon as it is resolved.

Please feel free to contact us for any further assistance.

Best Regards,

Thanks for logging the issue in your bug tracking system.

Are there any updates on the above issue?

Hi Senthil,


Thanks for your inquiry. I’m afraid your reported issue is still not resolved. It is pending analysis in the queue along with other priority tasks. We will update you regarding an ETA as soon as the investigation is complete.

Thanks for your patience and cooperation.

Best Regards,

Hi Senthil,


Thanks for your patience.

The development team has started to investigate the problem reported earlier as PDFNEWNET-35525, but we have found that the resource file shared over this link (which you shared earlier) no longer seems to be present. Either the file has been removed or the link has been updated. Can you please try sharing the document once again so that we can get a copy and start our investigation of this issue? We are sorry for the inconvenience.

Hi Nayyer,


You can download the file from the same link now. Please let me know if you face any issues downloading the file.

https://www.dropbox.com/s/xrbskhuptzbhdkz/Wrox.Professional.CSharp.4.and.NET.4.Mar.2010.rar

Hi there,


Thank you for sharing the source document. We’ve downloaded the document successfully. Our development team has completed the investigation and planned the fix for the upcoming release, Aspose.Pdf for .NET 8.4.0, which will be published at the start of September 2013. We will update you as soon as it is published and available for download.

Thanks for your patience and cooperation.

Best Regards,

Hi Senthil,


Thanks for sharing the resource file.

Adding more to Tilal’s comments, the development team has started investigating this problem and we plan to get it resolved in the next release, Aspose.Pdf for .NET 8.4.0, but this is still not a promise. If we do not encounter any showstopper, we expect to fix this problem by that time. Your patience and understanding are greatly appreciated in this regard.

Thanks Tilal and Nayyer for the details.


We would be happy to get the fix in an earlier release.

The issues you have reported earlier (filed as PDFNEWNET-35525) have been fixed in Aspose.Pdf for .NET 8.4.0.

