I am using Aspose.Pdf version 126.96.36.199 on a Windows Server 2008 R2 machine to extract text from PDF files. There is no Adobe Reader or any other PDF reader installed on the machine.
The relevant code looks like this:
using (Document pdfDocument = new Document(pathToPdf))
{
    // Create a TextAbsorber object to extract text
    TextAbsorber textAbsorber = new TextAbsorber();
    // Accept the absorber for all the pages
    pdfDocument.Pages.Accept(textAbsorber);
    // Get the extracted text (contents is a string declared outside this block)
    contents = textAbsorber.Text;
    textAbsorber = null;
}
For the first 100 or so documents, text extraction is fast. I also have a thread watching each extraction: if it takes longer than 15 seconds per megabyte of file size, I stop the extraction and move on, because it is most likely stuck.
For example, the maximum time I am willing to wait for a 2 megabyte file is 30 seconds.
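To make the timeout rule concrete, here is a minimal sketch of it. It is an illustration only, not the actual watcher code: `ExtractionWatchdog`, `TimeoutFor`, and `TryRun` are hypothetical names, and it assumes a megabyte means 1024 * 1024 bytes and that the extraction runs as a `Task`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper illustrating the "15 seconds per megabyte" rule.
static class ExtractionWatchdog
{
    // Assumption: the time budget scales linearly with file size.
    const double SecondsPerMegabyte = 15.0;

    // Returns the maximum time to wait for a file of the given size.
    public static TimeSpan TimeoutFor(long fileSizeBytes)
    {
        double megabytes = fileSizeBytes / (1024.0 * 1024.0);
        return TimeSpan.FromSeconds(megabytes * SecondsPerMegabyte);
    }

    // Runs the extraction on a worker task and waits up to the timeout.
    // Returns true if it finished in time, false if the wait timed out.
    public static bool TryRun(Action extract, TimeSpan timeout)
    {
        Task task = Task.Factory.StartNew(extract);
        return task.Wait(timeout);
    }
}
```

With this sketch, `ExtractionWatchdog.TimeoutFor(2L * 1024 * 1024)` gives a 30 second budget. Note that when `Task.Wait` returns false, the caller only stops waiting; the worker itself is not stopped and keeps running in the background.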
This should be sufficient, but when extracting thousands of files, the process gets slower and slower until every single file is timing out.
Is there something wrong with the way I am extracting the text (in the code above) that is somehow leaking resources?
Any assistance or guidance is appreciated.