OutOfMemoryException with 55KB PDF, v20.7

tims-1 · August 17, 2020, 3:07am

I am evaluating the Aspose.OCR .NET library to extract text from scanned PDF documents.
As a test, I chose a one-page PDF containing a single scanned page, 55 KB in size.
Using this code:

var ocr = new AsposeOcr();
var text = ocr.RecognizeImage(@"path\File.pdf"); // returns the recognized text as a string

I get an OutOfMemoryException on the call to RecognizeImage().

Stack Trace:

at System.Drawing.SafeNativeMethods.Gdip.CheckStatus(Int32 status)
at System.Drawing.Image.FromFile(String filename, Boolean useEmbeddedColorManagement)
at System.Drawing.Image.FromFile(String filename)
at Aspose.OCR.AsposeOcr.(String )
at Aspose.OCR.AsposeOcr.RecognizeImage(String fullPath, Boolean detectAreas, Boolean autoSkew)
at Aspose.OCR.AsposeOcr.RecognizeImage(String fullPath)
at TestApp.Tests.AsposeTests.OcrPdf() in D:\Development\Local\LocalTesting\ConsoleApp.NetCore31\Tests\AsposeTests.cs:line 15
at TestApp.Tests.AsposeTests.Run() in D:\Development\Local\LocalTesting\ConsoleApp.NetCore31\Tests\AsposeTests.cs:line 9
at TestApp.Program.Main(String[] args) in D:\Development\Local\LocalTesting\ConsoleApp.NetCore31\Program.cs:line 9

Am I doing something wrong?
I can share the PDF privately if that helps debug the issue.

asad.ali · August 17, 2020, 6:17pm

@tims-1

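You cannot use Aspose.OCR on PDF files, as it only supports images. The exception is misleading: as your stack trace shows, RecognizeImage loads the file with System.Drawing.Image.FromFile, and GDI+ reports a file that is not in a valid image format as an OutOfMemoryException, whatever the file's size. Convert your PDF into images first, then perform OCR on the generated images using the same method you tried earlier.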
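To get a clear error instead of that misleading exception, a caller can check the file signature before calling RecognizeImage. The sketch below is a hypothetical helper (`InputGuard` is not part of Aspose.OCR). It assumes a PDF begins with the ASCII bytes `%PDF-`, which holds for well-formed files. Some PDF readers tolerate leading junk before the header, so treat this as a heuristic rather than full format validation.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Aspose.OCR: detects PDF input by its
// "%PDF-" signature so it is never handed to an image loader.
public static class InputGuard
{
    private static readonly byte[] PdfSignature = Encoding.ASCII.GetBytes("%PDF-");

    // True when the buffer starts with the PDF signature bytes.
    public static bool StartsWithPdfSignature(byte[] header)
    {
        if (header == null || header.Length < PdfSignature.Length)
            return false;

        for (int i = 0; i < PdfSignature.Length; i++)
        {
            if (header[i] != PdfSignature[i])
                return false;
        }
        return true;
    }

    // Reads only the first few bytes of the file and checks the signature.
    public static bool IsPdfFile(string path)
    {
        var header = new byte[PdfSignature.Length];
        int total = 0;

        using (var stream = File.OpenRead(path))
        {
            // Stream.Read may return fewer bytes than requested, so loop.
            while (total < header.Length)
            {
                int n = stream.Read(header, total, header.Length - total);
                if (n == 0)
                    break; // end of file reached
                total += n;
            }
        }

        // A file shorter than the signature cannot be a PDF.
        return total == header.Length && StartsWithPdfSignature(header);
    }
}
```

A caller could branch on `InputGuard.IsPdfFile(path)`: route PDFs to a PDF-to-image conversion step (done with whichever PDF rendering library you choose, not shown here) and pass only the resulting image files to `RecognizeImage`.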