Scanned PDF OCR

I'm very interested in the Total package, but one key requirement for our application is being able to OCR scanned PDFs.

How can I OCR scanned PDFs?

Thanks

JOE


I am also looking for the same.

Thanks
Himanshu

Hi there,


Thanks for your inquiry. Aspose.OCR supports the JPEG, PNG, TIF, BMP and GIF image formats with the Arial, Times New Roman and Tahoma fonts. Its character recognition accuracy for large font sizes (32 pt and above) is 90%; accuracy is lower for smaller font sizes.

You can convert the PDF document to images with Aspose.Pdf and then OCR the resulting images with Aspose.OCR. If the PDF already contains real text (rather than only scanned page images), you can also extract that text directly with Aspose.Pdf.
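A minimal VB.NET sketch of those two routes, assuming a version of Aspose.Pdf for .NET that provides the `Document`, `PngDevice`, `Resolution` and `TextAbsorber` classes; exact class names can differ between Aspose.Pdf versions, so check against the one you have installed. The module name, function names, output file naming and the `dpi` parameter are hypothetical and only for illustration.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports Aspose.Pdf
Imports Aspose.Pdf.Devices
Imports Aspose.Pdf.Text

' Hypothetical helper module; names and file layout are illustrative only.
Module PdfPreparation

    ' Route 1: rasterize each page of a scanned PDF to a PNG file so the
    ' images can be handed to an OCR routine.
    Function RenderPagesToPng(pdfPath As String, outputFolder As String, dpi As Integer) As List(Of String)
        Dim pngPaths As New List(Of String)()
        Dim pdfDocument As New Document(pdfPath)
        ' A higher DPI produces larger glyphs (helpful for small fonts)
        ' at the cost of bigger files and slower OCR.
        Dim device As New PngDevice(New Resolution(dpi))

        ' Aspose.Pdf page collections are 1-based.
        For pageNumber As Integer = 1 To pdfDocument.Pages.Count
            Dim pngPath As String = System.IO.Path.Combine(outputFolder, "page_" & pageNumber.ToString() & ".png")
            ' Write the rendered page to disk; Using closes the file afterwards.
            Using output As New FileStream(pngPath, FileMode.Create)
                device.Process(pdfDocument.Pages(pageNumber), output)
            End Using
            pngPaths.Add(pngPath)
        Next

        Return pngPaths
    End Function

    ' Route 2: read the existing text layer of a PDF. A purely scanned PDF
    ' has no text layer, so this returns little or no text for such files.
    Function ExtractExistingText(pdfPath As String) As String
        Dim pdfDocument As New Document(pdfPath)
        Dim absorber As New TextAbsorber()
        pdfDocument.Pages.Accept(absorber)
        Return absorber.Text
    End Function

End Module
```

For example, `RenderPagesToPng("scan.pdf", "C:\temp", 300)` would return one PNG path per page, and each path could then be passed to an OCR function. 300 DPI is a common choice for OCR input; given the accuracy note about small fonts, rendering at a higher resolution is worth trying before concluding a document cannot be recognized.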

Please feel free to contact us for any further assistance.

Best Regards,

I am looking at your “Total Package”, but I'm finding OCR to be the weak link.

I went ahead and tried converting my scanned PDF to many different image formats, then ran OCR on each image. It takes forever to run with every format. I'm running Windows 8 on a brand-new machine. Is this common? If it is, this isn't going to work for our customers.

' At the top of the file:
Imports System.IO
Imports Aspose.OCR

' OCR a single image file with Aspose.OCR and return the recognized text.
Private Function OCRImage(Input As String) As String
    ' Resource file shipped with Aspose.OCR
    Const resourceFileName As String = "Aspose.OCR.Resources.zip"
    ' Source file: the image on which OCR will be performed
    Dim imageFile As String = Input

    Try
        ' Initialize OcrEngine
        Dim ocr As New OcrEngine()
        ' Set the image (the file is read only once)
        ocr.Image = ImageStream.FromFile(imageFile)

        ' Add language
        ocr.Languages.AddLanguage(Load("english"))
        ocr.Config.UseDefaultDictionaries = True

        ' Load the resource file; Using closes the stream even if processing fails
        Using fileStream As New FileStream(resourceFileName, FileMode.Open, FileAccess.Read)
            ocr.Resource = fileStream

            ' Process the whole image
            If ocr.Process() Then
                ' Get the complete recognized text found in the image
                Return ocr.Text.ToString()
            End If
        End Using
    Catch ex As Exception
        ' Missing files and OCR failures are both reported here
        MsgBox("Exception: " & ex.Message)
    End Try

    Return ""
End Function

Looking deeper, it's not returning any data except a header.

Attached is a sample file that I'm trying to OCR.

If I can convert this scanned PDF to text, I'm sold on the "Total" package.

Please get back to me quickly as I have a timeline for this project.

Thanks

JOE


Hi Josef,


Thanks for your feedback. I'm afraid Aspose.OCR does not recognize your source document well; it currently achieves good accuracy only with larger fonts. Aspose.OCR is still an early-stage product and doesn't yet fully meet expectations. OCR is a very complex area, but our development team is working hard to revamp the product and improve its performance and capabilities.

Sorry for the inconvenience faced.

Best Regards,