I'm very interested in the total package, but one key requirement for our application is being able to OCR scanned PDFs.
How can I OCR scanned PDFs?
Thanks
JOE
josefbaker: I'm very interested in the total package, but one key requirement for our application is being able to OCR scanned PDFs.
How can I OCR scanned PDFs?
Thanks
JOE
I am also looking for the same.
Thanks
Himanshu
Hi there,
I am looking at your “Total Package” but I'm finding OCR to be the weak link.
I went ahead and tried converting my scanned PDF to many different image formats, then applied OCR to each image. It takes forever to run with every format. I'm running Windows 8 on a brand-new machine. Is this common? If it is, this isn't going to work for our customers.
' Assumes Imports System.IO (for FileStream) plus the Aspose.OCR imports already in the project
Private Function OCRImage(Input As String) As String
    ' Resource file
    Const resourceFileName As String = "Aspose.OCR.Resources.zip"
    ' Source file: the file on which OCR will be performed
    Dim imageFile As String = Input
    ' Initialize OcrEngine
    Dim ocr As OcrEngine = New OcrEngine()
    ' Image format detected for the source file (inspected while debugging; not used below)
    Dim fmt As ImageStreamFormat
    fmt = ImageStream.FromFile(imageFile).Format
    ' Set the image
    ocr.Image = ImageStream.FromFile(imageFile)
    ' Add language
    ocr.Languages.AddLanguage(Load("english"))
    ocr.Config.UseDefaultDictionaries = True
    ' Load the resource file; Using closes the stream on every exit path
    Using fileStream As New FileStream(resourceFileName, FileMode.Open)
        ocr.Resource = fileStream
        Try
            ' Process the whole image
            If ocr.Process() Then
                ' Get the complete recognized text found from the image
                'Console.WriteLine("Text recognized." & vbCrLf & ocr.Text.ToString())
                Return ocr.Text.ToString()
            End If
        Catch ex As Exception
            MsgBox("Exception: " & ex.Message)
        End Try
    End Using
    ' Nothing was recognized, or an exception occurred
    Return ""
End Function
Looking deeper, it's not returning any data except a header.
Attached is a sample file that I'm trying to OCR.
If I can convert this scanned PDF to text, I'm sold on the "total" package.
Please get back to me quickly as I have a timeline for this project.
Thanks
JOE
Hi Josef,
Thanks for your feedback. I'm afraid Aspose.OCR is not able to recognize the text in your source document; at present it achieves good accuracy only with larger fonts. Aspose.OCR is still an early-stage product and does not yet meet expectations. OCR is a very complex area, but our development team is working hard to revamp the product and improve its performance and capabilities.
Sorry for the inconvenience.
Best Regards,