I’m running some tests with our company forms, but I only get illegible strings. I have tried different file formats, DPIs, color and B/W, etc., but I couldn’t get more than a few characters properly identified.
Then I created a test image in Paint, writing some strings in Times New Roman (the font we use in our forms) and Arial. Only the Arial text is properly extracted.
Attached you can find the image I’m using for testing and this is the text I get : “M tTRF TiA47R mW RahqAM m ntm FanhoRnFADFRS ARF mztTTFN m R1-nnRF.An m JMRF.RS r-lKF. 45345
(JR l 231 X444
THSSARAL18 REFERENCE584697412A NAME lw ARIAL 98”
Must I programmatically select the font I’m going to read before processing the image?
Thank you
Here is the code I use for testing:
' Required at the top of the file
Imports System.IO
Imports Aspose.OCR

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    ' Resource file
    Const resourceFileName As String = "C:\Desarrollo\TFS_ONLINE\OCRFomularios\Aspose.OCR.Resources.zip"
    ' Source file: the file on which OCR will be performed
    Dim imageFile As String = "P:\FORMS\FONTEST.JPG"
    ' Output file for the recognized text (an absolute path, so it is not appended to another directory)
    Dim outputFile As String = "P:\FORMS\Output.txt"

    Dim license As Aspose.OCR.License = New Aspose.OCR.License()
    license.SetLicense("C:\Desarrollo\TFS_ONLINE\OCRFomularios\Aspose.OCR.lic")

    ' Initialize OcrEngine
    Dim ocr As New OcrEngine()
    ' Set the image
    ocr.Image = ImageStream.FromFile(imageFile)
    ' Add language
    ocr.Languages.AddLanguage(Language.Load("english"))
    ' ocr.Languages.AddLanguage(Language.Load("spanish"))

    Try
        ' Load the resource file; the Using block closes the stream when processing ends
        Using resourceStream As New FileStream(resourceFileName, FileMode.Open)
            ocr.Resource = resourceStream
            ' Process the whole image
            If ocr.Process() Then
                ' Get the complete recognized text found from the image
                Console.WriteLine("Text recognized: " & ocr.Text.ToString())
                File.WriteAllText(outputFile, ocr.Text.ToString())
            End If
        End Using
    Catch ex As Exception
        Console.WriteLine("Exception: " & ex.ToString())
    End Try
End Sub