OCR does not read text correctly

Hi,


I am currently evaluating the Aspose.OCR DLL to extract data from a JPEG image.
My requirement is to fetch data from a specified block (rectangle).
While trying to achieve this, I found the following glitches:
1) Initially I let Aspose.OCR detect the rectangles, and then I used those coordinates to extract text from the image file (attached to this post).
The output is as follows:
Block: {X=128,Y=480,Width=288,Height=32}
Text: Bill NO I S1 524408
Block: {X=128,Y=560,Width=368,Height=48}
Text: MR. NISHANT MADAAN
However, the original text is "Bill No : 31324408".

2) After this I tried to extract the data individually by specifying the coordinates myself (measured in the MS Paint application).
The output I got is as follows:
Block: {X=257,Y=474,Width=170,Height=47}
Text: 31 524408
Block: {X=120,Y=562,Width=386,Height=54}
Text: MR. NISHANT MADMN

As the highlighted text shows, the OCR is not able to read the digits properly, and the output also changes when the coordinates are provided manually.

Please let me know how I can tackle the above issues. Since I am going to use this OCR tool for a data-critical project, it is a necessity for me to get the correct values regardless of whether the blocks are created manually or automatically.

Thanks & Regards
Nayan Parekh
Websym Solutions Private Limited.
Hi Nayan,

Thank you for your inquiry.

We were able to reproduce the issue on our end and have logged it in our issue tracking system with ID OCR-34235. Our product team will look into it further, and we will update you accordingly.

In the meanwhile, feel free to contact us in case you have further comments or queries.

I am using the following code to read text from an image, but it is extremely slow and does not produce accurate results, so I have had to fall back on the Microsoft Office Imaging library. That is a little more reliable, but it is also very slow. As your client, I expect the results from you, not from Microsoft. I have to read 50,000+ single-page documents. I am stuck and am looking for a workaround at the moment. The Aspose.OCR library does not raise any exception immediately; it keeps trying to read the image, and in the end an out-of-memory exception is thrown. Kindly help me, and I will then ask my client for a testimonial for you, since, if I am correct, your testimonials section has no OCR testimonials.
The image is attached. I want to read the Roll Number accurately. Kindly look into this issue. :)

// Assumes: using Aspose.OCR;
public static string GetImageText(string imagesFile)
{
    // Instantiate the License class
    Aspose.OCR.License license = new Aspose.OCR.License();

    // Pass only the name of the license file embedded in the assembly
    license.SetLicense("Aspose.OCR.lic");

    // Initialize an instance of OcrEngine
    OcrEngine ocrEngine = new OcrEngine();
    ocrEngine.Config.ProcessColoredBackground = true;
    ocrEngine.Config.RemoveNonText = true;

    // Set the Image property by loading the image from a file path or a MemoryStream
    ocrEngine.Image = ImageStream.FromFile(imagesFile);

    // Process the image
    if (ocrEngine.Process())
    {
        // Return the recognized text
        return ocrEngine.Text.ToString();
    }
    return "";
}

Hi Nayan,

Thank you for your inquiry.

Please note that the image you have shared is a scanned OMR sheet. Performing an OCR operation on an OMR sheet will definitely slow down the process. To read data from an OMR sheet, you need to use the OMR Template Editor and the OmrEngine class. For more information, please visit the following links:

Hope the above information helps. Feel free to contact us in case you have further comments or queries.

Hi,


Following your suggestion, I tried to use OMR techniques to extract data from the scanned image. My findings are as follows:

1) Following the OMR documentation, I created an .amr file (attached to this post for reference). I referred to the section “Working with Template Images” and created the .amr file using the image I want to scan.
2) Next, I used the code snippets under the title “Extract text from a Scanned image”. To get the PointF and SizeF values, I used the same .amr file and added an element to get the required coordinates (refer to the image file "SelectedScreenshot").
3) Using these coordinates, I then tried to access the text. The code is as follows:
TextOcrElement textElement = new TextOcrElement("OCR Text", new PointF(9.83f, 46.31f), new SizeF(33.87f, 6.88f));

4) But on the line below I am getting the error "Unexpected exception occured":
OmrProcessingResult result = engine.ExtractData(new OmrImage[] { image });

Can you please help me tackle the above issue? Also, let me know whether I am on the right track and what the best practice would be to achieve this.

Looking forward to your quick response.

Note: Both files mentioned in this post are zipped under the name “OCRIssue” and attached to this post.

Hi Nayan,

Please note that the OMR Template Editor solution is for reading and extracting data from an OMR sheet (Problematic Image.jpg). To read data from an image that is not an OMR sheet, you have to perform an OCR operation on it. Further, we are working on the issue you faced while reading the image VodafoneBillImage1.jpg, under ticket ID OCR-34235 in our issue tracking system. We will update you soon via this thread.

Hi Ikram,

I am facing the same issues. I have attached my image as well. I have to read the roll number, but it is dead slow and most of the time it throws exceptions. Please fix it and let me know.

Regards
Asad
Hi Asad,

You can perform an OCR operation on the image “VodafoneBillImage1.jpg”. The following code snippet reads the text blocks.
string file_path = @"G:\OMR\VodafoneBillImage1.jpg";

OcrEngine ocr_engine = new OcrEngine();
ocr_engine.ClearNotifies();

// Use only the two blocks added below; automatic text-region detection is turned off
ocr_engine.Config.ClearRecognitionBlocks();
ocr_engine.Config.AddRecognitionBlock(RecognitionBlock.CreateTextBlock(125, 569, 370, 34));
ocr_engine.Config.AddRecognitionBlock(RecognitionBlock.CreateTextBlock(129, 479, 290, 34));
ocr_engine.Config.AdjustRotation = AdjustRotationMode.Disabled;
ocr_engine.Config.DetectTextRegions = false;

ocr_engine.Image = ImageStream.FromFile(file_path);
if (ocr_engine.Process())
{
    // Print the recognized text
    Aspose.OCR.IRecognizedText text = ocr_engine.Text;
    string temp = text.ToString();
    Console.WriteLine(temp);
}
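
Since recognized digits can still be wrong, one possible safeguard for a data-critical workflow is to check the recognized text against the expected format before accepting it. The following is only a minimal sketch and is not part of the Aspose.OCR API: the class name BillNumberCheck, the method TryGetBillNumber, and the assumption that a bill number is exactly 8 digits (inferred from the single sample value 31324408 in this thread) are all hypothetical. A format check of this kind catches misreads such as "S1 524408", but it cannot catch one digit misread as another, such as "31 524408".

```csharp
using System;
using System.Text.RegularExpressions;

public static class BillNumberCheck
{
    // Assumption: a bill number is exactly 8 ASCII digits, inferred from the
    // single sample value 31324408. Adjust the pattern for real data.
    private static readonly Regex EightDigits = new Regex("^[0-9]{8}$");

    // Removes all whitespace from the OCR text, then returns true and the
    // cleaned value if what remains is exactly 8 digits; otherwise false.
    public static bool TryGetBillNumber(string ocrText, out string billNumber)
    {
        billNumber = null;
        if (ocrText == null)
        {
            return false;
        }

        string compact = Regex.Replace(ocrText, @"\s+", "");
        if (!EightDigits.IsMatch(compact))
        {
            return false;
        }

        billNumber = compact;
        return true;
    }
}
```

For example, TryGetBillNumber("S1 524408", out string n) returns false, so that document could be routed to manual review instead of being stored with a wrong value.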

Hope the above information helps. Feel free to contact us in case you have further queries or comments.

The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.