Poor accuracy

Hello,

I downloaded the trial version of Aspose OCR and ran several basic tests, and the accuracy appears to be really bad. I only managed to make it work with an image containing "Hello World" written in a very large font.
I attached a really easy sample, and the detection is just unusable!

Am I missing something in the Aspose configuration, or is this the best I can get from this OCR?

This kind of accuracy is just not workable for us.

Thanks for any suggestion about this.

Regards

Hi,

Sorry to hear about your trouble.

We investigated the issue using the sample image you provided, with the latest version, Aspose.OCR for .NET 2.8.0. Testing showed that the image has a very low resolution of 96 DPI. Please note that the current implementation of the Aspose.OCR APIs performs well with images of at least 300 DPI, and accuracy tends to decrease as the resolution decreases. Because your image is only 96 DPI, it will not be possible to get 100% accuracy.
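As a rough illustration of the resolution point, the check can be sketched as a small helper that compares an image's DPI against the 300 DPI figure mentioned in this reply. This is only a sketch: `DpiCheck`, `MeetsRecommendedDpi`, and `ShortfallFactor` are hypothetical names and are not part of Aspose.OCR. The DPI values are passed in as plain numbers; on .NET they could be read from `System.Drawing.Image.HorizontalResolution` and `VerticalResolution`.

```csharp
using System;

// Hypothetical helper, not part of the Aspose.OCR API.
public static class DpiCheck
{
    // 300 DPI is the minimum resolution recommended in this thread.
    public const double RecommendedDpi = 300.0;

    // Returns true when both axes meet the minimum resolution.
    public static bool MeetsRecommendedDpi(double dpiX, double dpiY, double minDpi = RecommendedDpi)
    {
        return Math.Min(dpiX, dpiY) >= minDpi;
    }

    // Ratio between the minimum resolution and the image resolution.
    // Returns 1.0 when the image already meets the minimum.
    public static double ShortfallFactor(double dpi, double minDpi = RecommendedDpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        return Math.Max(1.0, minDpi / dpi);
    }
}
```

For the 96 DPI sample discussed here, `DpiCheck.MeetsRecommendedDpi(96, 96)` returns `false` and `DpiCheck.ShortfallFactor(96)` returns `3.125`, meaning the image has roughly a third of the recommended linear resolution. Rescanning at a higher DPI is likely a better remedy than enlarging the existing file.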

Further, we found that the scanned image contains French text. Languages other than English require loading language-specific resources. Please see the detailed article "Working with Different Languages". The following is the sample code that we used to perform OCR on the image you provided.


// Initialize an instance of OcrEngine
OcrEngine ocrEngine = new OcrEngine();
// Set the Image property by loading the image from a file path or an instance of Stream
ocrEngine.Image = ImageStream.FromFile(@"D:\testocr4.jpg");
// Clear the default language (English)
ocrEngine.LanguageContainer.Clear();
// Load the language resources from a file path or an instance of Stream
ocrEngine.LanguageContainer.AddLanguage(LanguageFactory.Load(@"D:\Aspose.OCR.French.Resources.zip"));
// Process the image
if (ocrEngine.Process())
{
    // Display the recognized text
    Console.WriteLine(ocrEngine.Text);
}

Output

mous roprondrons Ims roumions hondamadairo ànanir am la somaxm@ nrocharmo Im tomps do taxl@ un
noxmi sur I'minot sur Ims techos aummil.
Au bosorm, io rosto brem sor disponible lusque là.


Hope the above information helps. If you run into any issues or need further clarification, please let us know and we will be glad to assist you.

Hi,


I had the same problem in Java too. Can I get a solution?

Hi Himaja,


As discussed earlier in this thread, the current implementation of the Aspose.OCR APIs works well with high-resolution images (preferably 300 DPI or more). The accuracy rate drops as the resolution (DPI) of the scanned documents decreases, and this behavior is the same for both the .NET and Java APIs.

Please try the Aspose.OCR for Java API against high-resolution images, and feel free to contact us again if you face any difficulty.

But how can we extract text for different fonts?

Hi Moon,


Aspose.OCR APIs currently support Arial, Times New Roman, Courier New, Tahoma, Calibri and Verdana in Regular, Bold and Italic styles. You do not need any extra configuration to recognize the supported fonts; however, the image must be high resolution for optimal accuracy.

The reply from ikram.haq earlier in this thread is not that old. We have been having accuracy issues when trying to OCR a reference number on fax files that have a lower DPI and a smaller font size. We are using version 3.4 or 3.5 of Aspose.OCR for .NET. I saw previous posts saying that a 32pt font is needed to reach only 90% accuracy, but those posts are from 2013. That would not be acceptable.
Any recommendations for the best font and/or font size?
We only need to read a 13-character reference number.
Thanks,
Neil

Hi Neil,

Thank you for your inquiry.

Please send us the sample image that you are testing with. We will test it on our side and update you with our findings. In the meantime, you may try the fonts listed in the previous post. Furthermore, a font size of 12pt or 14pt can be used.
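To see how font size and resolution interact, it can help to convert a point size into pixels. The sketch below uses the standard typographic definition of one point as 1/72 inch; `GlyphSize` and `PointsToPixels` are hypothetical names used only for illustration, not Aspose.OCR API members, and the result is the nominal font height rather than the exact height of any particular glyph.

```csharp
using System;

// Hypothetical helper, not part of the Aspose.OCR API.
public static class GlyphSize
{
    // One typographic point is 1/72 inch, so pixels = points * dpi / 72.
    public static double PointsToPixels(double points, double dpi)
    {
        if (points <= 0) throw new ArgumentOutOfRangeException(nameof(points));
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        return points * dpi / 72.0;
    }
}
```

For example, `GlyphSize.PointsToPixels(12, 96)` gives 16 pixels, while `GlyphSize.PointsToPixels(12, 300)` gives 50 pixels, so the same 12pt text carries about three times as many pixels per character at the resolution recommended in this thread.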