General question about Aspose.OCR

mrossi · June 23, 2015, 4:56am

Hello,

According to the overview page “Optical Character Recognition API for Java” on products.aspose.com, it seems that exactly 6 fonts are supported.

What does this mean technically?
What about OCR on documents that use other fonts? Will OCR still work but with a higher error rate or will OCR simply not work at all?
Are there any figures on error rates for supported and unsupported fonts?

Thanks
Mario

babar.raza · June 23, 2015, 8:52am

Hi Mario,

Thank you for considering Aspose APIs.

That is correct: Aspose.OCR APIs currently support 6 fonts, namely Arial, Times New Roman, Courier New, Tahoma, Calibri and Verdana. If an image contains text in a font outside this set that closely resembles one of the supported fonts in style, it is possible to get the same accuracy as with the supported fonts. If the font is not in the supported set and does not resemble any of the supported fonts, the OcrEngine will return garbage data. Unfortunately, we do not have any statistics on accuracy rates for supported and unsupported fonts; however, you may try the API on your side with your real-life samples. Please download the latest version, Aspose.OCR for Java 2.5.0; a source code snippet for performing OCR is available on GitHub and in the Programmer’s Guide.
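Since no official figures are available, one way to get your own numbers from real-life samples is to compute the character error rate (CER): the edit distance between the OCR output and a hand-typed transcription, divided by the transcription length. Below is a minimal sketch in plain Java. It assumes you already have the recognized text as a String from whatever OCR call you use; it does not call any Aspose.OCR API, and the class and method names are hypothetical.

```java
/**
 * Hypothetical helper: character error rate (CER) of OCR output against a
 * hand-typed ground truth. CER = Levenshtein edit distance / ground-truth length.
 * No Aspose.OCR API is used here; the OCR text is assumed to be obtained elsewhere.
 */
public class CharErrorRate {

    // Dynamic-programming Levenshtein distance; insertions, deletions and
    // substitutions each cost 1. Only two rows are kept in memory.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows: prev now holds row i
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // 0.0 means a perfect match; values above 1.0 are possible when the OCR
    // output is much longer than the ground truth (e.g. garbage data).
    static double cer(String groundTruth, String ocrOutput) {
        if (groundTruth.isEmpty()) {
            return ocrOutput.isEmpty() ? 0.0 : 1.0;
        }
        return (double) editDistance(groundTruth, ocrOutput) / groundTruth.length();
    }

    public static void main(String[] args) {
        String truth = "Invoice 2015-06-23";
        String ocr = "Invo1ce 2015-06-28"; // made-up OCR result with two misread characters
        System.out.printf("CER = %.3f%n", cer(truth, ocr)); // prints "CER = 0.111"
    }
}
```

Averaging the CER separately over samples set in the six supported fonts and over samples in other fonts would give a rough side-by-side comparison for your own documents.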

mrossi · June 24, 2015, 7:37am

Hi Babar,

This is what I imagined.

It’s quite surprising that you don’t have any accuracy measurements. I would expect them to be an important quality indicator for OCR software, particularly for one that supports only a limited number of fonts and languages.

Anyway, thanks for your reply
Mario

babar.raza · June 24, 2015, 10:38am

Hi Mario,

Thank you for the suggestion. We will certainly consider adding statistics on supported and unsupported fonts to our documentation.