Poor Performance and Recognition

sanjit.roopra · January 14, 2015, 11:02am

Hi,

We are currently evaluating OCR libraries, and from an API perspective Aspose did a really good job.
But we are having trouble with both speed and character recognition on our samples.

Initially, the data had white characters on a black background, and with this nothing was recognized at all. We had to invert the images to get black characters on a white background, which helped a bit. We then increased the image size and the DPI, which also helped, until we got results similar to those of other OCR libraries.
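For anyone reproducing the preprocessing described above, here is a minimal sketch using only the JDK's `java.awt.image.BufferedImage`. The class `ImagePrep` and its methods are hypothetical helpers, not part of Aspose.OCR. The scale factor is an arbitrary example, and the sketch does not write DPI metadata, which would have to be set separately when the image is saved.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical preprocessing helpers (not an Aspose API).
public class ImagePrep {

    // Flip the RGB channels of every pixel so white-on-black text
    // becomes black-on-white. The result is an opaque RGB image.
    public static BufferedImage invert(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                out.setRGB(x, y, ~rgb & 0x00FFFFFF);
            }
        }
        return out;
    }

    // Enlarge the image by an integer factor using bicubic interpolation,
    // giving the recognizer more pixels per character.
    public static BufferedImage upscale(BufferedImage src, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        int w = src.getWidth() * factor;
        int h = src.getHeight() * factor;
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

A call such as `ImagePrep.upscale(ImagePrep.invert(original), 2)` would produce a black-on-white image at twice the original size, which could then be passed to the OCR engine in whatever way the library expects.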

But speed is really bad: processing takes 6 seconds for an image with a resolution of 2040x324. Other libraries do the same in under 1 second.
Is this a problem with the Java library?
Also, are there parameters we can tweak to improve this?

Currently I have the following setup:
OcrEngine ocr = new OcrEngine();
ILanguage language = Language.load("english");
ocr.getLanguages().addLanguage(language);
ocr.getConfig().setDetectTextRegions(true);
ocr.getConfig().setRemoveNonText(true);
ocr.getConfig().setDetectReadingOrder(false);

The image does not contain meaningful sentences, only names and numbers. I know you will ask for a sample, but the current sample contains patient information that I am not allowed to post publicly. If it is really needed, I will have to find another sample without sensitive data, but perhaps you are already aware of this or a similar issue.

Thanks.

babar.raza · January 15, 2015, 12:01am

Hi Sanjit,

Thank you for considering Aspose APIs, and welcome to the Aspose.OCR support forum.

First of all, whenever performance has to be evaluated, we need a sample from your end so we can cross-check your measurements against ours and try to tweak the configuration for better results. You may send us the sample image privately by email: use the Contact button on the post window to send an email to babar.raza and attach your sample to it.

Please note that Aspose.OCR for .NET is two versions ahead of the Java release, so the .NET version of the API is better in every way at the moment. Moreover, the .NET version no longer uses the resource archive for the OCR process, because the archive has been embedded in the assembly itself. For this reason, the .NET version should perform much better, as it does not have to load and decompress the resources. These changes will be available in the Java version of the API with its next release, which is scheduled for late February 2015.
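One way to see how much of the 6 seconds is one-time resource loading rather than per-image work is to time several consecutive runs in the same JVM. The sketch below is a generic timing harness; `OcrTiming` is a hypothetical name, and whether the Java API actually reuses loaded resources across calls is an assumption to verify, not something confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical timing harness, independent of any OCR library.
public class OcrTiming {

    // Runs the task `runs` times and returns the elapsed milliseconds of each run.
    // The first entry usually includes one-time costs (class loading, resource
    // decompression, JIT warm-up); later entries approximate the per-image cost.
    public static List<Long> timeRuns(Runnable task, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis.add((System.nanoTime() - start) / 1_000_000L);
        }
        return millis;
    }
}
```

Wrapping the existing engine setup and recognition call in a lambda, e.g. `OcrTiming.timeRuns(() -> recognizeSample(), 5)` where `recognizeSample` is a hypothetical method, would show whether the first run is much slower than the rest.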

Regarding possible tweaks, please check the detailed articles on the different OcrEngine configurations and their effects on the OCR process. Among these settings, DetectReadingOrder can cause major performance degradation when set to true on a source image with many text blocks. Since you have already set this property to false, it is unlikely to be the cause of the slowdown.