OCR text output is gibberish

winterix · June 19, 2018, 5:01pm

I am testing Aspose.OCR under a temp evaluation license, for use in an internal tool designed to extract text from documents and images and search for keywords. I’m getting bizarre and unusable results in basic testing.

I created several test images (just random poems) using the fonts and file formats your product states are supported. For example, in one image it starts:
“The Buzzards
When evening came and the warm glow grew deeper…”

And the processed result is:
“T`hr Bdzzdrdc
Whcc ccccccag ccccc cad chc cccaa glccc gcccc dccacc…”

My C# code uses the basic sample from documentation:
ocr.Image = ImageStream.FromFile(file);
if (ocr.Process()) { text = ocr.Text.ToString(); }

I don’t know if there is some object configuration that would do the trick, but I ran all my samples through the free online OCR (www.onlineocr.net) and the results were nearly perfect. Unless there is a fairly straightforward fix we cannot consider Aspose as a candidate to procure for our project.

ocr-buzzards.jpg (165.8 KB)

ikram.haq · June 19, 2018, 6:25pm

@winterix,

Thank you for sharing the details. Please note that the downloadable version of Aspose.OCR has had severe performance issues and has not been meeting customers’ expectations. These issues have slowed development, so fewer new features and fixes have been delivered.

Aspose.Cloud servers have enough resources to overcome such performance issues, and we are targeting those servers for further development. The downloadable version of Aspose.OCR will not be able to meet customers’ expectations in the near future and is expected to be discontinued soon. We recommend migrating to the cloud version of Aspose.OCR for better performance and quicker fixes for the issues you mentioned. You can try Aspose.OCR for Cloud, and you may post your inquiry on the OCR for Cloud support forum for further assistance.

We are sorry for the inconvenience.