OCR Testing - Not recognizing text

sbosell · October 3, 2016, 10:33am

We are evaluating Aspose.OCR for a project, so I am running some tests. My test performs OCR on a JPG of a Spanish-language newspaper page. The JPG isn't a scan but the digital copy of the page, so I expected the result to be fairly accurate. Unfortunately, the output of the OCR process is:

4DESTAQUE Naercoreazemeservembremerzoadamsrrem

-ili/][][][]I][][][][][][][]I]I][l][][][I[][][///[][][][][][][][][][///l/][][<’/][][][][I][][][][][][][I/h/][][][][]]]]]]]]]]]]]]]][]I1I]]]]]][][]I][][][]]]]]]][][][][][][][][][][][][][][/////////][][][][][][]I][][][][]]]]][LI][][]I][].illili]]][I][][][][][][][][][][][][][][][][][][][.ili]]]]]]]]]]]]]I][][.[][][][[

The code is pretty simple, but I'm not getting any usable results. We have licenses for Aspose.PDF and Aspose.Cells, which work great, but I'm having trouble getting this test to work. The byte array is fine (I've written it to disk), and spa.zip (downloaded from Aspose as the Spanish dictionary) is in the expected path. Any help?

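// Requires: using System.Collections.Generic; using System.IO; plus the Aspose.OCR namespace(s).
// oe is an Aspose.OCR OcrEngine created earlier; path, files and db come from the surrounding code.
oe.LanguageContainer.Clear();
oe.LanguageContainer.AddLanguage(LanguageFactory.Load(Path.Combine(path, "spa.zip")));
oe.Config.DetectTextRegions = true;

foreach (var f in files)
{
    // f.Contents holds the raw JPG bytes
    oe.Image = ImageStream.FromStream(new MemoryStream(f.Contents), ImageStreamFormat.Jpg);

    if (oe.Process())
    {
        f.Ocr = oe.Text.ToString();
        db.Update(f, new List<string> { "Ocr" });
    }
}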

ikram.haq · October 3, 2016, 12:39pm

Hi Sam,

Thank you for your inquiry.

We need to reproduce the issue at our end for further analysis, so please share the image that contains the Spanish text. We will investigate the issue and update you on our findings via this forum thread.

Please visit the following link for details on performing OCR with different languages:

Working with Different Languages

sbosell · October 3, 2016, 1:31pm

What is the best way to privately share it with you?

ikram.haq · October 4, 2016, 2:21pm

Hi Sam,

Thank you for sharing the sample.

This is to let you know that Aspose.OCR currently supports the Verdana, Times New Roman, Courier New, Tahoma, Calibri and Arial fonts in normal, italic and bold styles, and does not support images with a colorful background. The sample you shared appears to use other fonts and has a colorful background.
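
One workaround you could try for the background limitation is to flatten the image to black text on white before handing it to the engine. The sketch below is not an Aspose API: it uses plain System.Drawing (GDI+) with a simple global threshold on Rec. 601 luminance. The class name BackgroundCleanup and the default threshold of 128 are illustrative assumptions, and whether this improves Aspose.OCR results on this particular page is untested.

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not part of Aspose.OCR): turns a colored page into black-on-white.
public static class BackgroundCleanup
{
    // Rec. 601 luma: 0 = black, 255 = white.
    public static int Luminance(Color c)
    {
        return (int)Math.Round(0.299 * c.R + 0.587 * c.G + 0.114 * c.B);
    }

    // Pixels darker than the threshold are treated as text ("ink"), everything else as background.
    public static Color Binarize(Color c, int threshold = 128)
    {
        return Luminance(c) < threshold ? Color.Black : Color.White;
    }

    // Returns a black-and-white copy of the image, keeping the original DPI metadata.
    // GetPixel/SetPixel keep the sketch short; LockBits is much faster on full newspaper pages.
    public static Bitmap ToBlackAndWhite(Bitmap source, int threshold = 128)
    {
        var result = new Bitmap(source.Width, source.Height);
        result.SetResolution(source.HorizontalResolution, source.VerticalResolution);
        for (int y = 0; y < source.Height; y++)
        {
            for (int x = 0; x < source.Width; x++)
            {
                result.SetPixel(x, y, Binarize(source.GetPixel(x, y), threshold));
            }
        }
        return result;
    }
}
```

A single global threshold works when text is darker than its colored background; light text on a dark banner would be inverted or lost, so those regions would need separate handling.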

Furthermore, we found that the image you provided has a resolution of 280 DPI. Please note that the current implementation of the Aspose.OCR APIs performs well on images with a resolution of at least 300 DPI, and accuracy tends to decrease as the resolution decreases.
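
To meet the 300 DPI figure above, the page can be resampled so that its physical size stays the same at the higher resolution: new pixel width = width × 300 / current DPI (for this sample, a factor of 300 / 280 ≈ 1.07). The sketch below does this with System.Drawing, not with any Aspose API; the class name OcrPreprocessing is hypothetical, it assumes square pixels (it reads only HorizontalResolution), and a JPEG without density metadata may report a default resolution (commonly 96 DPI), so check that value first. Upscaling cannot recover detail that was never in the image; it only satisfies the stated resolution requirement.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

// Hypothetical helper (not part of Aspose.OCR): resamples an image up to a target DPI.
public static class OcrPreprocessing
{
    // Minimum resolution quoted in this thread.
    public const float TargetDpi = 300f;

    // Pixel size needed to keep the same physical size at targetDpi; unchanged if already high enough.
    public static Size ComputeTargetSize(int width, int height, float dpi, float targetDpi = TargetDpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));
        if (dpi >= targetDpi) return new Size(width, height);
        return new Size(
            (int)Math.Ceiling(width * (double)targetDpi / dpi),
            (int)Math.Ceiling(height * (double)targetDpi / dpi));
    }

    // Bicubic resample; assumes HorizontalResolution == VerticalResolution (square pixels).
    public static Bitmap UpscaleToTargetDpi(Bitmap source, float targetDpi = TargetDpi)
    {
        Size size = ComputeTargetSize(source.Width, source.Height, source.HorizontalResolution, targetDpi);
        var result = new Bitmap(size.Width, size.Height);
        float newDpi = Math.Max(targetDpi, source.HorizontalResolution);
        result.SetResolution(newDpi, newDpi);
        using (Graphics g = Graphics.FromImage(result))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            g.DrawImage(source, 0, 0, size.Width, size.Height);
        }
        return result;
    }
}
```

For the 280 DPI sample, a 2000 × 3000 pixel page would become 2143 × 3215 pixels. The resampled bitmap can then be saved to a MemoryStream and passed to ImageStream.FromStream in place of the original bytes.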

The issues have already been logged into our system. The details of the issues are given below:

OCRNET-1340: OCR on low DPI images.

OCRNET-1053: OCR on colorful background.

In addition to the above, we encountered an exception. This issue has also been logged in our system with ID OCRNET-2874. Our product team will look into it further, and we will update you on the progress via this forum thread.

awais.hafeez · March 29, 2018, 5:23am

The issues you found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.