Poor recognition


Hi,
I am getting rather poor recognition rates from document images (sample attached).
In a paragraph of 700 characters, 87 character edits are needed to match
the reference text (Levenshtein distance), which amounts to an error rate of over 12%.
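
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a character error rate based on Levenshtein distance. It is written in standalone Python rather than .NET, and the function names `levenshtein` and `character_error_rate` are illustrative only; they are not part of Aspose.OCR, and this is not necessarily the exact script used for the figures above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the row short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,          # delete ca
                current[j - 1] + 1,       # insert cb
                previous[j - 1] + cost,   # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(recognized: str, reference: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(recognized, reference) / len(reference)
```

For example, `character_error_rate("Iymphoma", "lymphoma")` gives 0.125 (one substitution in eight characters), and 87 edits over 700 reference characters gives roughly 12.4%.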

So far, I have tried applying preprocessing filters, but that didn't improve the results at all.

Can you suggest an approach to improve the recognition rate?

Regards,


Details:

Aspose.OCR 2.2.0; evaluation license
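snippet:
// Load the image to be recognized
_engine.Image = ImageStream.FromFile( imageFile );
// Enable the engine's spelling correction
_engine.Config.DoSpellingCorrection = true;
// Run recognition and fail the test if processing does not succeed
Assert.IsTrue( _engine.Process() );
var extractedText = _engine.Text.ToString();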

sample output:
==============

lnterpretation:Monoclenal B-sell populateen with an immunophenotype not aliowing fMRAer ciassitseatron
ageected by flouv cytomete {s@W Ccrmments}.
Comments:
The fIow cNometry findings ave consistent with Iymph node involvement by B-ceII Iymphomalleukemia. The mmunophenotype does not
suppon a diagnosis of cIassic chronc Iymphocytc leukemialsmall IymphocWic IVmphoma (CLUSLL), mantle cell Iymphoma (MCL).
follicular lYmphoma (FL) ov hairY ceII Ieukemia (HCL). However, an unusual variant of B-ceII Iymphomalleukemia. such as CD1 0-negative
FL and CD5-negative MCL can not be excluded. Please correlate the result with morphologic frnangs, clinical informatron and olher
d iag nostic ti nd i ngs.
===============

Hi Matt,


Thank you for contacting Aspose support.

We have evaluated your scenario with the latest version, Aspose.OCR for .NET 2.2.0, and we were able to reproduce the incorrect results with your sample image. The most probable reason for the low accuracy is the poor quality of the image itself. Please note that the Aspose.OCR APIs work best with high-resolution images of at least 300 DPI, whereas the provided image is only 96 DPI. We are working on tuning the OcrEngine to get better accuracy from the same image; however, it would be great if you could provide the same image in high resolution for our testing.
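
To give a sense of the gap between 96 DPI and 300 DPI, the following standalone Python sketch computes the resampling factor and the resulting pixel dimensions. The helper name `scale_for_target_dpi` and the 960 x 480 example size are hypothetical (the actual pixel size of the sample is not stated here). Note that resampling only interpolates the existing pixels, so it may not improve accuracy the way a genuine high-resolution scan would.

```python
def scale_for_target_dpi(width_px: int, height_px: int,
                         source_dpi: float, target_dpi: float = 300.0):
    """Return (factor, new_width, new_height) for resampling an image
    so that it covers the same physical area at target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    factor = target_dpi / source_dpi
    # Round to whole pixels; the image keeps its aspect ratio.
    return factor, round(width_px * factor), round(height_px * factor)


# Hypothetical 960 x 480 pixel image scanned at 96 DPI:
factor, new_width, new_height = scale_for_target_dpi(960, 480, 96)
print(factor, new_width, new_height)  # 3.125 3000 1500
```

In other words, a 96 DPI image needs a little over three times as many pixels in each direction to reach 300 DPI.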

Hi again,


This is to inform you that I was unable to get the desired results from your sample image, even after saving it at a higher resolution (300 DPI). I have therefore logged the problem in our bug tracking system for thorough investigation. The ticket number for future reference is OCR-33914. Please allow us some time to analyze the cause of the problem on our end. In the meantime, we will keep you posted with updates.

Hi,


This is to inform you that we have reduced the recognition error rate to 9.7% in the upcoming release, Aspose.OCR for .NET 2.3.0, and we will further improve the accuracy in future releases. We hope that the results (provided below) are acceptable for now.

lnterpretation:uonoclenal B-sell populateen with an immunophenotype not allowing funner classitieation
ageected by flouv cytomete isom Ccrmments}.
Comments:
The tiow chometry tindings ale consistent with Iymph node involvement by B-cell Iymphomalleukemia. The immunophenotype does not
suppon a diagnosis of classic chronic Iymphocytic leukemialsmall IymphocWic IVmphoma (CLUSLL), mantle cell Iymphoma (MCL).
follicular lymphoma (FL) ov hairy cell Ieukemia (HCL). However, an unusual variant of B-cell Iymphomalleukemia. such as CD1 0-negative
FL and CD5-negative MCL can not he excluded. Please correlate the result with morphologic findings, clinical information and olher
d iag nostic ti nd i ngs.

The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.