I’m evaluating the OCR toolkit and getting pretty bad results. The attached TIF file contains very clean text scanned at 300 dpi, but the recognized text is almost entirely wrong. Is there something I should be doing differently to improve accuracy?
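using System.IO;
using System.Linq;
using Aspose.OCR;

OcrEngine ocrEngine = new OcrEngine();
// Verbatim string (@"...") so the backslashes in the path are not treated as escape sequences.
ocrEngine.Image = ImageStream.FromFile(@"c:\temp\ocr\Test Scan.tif");
// Recognize every page of the multi-page TIFF, not only the first one.
ocrEngine.ProcessAllPages = true;
if (ocrEngine.Process())
{
    string outputFile = @"c:\temp\ocr\Test Scan.txt";
    using (StreamWriter sw = new StreamWriter(outputFile))
    {
        // Append the recognized text of each page to the output file.
        for (int iCount = 0; iCount < ocrEngine.Pages.Count(); iCount++)
        {
            sw.Write(ocrEngine.Pages[iCount].PageText);
        }
    }
}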
Hi Brent,
Thank you for considering Aspose.OCR APIs.
We tried to reproduce the issue on our end using the latest version, Aspose.OCR for .NET 2.9.0, and were able to reproduce it. The issue has been logged in our issue tracking system with ID OCR-34158. Our product team will look into it, and we will update you as soon as we have feedback. For your reference, we are attaching our sample code and its output, which closely matches the output you reported.
We are sorry for the inconvenience caused.
CODE:
using System;
using System.Text;
using Aspose.OCR;

var sb = new StringBuilder();
OcrEngine ocrEngine = new OcrEngine();
ocrEngine.Image = ImageStream.FromFile(@"C:\Ctrash\Input\Test_Scan.tif");
// Run recognition and collect the recognized text.
if (ocrEngine.Process())
{
    sb.Append(ocrEngine.Text);
    sb.Append(Environment.NewLine);
}
Console.WriteLine(sb);
OUTPUT:
XOtiCalCharpitermg]itiOn
FrOm mikipedia, the free encyclopedia
OptiCal CharaCter reCOgnitiOn (OCR) iS the meChanicar or ereccronic conversion or images or
typedi handWritten Or printed teXt intO maChine-encoueu rexr. rr is wiaelY useu as a rorm or uaca
entm frOm printed PaPer data reCOrdS, Whether PaSSPOW dOCUmentS, inVOiCeS, bank statementS,
COmpUteriZed reCeiptS. bUSineSS carus, mair, prinrours or sraric-uara, or anY suiraure
docUmentatiOn- lt iS a COmmOn methOd Of digitiZing printed teXtS SO that it Can be eleCtrOniCally
editedy SearChedy StOred mOre COmPaCtlY1 diSplayed On-line, and USed in maChine prOCeSSeS
SUCh aS maChine tranSIatiOn, teXc-ro-speecn, xeY uaca ana rexc mining. ocR is a rieru or
reSearCh in pattern recognition, anificial inrexiigence anu comPurer vision.
EarlY VerSiOnS needed tO be trained With images of each character, and worked on one font gt g
time- AdVanCed SYStemS CaPable Of producing a high degree of recognition accuracy for mosc
fOntS are nOW COmmOn. SOme SyStems are canaure or reProuucing rormazced ourpur cnac croserY
apprOXimateS the Original page inClUding imageS, COIMmns, anu ocner non-cexrual componencs.
The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.