OCR DLL not reading tiff image properly

Hi Team,


I'm using version 17.3.0.0 of Aspose.OCR for .NET.
The OCR does not recognize all the words in the image, and it outputs many symbols in place of words.
I have attached the TIFF source and the generated text file.
How can I get good-quality text from the image? I will be searching for specific words in the text after retrieving it.

The following is the code snippet I use:


using System;
using System.IO;
using Aspose.OCR; // ImageStream is assumed to live in this namespace

Aspose.OCR.OcrEngine ocrEngine = new Aspose.OCR.OcrEngine();
ocrEngine.Image = ImageStream.FromFile(@"C:\GAImages\1.Tiff");
ocrEngine.ClearNotifies();
ocrEngine.ProcessAllPages = true;
ocrEngine.Config.DetectTextRegions = false;

if (ocrEngine.Process())
{
    Aspose.OCR.Page[] pages = ocrEngine.Pages;
    foreach (Aspose.OCR.Page page in pages)
    {
        var pageText = page.PageText.ToString();

        // Note: this overwrites the file on every page, so only the
        // last page's text remains once the loop finishes.
        File.WriteAllText(@"C:\Images\1.txt", pageText);

        // The search must run inside the loop, where pageText is in scope.
        if (pageText.Contains("abc"))
        {
            Console.WriteLine("It's the required page");
        }
    }
}



Hi Ed,

Thank you for your inquiry.

We have investigated the issue at our end. During the investigation, we found that the image has a resolution of 200 DPI. Note that the current implementation of the Aspose.OCR APIs performs well on images with a resolution of at least 300 DPI, and accuracy tends to decrease as the resolution decreases.
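To check the resolution of a scan before sending it to the OCR engine, something like the following sketch may help. It uses only the standard System.Drawing API, not Aspose.OCR; the class and method names (DpiCheck, MeetsRecommendedDpi, Report) are hypothetical, and the 300 DPI threshold is simply the figure quoted above. For a multi-page TIFF, Image.FromFile reports the resolution of the first frame.

```csharp
using System;
using System.Drawing; // System.Drawing.dll on .NET Framework, or the System.Drawing.Common package

public static class DpiCheck
{
    // Minimum resolution recommended in this thread for good OCR accuracy.
    public const float RecommendedDpi = 300f;

    // Returns true when both axes meet the recommended resolution.
    public static bool MeetsRecommendedDpi(float horizontalDpi, float verticalDpi)
    {
        return horizontalDpi >= RecommendedDpi && verticalDpi >= RecommendedDpi;
    }

    // Loads the image, prints its resolution, and warns if it is too low.
    public static void Report(string path)
    {
        using (Image image = Image.FromFile(path))
        {
            float h = image.HorizontalResolution;
            float v = image.VerticalResolution;
            Console.WriteLine("Resolution: {0} x {1} DPI", h, v);

            if (!MeetsRecommendedDpi(h, v))
            {
                Console.WriteLine("Below {0} DPI; OCR accuracy may suffer.", RecommendedDpi);
            }
        }
    }
}
```

Calling DpiCheck.Report(@"C:\GAImages\1.Tiff") on the image from the original post would be expected to print a warning, since that scan was found to be 200 DPI.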

Furthermore, the sample image contains very dim printing and distorted writing. The OCR engine is unable to process such images. Please try a better-quality scanned image with dark printing and complete writing. Feel free to contact us if you have any questions or comments.

I can only get 200 DPI images at most, so are there any settings that would help the OCR recognize the text better?

Noise removal filters are not available in this version. Would such a filter help to get better-quality text?

Hi Ed,

Thank you for writing back to us.

You can apply different correction filters before performing the OCR operation on the image. These correction filters help to reduce noise and improve the results. If you intend to extract specific content from a portion of the image, you can use custom recognition blocks to get better accuracy.

Please note that the above solution is useful only when your documents/images follow a similar structure, that is, the content to be scanned is always in the same location in each image. How you create a custom recognition block depends on your programming conventions and the image layout. Furthermore, table outlines (row and column borders) will not be recognized properly. Table lines will be mistaken for symbols such as “I”, which results in low accuracy.

Following are the online documentation links for details and code snippets.

Hope the above information helps. Feel free to contact us in case of any query or comments.

Hi Ikram,


I want to apply RemoveNoiseFilter, but I can't find this filter in the 17.3.0.0 DLL.

filter = new Aspose.OCR.Filters.RemoveNoiseFilter();

filters.Add(filter);


I get an error message "the type or namespace name 'RemoveNoiseFilter' does not exist in the namespace 'Aspose.OCR.Filters'".


Can you let me know in which namespace I can find this class?

Hi Ed,

This is to let you know that RemoveNoiseFilter has been removed from the public API. The algorithm always applies a noise removal filter after the image is binarized, so there is no need to set the filter explicitly. We will update the online documentation soon.

Sorry for the inconvenience caused.