Aspose.OCR for .NET fails to extract text from image

dhaidet · July 2, 2015, 8:23am

C# code

Aspose.OCR.License license = new Aspose.OCR.License();
OcrEngine ocrEngine = new OcrEngine();
ocrEngine.ClearNotifies();
// Restrict recognition to a single manually defined text block
ocrEngine.Config.ClearRecognitionBlocks();
ocrEngine.Config.AddRecognitionBlock(RecognitionBlock.CreateTextBlock(0, 0, 800, 30));
ocrEngine.Config.DetectTextRegions = false;
ocrEngine.Image = ImageStream.FromFile(@"D:\btest.png");
if (ocrEngine.Process())
{
    // Print the bounding box and recognized text of each part
    foreach (IRecognizedPartInfo info in ocrEngine.Text.PartsInfo)
    {
        IRecognizedTextPartInfo textInfo = (IRecognizedTextPartInfo)info;
        Console.WriteLine("Block: {0} Text: {1}", info.Box, textInfo.Text);
    }
}

The output is in the attachment named "output".

ikram.haq · July 2, 2015, 5:49pm

Hi Dragos,

We evaluated this scenario on our end using the code snippet you provided, with the latest version, Aspose.OCR for .NET 2.7.0. During testing we found that the image you provided has a very low resolution of 96 DPI. We strongly recommend performing OCR on images of at least 300 DPI.

We hope this information helps. If you run into any issues or need further clarification, please let us know and we will be glad to assist you.
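
For reference, here is a minimal sketch of resampling a low-DPI image to 300 DPI before passing it to OCR. It assumes System.Drawing is available and that the image has the same resolution on both axes; the `DpiHelper` class and its method names are hypothetical and not part of Aspose.OCR. Keep in mind that resampling only interpolates existing pixels and cannot restore detail that was never captured, and that changing only the DPI metadata leaves the pixels unchanged, so a scan made at a higher resolution is still preferable.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical helper, not part of Aspose.OCR.
static class DpiHelper
{
    // Pixel dimensions needed to cover the same physical size at targetDpi.
    public static Size ScaledSize(int width, int height, float sourceDpi, float targetDpi)
    {
        double scale = targetDpi / sourceDpi;
        return new Size((int)Math.Round(width * scale), (int)Math.Round(height * scale));
    }

    // Resamples the image at sourcePath and saves it as a PNG tagged with targetDpi.
    // Assumes equal horizontal and vertical resolution in the source image.
    public static void Resample(string sourcePath, string targetPath, float targetDpi = 300f)
    {
        using (var source = new Bitmap(sourcePath))
        {
            Size size = ScaledSize(source.Width, source.Height,
                                   source.HorizontalResolution, targetDpi);
            using (var target = new Bitmap(size.Width, size.Height))
            {
                target.SetResolution(targetDpi, targetDpi);
                using (Graphics g = Graphics.FromImage(target))
                {
                    // Bicubic interpolation gives smoother glyph edges than the default.
                    g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                    g.DrawImage(source, 0, 0, size.Width, size.Height);
                }
                target.Save(targetPath, ImageFormat.Png);
            }
        }
    }
}
```

A call such as `DpiHelper.Resample(@"D:\btest.png", @"D:\btest_300dpi.png");` would write the resampled copy, where the second path is only an illustrative name. Any recognition block coordinates would then need to be scaled by the same factor.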

dhaidet · July 3, 2015, 2:58am

Hello Ikram,

Thank you for your reply.

The DPI thing makes sense to me.

However, even if I feed the Aspose OCR library the same image at 300x300 DPI, or even 600x600 DPI, the output is the same.

Can you confirm that this works on your end?

Is there any other trick I am missing?

Cheers,

Dragos

babar.raza · July 3, 2015, 3:20am

Hi Dragos,

Thank you for your understanding.

Please share a sample in high resolution so we could perform a few tests on our side and get back to you with updates.

dhaidet · July 3, 2015, 6:25am

See attached the high-resolution image, the output, and the C# code.

The library manages to extract some text, but it is pretty much scrambled.

Thank you for looking into this,

Dragos

babar.raza · July 3, 2015, 9:39am

Hi Dragos,

You are right. I have tested the newly shared sample and received garbage data as output. I will log it for further investigation after performing a few more tests. In the meantime, I suggest recognizing the text from only part of the image. This approach is useful in scenarios where documents that follow the same structure have to be processed. Be advised that this approach will not be useful on arbitrary images.

DragosHaidet · July 23, 2015, 2:50am

Hello team,

Any updates on the above issue? Given the current state of my evaluation, I’m afraid the OCR module fails to do what it says it can do.

Regards,

Dragos

babar.raza · July 23, 2015, 3:08am

Hi Dragos,

I am afraid we haven’t yet received any updates on the ticket logged earlier as OCR-34109. As discussed earlier in this thread, if you have real-time document scans and you wish to get the text from a specific part of the image (not from the complete image), you can use custom recognition blocks. This should produce the desired results with high-resolution scans.

We are sorry for the inconvenience.

awais.hafeez · March 29, 2018, 5:23am

The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.