Aspose.OCR for .NET fails to extract text from image

C# code

// Create the license object (no license file is set in this snippet).
Aspose.OCR.License license = new Aspose.OCR.License();
OcrEngine ocrEngine = new OcrEngine();
ocrEngine.ClearNotifies();
// Use one custom text block instead of automatic text region detection.
ocrEngine.Config.ClearRecognitionBlocks();
ocrEngine.Config.AddRecognitionBlock(RecognitionBlock.CreateTextBlock(0, 0, 800, 30));
ocrEngine.Config.DetectTextRegions = false;
ocrEngine.Image = ImageStream.FromFile(@"D:\btest.png");
if (ocrEngine.Process())
{
    // Print each recognized part together with its bounding box.
    foreach (IRecognizedPartInfo info in ocrEngine.Text.PartsInfo)
    {
        IRecognizedTextPartInfo textInfo = (IRecognizedTextPartInfo)info;
        Console.WriteLine("Block: {0} Text: {1}", info.Box, textInfo.Text);
    }
}

The output is in the attachment called "output".

Hi Dragos,


We have evaluated the scenario on our end using the code snippet you provided, with the latest version of Aspose.OCR for .NET (2.7.0). While testing, we found that the image you provided has a very low resolution of 96 DPI. We strongly recommend performing OCR on images of at least 300 DPI.

We hope this information helps. If you run into any issues or need further clarification, please let us know and we will be glad to assist you.
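To make the 300 DPI recommendation concrete, here is a minimal sketch of the arithmetic involved. It assumes the current resolution has already been read from the image (for example via the `HorizontalResolution` and `VerticalResolution` properties of `System.Drawing.Image`). The names `DpiHelper`, `RequiredScale` and `ScaledLength` are hypothetical helpers for illustration, not part of Aspose.OCR. Keep in mind that the DPI value stored in a file is only metadata: re-saving an image with a higher DPI tag without resampling does not add any pixels, and whether upscaling actually improves recognition has to be tested on the image itself.

```csharp
using System;

// Hypothetical helper, not part of Aspose.OCR: works out how much an image
// has to be upscaled to reach a target DPI at the same physical size.
static class DpiHelper
{
    // Returns the factor by which the pixel dimensions must grow.
    // Returns 1.0 when the image already meets or exceeds the target.
    public static double RequiredScale(double currentDpi, double targetDpi)
    {
        if (currentDpi <= 0 || targetDpi <= 0)
            throw new ArgumentException("DPI values must be positive.");
        return Math.Max(1.0, targetDpi / currentDpi);
    }

    // Applies a scale factor to a pixel length, rounding to the nearest pixel.
    public static int ScaledLength(int pixels, double scale)
    {
        return (int)Math.Round(pixels * scale);
    }
}

class Program
{
    static void Main()
    {
        // 96 DPI is the value reported for the image in this thread;
        // 300 DPI is the recommended minimum.
        double scale = DpiHelper.RequiredScale(96, 300);
        Console.WriteLine("Scale factor: {0}", scale);
        Console.WriteLine("An 800 px wide image becomes {0} px wide",
            DpiHelper.ScaledLength(800, scale));
    }
}
```

For a 96 DPI source the factor is 300 / 96 = 3.125, so an 800 pixel wide image would need to be resampled to 2500 pixels wide to count as 300 DPI at the same physical size.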

Hello Ikram,

Thank you for your reply.
The DPI thing makes sense to me.
However, even if I feed the Aspose OCR library the same image at 300x300 DPI (or even 600x600), the output is the same.
Can you confirm that this works on your end?
Is there any other trick I am missing?
Cheers,
Dragos

Hi Dragos,


Thank you for your understanding.

Please share a sample in high resolution so we could perform a few tests on our side and get back to you with updates.

Please see attached the high-resolution image, the output, and the C# code.

The library manages to extract some text, but it is pretty much scrambled.
Thank you for looking into this,
Dragos

Hi Dragos,


You are right. I have tested the newly shared sample and also received garbage data as output. I will log it for further investigation after performing a few more tests. In the meantime, I suggest recognizing the text from a part of the image. This approach is useful in scenarios where documents that follow the same structure have to be processed. Be advised that this approach will not be useful on random images.

Hello team,

Any updates on the above issue? With the current state of my evaluation, I’m afraid the OCR module fails to do what it says it can do…

Regards,
Dragos

Hi Dragos,


I am afraid we have not yet received any updates regarding the ticket logged earlier as OCR-34109. As discussed earlier in this thread, if you have real document scans and wish to get the text from a specific part of the image (not from the complete image), you can use custom recognition blocks. This should produce the desired results on high-resolution scans.

We are sorry for the inconvenience.

The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.