We are trying to get OCR running in our application. We would like to extract German text from screenshots. We have seen your help page on supported characters: https://docs.aspose.com/ocr/net/recognition-languages/, which also lists German characters such as ö, ä, ü, …
For the attached “sample.jpg”, the OCR output is:
ÖäfüAE-Cb (according to sample.jpg, it should be: öäüAEO)
This is far from the content of sample.jpg, even though the image has a good resolution and the text is in Arial with no background.
//OCR
string dataDir = @"c:\temp\";
AsposeOcr api = new AsposeOcr();
string result = api.RecognizeImage(dataDir + "sample.jpg");
Console.WriteLine(result);
File.WriteAllText(dataDir + "sample.txt", result);
We are using Aspose.OCR 20.4.2.0.
What’s wrong here?
Thanks a lot for your support.
Incite GmbH
Marc Huber
We have been able to reproduce the issue in our environment and have logged it as OCRNET-177 in our issue tracking system. We will look further into the details and keep you informed about its status. Please be patient and give us some time.
We are continuously working on improving the recognition quality of the API. The latest version, Aspose.OCR for .NET 20.7, has better recognition performance. Could you please try it and share your feedback? We will also inform you as soon as the ticket is closed.
We have updated the ticket with the feedback you provided and will inform you as soon as it is resolved. We greatly appreciate your patience in this matter.
I tried something similar on my side: for a JPG containing öäüAEO, I got ,y9%.
The font was Calibri, and the test used a registered version of Aspose.OCR 21.1. The OCR component still has issues.
One of the problems is that the API's DSR model works better with multi-line pages. We have therefore added an overloaded method that lets you switch off the DSR model (detect areas = false). This method gives a better result for this particular image.
Code example
AsposeOcr api = new AsposeOcr();
var img = @".\sample.jpg";
var res = api.RecognizeImage(img, false);
RESULT:
ÖäÜAEO
Still, there is an anomaly: some lowercase letters are recognized as uppercase. We hope to resolve this in upcoming releases. You will be notified as soon as the issue is resolved. Please give us some time.
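To check whether a wrong result differs from the expected text only by letter case, a small helper along the following lines may be useful. This is only an illustrative sketch: it does not use the Aspose.OCR API, the names OcrDiff and CountCaseOnlyMismatches are hypothetical, and it assumes both strings are in composed Unicode form and are compared position by position.

```csharp
using System;

// Hypothetical helper, not part of Aspose.OCR: compares the expected text
// with the OCR output and counts positions that differ only by letter case.
public static class OcrDiff
{
    public static int CountCaseOnlyMismatches(string expected, string actual)
    {
        if (expected == null) throw new ArgumentNullException(nameof(expected));
        if (actual == null) throw new ArgumentNullException(nameof(actual));

        int count = 0;
        // Only the overlapping part of the two strings is compared.
        int length = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < length; i++)
        {
            char e = expected[i];
            char a = actual[i];
            // Different characters that become equal once case is ignored.
            if (e != a && char.ToUpperInvariant(e) == char.ToUpperInvariant(a))
            {
                count++;
            }
        }
        return count;
    }
}
```

For the result above, OcrDiff.CountCaseOnlyMismatches("öäüAEO", "ÖäÜAEO") returns 2 (ö/Ö and ü/Ü). A result such as öäUAEO returns 0, because ü read as U is a wrong character rather than a case problem.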
With Aspose.OCR 21.1.2, we got the following result:
öäUAEO
We used the following code:
AsposeOcr api = new AsposeOcr();
var res = api.RecognizeImage(imgPath, new RecognitionSettings { RecognizeSingleLine = true });
Console.WriteLine(res.RecognitionText);
res.Save("D://res.txt", SaveFormat.Text);
In the current release, we have improved our model and achieved better recognition quality. Also note that for a single line of text, it is better to set the flag:
RecognizeSingleLine = true
With the current version, 21.3, we got öäüAEO. Code:
AsposeOcr api = new AsposeOcr();
var res = api.RecognizeImage(imgPath, new RecognitionSettings { RecognizeSingleLine = true });
Console.WriteLine(res.RecognitionText);
res.Save("D://res.txt");