OCR fails for a number

Hello,

I’m trying to recognize a number in an image with Aspose.OCR. The image can be found here: Dropbox - Error - Simplify your life

I’m running the following implementation:

public String extractText(InputStream stream) throws DocumentException {
    // Create an instance of OcrEngine
    OcrEngine ocrEngine = new OcrEngine();
    // Set resources for OcrEngine
    ocrEngine.setResource(BmpDocumentService.class.getResourceAsStream("/Aspose.OCR.1.9.0.Resources.zip"));
    // Set NeedRotationCorrection property to false
    ocrEngine.getConfig().setNeedRotationCorrection(false);

    // Set image file
    ocrEngine.setImage(ImageStream.fromBytes(BinaryUtil.toArray(stream), ImageStreamFormat.Bmp));

    // Add language
    ocrEngine.getLanguages().addLanguage(Language.load("english"));

    // Apply a noise-removal correction filter
    CorrectionFilters value = new CorrectionFilters();
    value.add(new RemoveNoiseFilter());

    ocrEngine.getConfig().setCorrectionFilters(value);
    ocrEngine.setDetectTextOnly(false);

    // Perform OCR and get extracted text
    if (ocrEngine.process()) {
        LOG.debug("getProbabilitySymbols" + ocrEngine.getProbabilitySymbols());
        return ocrEngine.getText().toString();
    } else {
        return null;
    }
}

And this is my test that calls the implementation:

public void highQuality() {
    // The value returned by the service is the actual result, not the expected one
    String actual = service.extractText(BmpDocumentServiceTest.class.getResourceAsStream("highQuality.bmp"));
    assertThat(actual).isEqualTo("998.508985.7");
}

The test fails with the following output:

org.junit.ComparisonFailure:
Expected :‘998.508985.7’
Actual :‘si J- 8 t ztio- ?’

Are there any parameters I can tune to improve the OCR recognition?
Because I only want to parse numbers, it might be possible to define a custom language by implementing the ILanguage interface, but I don’t know how to do that. An example would be nice.
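
Independently of a custom language, one option that might help when only numbers are expected is to post-process the recognized text. The following is a minimal sketch under stated assumptions, not part of the Aspose.OCR API: the class NumericOcrCleanup, its look-alike table, and the assumed digits.digits.digits format are all hypothetical and only illustrate the idea. It would not rescue output as badly garbled as the result above, but it can catch single-character confusions such as "O" for "0".

```java
import java.util.Map;

// Hypothetical helper, not part of Aspose.OCR: cleans up OCR text that should contain only a number.
public class NumericOcrCleanup {

    // Assumed table of letters that OCR engines commonly confuse with digits.
    private static final Map<Character, Character> LOOKALIKES = Map.of(
            'O', '0', 'o', '0',
            'I', '1', 'l', '1',
            'S', '5', 'B', '8');

    /** Maps look-alike letters to digits, then drops everything except 0-9 and '.'. */
    public static String clean(String raw) {
        StringBuilder out = new StringBuilder();
        for (char c : raw.toCharArray()) {
            char mapped = LOOKALIKES.getOrDefault(c, c);
            if ((mapped >= '0' && mapped <= '9') || mapped == '.') {
                out.append(mapped);
            }
        }
        return out.toString();
    }

    /** Checks the assumed shape: three groups of digits separated by dots. */
    public static boolean looksLikeExpectedNumber(String cleaned) {
        return cleaned.matches("\\d+\\.\\d+\\.\\d+");
    }

    public static void main(String[] args) {
        String cleaned = clean("998.5O8985.7 ");
        System.out.println(cleaned + " valid=" + looksLikeExpectedNumber(cleaned));
    }
}
```

In extractText, the returned string could be passed through clean(...) and rejected (for example by returning null) when looksLikeExpectedNumber(...) is false, so that garbage results are not mistaken for a number.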

Greetings,
Rainer

Hi Rainer,


Thank you for contacting Aspose support.

We have evaluated your scenario using your sample and the latest version, Aspose.OCR for Java 2.0.0. Unfortunately, we were unable to get correct results, so we have logged the problem in our bug tracking system under ticket OCR-33821 for further investigation and correction. Please allow us some time to analyze the cause. In the meantime, we will keep you posted with updates.
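
While the ticket is open, a general-purpose technique that is sometimes worth trying is to binarize the bitmap before handing it to the engine. The sketch below uses only the JDK (BufferedImage and ImageIO); it is not an Aspose.OCR feature and is not confirmed to help with this particular image. The class name BitmapPreprocessor and the threshold value are assumptions chosen for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Aspose.OCR: turns an image into pure black and white.
public class BitmapPreprocessor {

    /** Pixels darker than the threshold (0-255 luminance) become black, all others white. */
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luminance < threshold ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }

    /** Encodes the image as BMP bytes, the format the original code passes to the engine. */
    public static byte[] toBmpBytes(BufferedImage image) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            if (!ImageIO.write(image, "bmp", buffer)) {
                throw new IllegalStateException("No BMP writer available");
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting bytes could replace BinaryUtil.toArray(stream) in the ImageStream.fromBytes(...) call of the original code, after reading the input with ImageIO.read(stream). The threshold (for example 128) would need to be tuned against the actual image.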

The issues you have found earlier (filed as OCR-33821) have been fixed in this update.


The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.