OCR Does Not Extract Text from Image in Java

bhanwar.rathore · December 7, 2015, 7:56am

I am trying to extract text from an image using OCR, but it does not return any text. Instead I get %m when I call OcrEngine.getText(), which is not correct. Please look at my code and tell me the proper solution.

My code is :

public static void main(String[] args) throws Exception {
    // The path to the documents directory.
    // String dataDir = Utils.getDataDir(PerformOCROnImage.class);
    String dataDir = "F:/OCR/test1.bmp";

    // Set the paths
    // String imagePath = dataDir + "Sampleocr.bmp";

    // Create an instance of OcrEngine
    OcrEngine ocr = new OcrEngine();

    // Set image file
    ocr.setImage(ImageStream.fromFile(dataDir));

    // Perform OCR and get extracted text
    try {
        if (ocr.process()) {
            // System.out.println("getText " + ocr.getText().toString() + "\n");
            // System.out.println("getPages " + ocr.getPages().length + "\n");
            // System.out.println("getPreprocessedImages " + ocr.getPreprocessedImages().getTextBlocksImage());
            // System.out.println("getPages.toString " + ocr.getText() + "\n");

            IRecognizedTextPartInfo firstBlock = (IRecognizedTextPartInfo) ocr.getText().getPartsInfo()[0];
            System.out.println(firstBlock.getBox().toString());

            // Get the children of the first block, which will be the lines in the block
            IRecognizedPartInfo[] linesOfFirstBlock = firstBlock.getChildren();

            // Retrieve the first line from the collection of lines
            IRecognizedTextPartInfo firstLine = (IRecognizedTextPartInfo) linesOfFirstBlock[0];

            // Display the text of the line
            System.out.println(firstLine.getText());

            // Retrieve the first word from the collection of words
            IRecognizedTextPartInfo firstWord = (IRecognizedTextPartInfo) firstLine.getChildren()[0];

            // Display the text of the word
            System.out.println(firstWord.getText());

            // Retrieve the first character from the collection of characters
            IRecognizedTextPartInfo firstCharacter = (IRecognizedTextPartInfo) firstWord.getChildren()[0];

            // Display the text of the character
            System.out.println(firstCharacter.getText());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

ikram.haq · December 7, 2015, 11:58am

Hi Bhanwar,

Thank you for your inquiry and for sharing the code snippet.

Please note that we need the sample scanned image from which you are trying to extract the information. This will help us to investigate the issue.

bhanwar.rathore · December 8, 2015, 12:54am

These are the images from which I am trying to extract text or contents.

ikram.haq · December 8, 2015, 3:39am

Hi Bhanwar,

Thank you for sharing sample images.

We have evaluated the attached image at our end. While testing, we found that the image has a very low DPI value, i.e. 96. Please note that the current implementation of the Aspose.OCR API works well with images that have a resolution of at least 300 DPI, and the accuracy rate tends to decrease as the resolution decreases. Your image has a resolution of 96 DPI, so it will not be possible to get 100% accuracy if you wish to scan the complete image. On the other hand, if you intend to get some specific contents from a portion of the image, you can use custom recognition blocks to get better accuracy.

Please note that the above-mentioned solution is useful in scenarios where your documents follow a similar structure, that is, the contents to be scanned are always at the same location in each image.

Hope the above information helps. Feel free to contact us if you have further queries or comments.
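To make the resolution figures above concrete, here is a minimal sketch of the arithmetic for resampling a 96 DPI scan to the 300 DPI mentioned in the reply. It uses only standard Java imaging classes, not the Aspose.OCR API. The class name `DpiUpscaleSketch` and its methods are hypothetical, and resampling is an assumption on our part rather than something recommended in this thread: interpolation adds pixels but no new detail, so rescanning at a higher resolution is likely to give better results.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Aspose.OCR: resamples an image so that
// its pixel dimensions match what a scan at a higher DPI would have produced.
public class DpiUpscaleSketch {

    // Pixel length after resampling from sourceDpi to targetDpi.
    public static int scaledLength(int pixels, double sourceDpi, double targetDpi) {
        return (int) Math.round(pixels * targetDpi / sourceDpi);
    }

    // Returns a bicubic-interpolated copy of src sized for targetDpi.
    public static BufferedImage resample(BufferedImage src, double sourceDpi, double targetDpi) {
        int width = scaledLength(src.getWidth(), sourceDpi, targetDpi);
        int height = scaledLength(src.getHeight(), sourceDpi, targetDpi);
        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        // A blank 100x50 image stands in for a 96 DPI scan.
        BufferedImage src = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage dst = resample(src, 96, 300);
        System.out.println(dst.getWidth() + "x" + dst.getHeight()); // prints 313x156
    }
}
```

The scale factor is simply 300 / 96 = 3.125, so a 100x50 pixel image becomes 313x156 pixels after rounding.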

bhanwar.rathore · December 8, 2015, 6:27am

I also tried with a high-quality image, but it still does not give me any output. My image is

ikram.haq · December 8, 2015, 1:23pm

Hi Bhanwar,

Thank you for writing back to us.

We have evaluated the attached image at our end. While testing, we found that this image also has a very low DPI value, i.e. 72.

bhanwar.rathore · December 8, 2015, 11:17pm

Can you please give me an image that works for you? I want to test my code with an image provided by you.

ikram.haq · December 9, 2015, 12:10pm

Hi Bhanwar,

Please find the sample image attached. You can perform the OCR operation on it. Furthermore, we have logged the problem with OCR on low-DPI images in our issue tracking system with ID OCR-34250. We are continuously improving recognition quality. Low-DPI images will work once issue OCR-34250 is fixed.

bhanwar.rathore · December 9, 2015, 11:12pm

I tried my code with the image you provided, but it still does not give me the full content of the image. I am only able to get:

" the raven,never hining,still is sining,still is<Rest of the text is trimmed due to evaluation restriction!> "

If the image quality is good, I should get the full content, that is, all of the text inside the image.

Thanks for your assistance.

ikram.haq · December 10, 2015, 3:13am

Hi Bhanwar,

Thank you for writing back to us.

Please note that you are getting the evaluation restriction message because you have not set the license before calling any functionality of the Aspose.OCR API. The following code snippet shows how to set the license.

com.aspose.ocr.License _license = new com.aspose.ocr.License();
_license.setLicense("C:\\xxx.lic");
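As a complement to the license snippet above, here is a minimal sketch of how calling code might detect that the recognized text was cut short by evaluation mode. It is plain Java and does not call the Aspose.OCR API. The class `EvaluationCheck` and the method `isTrimmed` are hypothetical, and the marker string is copied from the output quoted earlier in this thread, so it is an assumption that other versions of the library use the same wording.

```java
// Hypothetical helper, not part of Aspose.OCR: flags recognized text that
// was truncated because no license had been applied.
public class EvaluationCheck {

    // Marker copied from the output quoted in this thread; the exact wording
    // may differ between library versions (assumption).
    public static final String TRIM_MARKER =
            "<Rest of the text is trimmed due to evaluation restriction!>";

    // Returns true if the text contains the evaluation-mode trim marker.
    public static boolean isTrimmed(String recognizedText) {
        return recognizedText != null && recognizedText.contains(TRIM_MARKER);
    }

    public static void main(String[] args) {
        String sample = "the raven,never hining,still is sining,still is" + TRIM_MARKER;
        System.out.println(isTrimmed(sample));      // prints true
        System.out.println(isTrimmed("full text")); // prints false
    }
}
```

A check like this could be run on the string obtained from the OCR result to log a clear warning when the license was not applied, instead of silently working with partial text.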

awais.hafeez · March 29, 2018, 5:23am

The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.