Dear Team,
I am evaluating Aspose.OCR to extract text from a BMP file, but I am not able to extract the text from the input file.
I have attached the file.
Below is my code:
String dataDir = "src/programmersguide/workingwithocr/performocronimage/data/";
String imagePath = dataDir + "Sampleocr.bmp";
String xmlEtalonFileName = "englishStandarts";
String fontCollectionFileName = "arialAndTimesAndCourierRegular.xml.bin";
String resourcesFilePath = "…/lib/resources.zip";
// Create an instance of the OcrEngine class by providing the required
// parameters.
OcrEngine ocr = new OcrEngine(ResourcesSource.BINARY_ZIP_FILE,
        resourcesFilePath, xmlEtalonFileName, fontCollectionFileName);
ocr.getConfig().setNeedRotationCorrection(false);
// Set the image file.
File image = new File(imagePath);
ocr.setImage(image);
// Add language…
ILanguage language = Language.load("english");
ocr.getLanguages().addLanguage(language);
// Perform OCR and get the extracted text.
try {
    if (ocr.process()) {
        System.out.println("\ranswer -> " + ocr.getText());
        // try-with-resources closes the writer even if write() fails.
        try (BufferedWriter out = new BufferedWriter(new FileWriter(dataDir + "Output.txt"))) {
            out.write(ocr.getText().toString());
        }
        System.out.println("OCR performed on Input Image successfully.");
    }
} catch (Exception e) {
    e.printStackTrace();
    System.out.println("Error: " + e.getMessage());
}
Please help me with this.
Hi Ravindranath,
Thank you for contacting Aspose support.
We have evaluated your scenario on our end and noticed that the latest version, Aspose.OCR for Java 1.9.0, is unable to extract any meaningful information from the sample image you provided. We are not certain what is causing the recognition failure, so we have logged the problem in our bug tracking system under ticket ID OCR-33773. Please allow us some time to analyze the cause properly and provide a fix at the earliest. In the meantime, we will keep you posted with updates in this regard.
Hi Ravindranath,
Thank you for your patience.
We have studied the sample image you provided for the possible cause of the recognition failure, and we have found that the main reason for the incorrect results is the "Glued Symbols". Please have a look at the attached image for your reference; you will notice that most of the symbols share the same region (dotted rectangle), which forces more than one symbol to be treated as a single blob. With the next official release, Aspose.OCR for Java 2.0.0, we will improve this aspect of the OCR engine for better overall results.
We will keep you posted with updates in this regard.
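To illustrate why glued symbols are a problem, here is a minimal sketch that counts connected blobs of ink pixels in a tiny binary grid. It is not the Aspose.OCR engine's actual segmentation, which is not described in this thread; the class `BlobCounter`, its methods, and the sample grids are hypothetical and only show the general idea that two glyphs touching by a single pixel collapse into one blob.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only: counts 4-connected blobs of "ink" pixels.
final class BlobCounter {

    // Converts rows of text into a binary grid; '#' marks an ink pixel.
    // Assumes all rows have the same length.
    static boolean[][] parse(String... rows) {
        boolean[][] ink = new boolean[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            ink[r] = new boolean[rows[r].length()];
            for (int c = 0; c < rows[r].length(); c++) {
                ink[r][c] = rows[r].charAt(c) == '#';
            }
        }
        return ink;
    }

    // Counts groups of ink pixels connected horizontally or vertically.
    static int countBlobs(boolean[][] ink) {
        if (ink.length == 0) {
            return 0;
        }
        int rows = ink.length;
        int cols = ink[0].length;
        boolean[][] seen = new boolean[rows][cols];
        int blobs = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (ink[r][c] && !seen[r][c]) {
                    blobs++;
                    floodFill(ink, seen, r, c);
                }
            }
        }
        return blobs;
    }

    // Marks every ink pixel reachable from (startRow, startCol) as seen.
    private static void floodFill(boolean[][] ink, boolean[][] seen, int startRow, int startCol) {
        int[][] steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        Deque<int[]> pending = new ArrayDeque<>();
        pending.push(new int[] {startRow, startCol});
        seen[startRow][startCol] = true;
        while (!pending.isEmpty()) {
            int[] cell = pending.pop();
            for (int[] step : steps) {
                int r = cell[0] + step[0];
                int c = cell[1] + step[1];
                boolean inside = r >= 0 && r < ink.length && c >= 0 && c < ink[0].length;
                if (inside && ink[r][c] && !seen[r][c]) {
                    seen[r][c] = true;
                    pending.push(new int[] {r, c});
                }
            }
        }
    }

    public static void main(String[] args) {
        // Two glyphs separated by a blank column: two blobs.
        System.out.println(countBlobs(parse("##.##", "##.##")));
        // The same glyphs touching along the bottom row: one blob.
        System.out.println(countBlobs(parse("##.##", "#####")));
    }
}
```

A recognizer that treats each blob as one character would see a single unknown shape in the second grid, which is consistent with the failure described above.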
Hi,
Is there any workaround for this?
Thanks!
Hi Ravindranath,
The problem stated in this thread lies with the Aspose.OCR APIs, and we are working to fix it in the next official release, as mentioned in our previous response. If you would like to give it another try with the current version of the API, we suggest that you obtain a high-resolution image (at least 300 x 300 DPI) of the same document and execute the same code against it.
Please keep us posted with your results on this. Thanks.
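If rescanning the document at a higher resolution is not possible, one thing to experiment with is upscaling the existing image before passing it to the OCR engine. The sketch below uses only standard JDK classes (`ImageIO`, `Graphics2D`); the class name `ImageUpscaler` and the command-line arguments are hypothetical. Upscaling does not add real detail, so this is an assumption-laden experiment rather than a substitute for a true 300 DPI scan.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: enlarges an image with bicubic interpolation.
final class ImageUpscaler {

    // Returns a copy of the source image scaled by the given factor.
    static BufferedImage upscale(BufferedImage source, double factor) {
        if (factor <= 0) {
            throw new IllegalArgumentException("factor must be positive");
        }
        int width = Math.max(1, (int) Math.round(source.getWidth() * factor));
        int height = Math.max(1, (int) Math.round(source.getHeight() * factor));
        // TYPE_INT_RGB has no alpha channel, so the result can be written as BMP.
        BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return scaled;
    }

    // Usage: java ImageUpscaler <input image> <output.bmp> <factor>
    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("Usage: java ImageUpscaler <input image> <output.bmp> <factor>");
            return;
        }
        BufferedImage source = ImageIO.read(new File(args[0]));
        if (source == null) {
            throw new IOException("Unsupported or unreadable image: " + args[0]);
        }
        double factor = Double.parseDouble(args[2]);
        ImageIO.write(upscale(source, factor), "bmp", new File(args[1]));
    }
}
```

For example, if the original was scanned at 96 DPI, a factor of about 3.125 (300 / 96) brings the pixel dimensions in line with a 300 DPI scan of the same page. The upscaled file can then be used as the `imagePath` in the original code.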
I am still not able to extract text from a file with 300 DPI image quality. When can we expect the official release of Aspose.OCR 2.0?
We have an urgent requirement to extract text from scanned PDF files.
Hi Ravindranath,
We are sorry to hear that increasing the DPI of the image didn't help in this scenario. We have discussed this particular scenario with the development team lead and learned that the upcoming release will contain enhancements to the OCR engine that will yield better results in terms of performance as well as accuracy. The .NET version of the product is about to be published (in a couple of days). As soon as that release is available for public use, we will test your sample for the accuracy of the recognized text. Moreover, the Java version 2.0 will be available in the third quarter of the upcoming month (June 2014).
Dear Team,
Is there any new Aspose Java release for OCR text reading?
Hi Ravindranath,
Thank you for your patience with us.
Unfortunately, the release of Aspose.OCR for .NET 2.0.0 has been delayed, and as a result the Java version of the product will also be published with some delay. We haven't yet rescheduled either release, but as soon as we get more updates regarding the release schedule, we will post them here for your kind reference.
The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.