Cannot OCR image

taranyuk · June 18, 2019, 5:21pm

Cannot OCR an image (300 dpi).

My Java code:

public void extractTextFromImage(String imagePath) {
    OcrEngine ocr = new OcrEngine();
    ocr.setImage(ImageStream.fromFile(imagePath));
    try {
        long start = System.currentTimeMillis();
        if (ocr.process()) {
            System.out.println(">>>" + ocr.getText());
        }
        System.out.println("finish in " + (System.currentTimeMillis() - start));
    } catch (Exception e) {
        e.printStackTrace();
    }
}

console output:
>>>e
finish in 253603

Why does the OCR output make no sense, and why did it take so long (about 253 seconds)?

asad.ali · June 18, 2019, 8:12pm

@taranyuk

We have tested the scenario in our environment using Aspose.OCR for Java 17.11 and could not reproduce the issue you described. The API was able to recognize the text from the image. For your reference, a screenshot of the console output is attached. Would you please share a sample console application that reproduces the issue? We will test the scenario again in our environment and address it accordingly.
OCR_output.png (18.1 KB)

taranyuk · June 19, 2019, 9:36am

I have tested again on different environments.

On my dev laptop (Windows 8, JDK 1.8.0_25) I got the incorrect result described above.
On the test environment (Ubuntu 14.04.4 LTS (GNU/Linux 3.13.0-91-generic x86_64), OpenJDK 1.8.0_111) I got correct results.

I used the same Java program on both.

Why? Are there any environment requirements or conflicts with other software? For example, I have Tesseract-OCR installed on my dev laptop.
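
To compare the two machines, it may help to print the same JVM and OS details on each. A minimal sketch using only standard Java system properties; the class name EnvReport and the labels default.locale, max.heap.mb and cpu.count are made up for illustration and are not part of Aspose.OCR:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Aspose.OCR): prints environment details
// that commonly differ between a Windows laptop and a Linux server.
class EnvReport {

    // Collects JVM and OS properties in a fixed order so two reports can be diffed.
    static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<>();
        String[] keys = {
            "java.version", "java.vendor", "java.vm.name",
            "os.name", "os.version", "os.arch",
            "sun.arch.data.model", // JVM-specific, may be absent on some JVMs
            "file.encoding"
        };
        for (String key : keys) {
            info.put(key, System.getProperty(key, "(not set)"));
        }
        // The labels below are illustrative names, not real system properties.
        info.put("default.locale", Locale.getDefault().toString());
        info.put("max.heap.mb",
                String.valueOf(Runtime.getRuntime().maxMemory() / (1024 * 1024)));
        info.put("cpu.count",
                String.valueOf(Runtime.getRuntime().availableProcessors()));
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running it on both the Windows laptop and the Ubuntu machine and comparing the output would show whether the JVM bitness, default encoding, or heap size differ. Whether any of these affects the OCR result is not established here.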

asad.ali · June 19, 2019, 11:54pm

@taranyuk

The API has no such limitation or conflict with other software. However, would you please share the download link of the Tesseract-OCR build that you have installed on your system? We will test the scenario further in our environment and share our feedback with you.

taranyuk · June 20, 2019, 6:57am

Here is my Tesseract version:
C:\Program Files (x86)\Tesseract-OCR>tesseract -v
tesseract 3.02
leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5

URL with install instructions: Home · tesseract-ocr/tesseract Wiki · GitHub

asad.ali · June 20, 2019, 10:18pm

@taranyuk

We have tested the scenario again by installing Tesseract 3.02 from the given link and could not reproduce any issue. As requested earlier, could you please share a sample console application that reproduces the error in a Windows environment? We will test the scenario again in our environment and address it accordingly.

taranyuk · June 21, 2019, 12:56pm

Hi. I have created a test program for you: https://drive.google.com/open?id=1JBXtxCnorsM0nCSblv_cHz4Q2YPGVN8e
Fix the paths to the image and licence in the BAT file before starting.
I ran it locally with these results:
1.png (3.5 KB)

Farhan.Raza · June 21, 2019, 9:17pm

@taranyuk

We appreciate your cooperation. Please share an SSCCE application containing the code instead of a BAT file so that we can address your concerns efficiently.

taranyuk · June 22, 2019, 11:06am

import com.aspose.ocr.ImageStream;
import com.aspose.ocr.OcrEngine;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

public class Main {

    // Usage: Main <licencePath> <imagePath>
    public static void main(String[] args) {
        String licencePath = args[0];
        String imagePath = args[1];
        System.out.println("licencePath=" + licencePath);
        System.out.println("imagePath=" + imagePath);
        // Apply the licence; try-with-resources closes the stream afterwards.
        try (InputStream stream1 = new FileInputStream(new File(licencePath))) {
            new com.aspose.ocr.License().setLicense(stream1);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        extractTextFromImage(imagePath);
    }

    // Runs OCR on the image and returns the recognized text,
    // or null if processing fails or throws.
    public static String extractTextFromImage(String imagePath) {
        OcrEngine ocr = new OcrEngine();
        ocr.setImage(ImageStream.fromFile(imagePath));
        try {
            if (ocr.process()) {
                String text = ocr.getText().toString();
                System.out.println("Extracted text:\n" + text);
                return text;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}

asad.ali · June 23, 2019, 2:03am

@taranyuk

Thanks for sharing the requested details.

We have logged an investigation ticket as OCR-701 in our issue tracking system. We will look further into the details of the ticket and keep you posted on the status of its resolution. Please be patient and spare us a little time.

We are sorry for the inconvenience.