Cannot OCR image


#1

I cannot OCR an image (300 dpi).

My Java code:

public void extractTextFromImage(String imagePath) {
    OcrEngine ocr = new OcrEngine();
    ocr.setImage(ImageStream.fromFile(imagePath));
    try {
        long start = System.currentTimeMillis();
        if (ocr.process()) {
            System.out.println(">>>" + ocr.getText());
        }
        System.out.println("finish in " + (System.currentTimeMillis() - start));
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Console output:
>>>e
finish in 253603

Why does the OCR output make no sense, and why did it take so long (253603 ms, more than four minutes)?


#2

@taranyuk

We have tested the scenario in our environment using Aspose.OCR for Java 17.11 and were unable to reproduce the issue you mentioned. The API was able to recognize the text in the image. For your reference, a screenshot of the console output is attached. Could you please share a sample console application that reproduces the issue? We will test the scenario again in our environment and address it accordingly.
OCR_output.png (18.1 KB)


#3

I have tested again on different environments.

  1. On my dev laptop with Windows 8 and JDK 1.8.0_25, I got the incorrect result described above.
  2. On the test environment with Ubuntu 14.04.4 LTS (GNU/Linux 3.13.0-91-generic x86_64) and OpenJDK 1.8.0_111, I got correct results.

I used the same Java program in both cases.

Why? Are there any environment requirements or conflicts with other software? For example, I have Tesseract-OCR installed on my dev laptop.
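To compare the two machines side by side, it may help to print the JVM settings that commonly differ between Windows and Linux and attach the output from each. This is only a minimal sketch using standard JDK calls; the class name EnvReport is hypothetical and not part of Aspose.OCR, and nothing here is confirmed to be related to the problem.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper (not part of Aspose.OCR): collects JVM settings
// that are worth comparing between two environments.
public class EnvReport {

    // Builds a multi-line "key=value" report of JVM and OS properties.
    public static String describe() {
        String[] keys = {
            "os.name", "os.version", "os.arch",
            "java.version", "java.vendor", "file.encoding"
        };
        StringBuilder sb = new StringBuilder();
        for (String key : keys) {
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }
        sb.append("default.charset=").append(Charset.defaultCharset()).append('\n');
        sb.append("default.locale=").append(Locale.getDefault()).append('\n');
        // Maximum heap the JVM will attempt to use, in megabytes.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        sb.append("max.heap.mb=").append(maxHeapMb).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it on both the Windows laptop and the Ubuntu test environment and comparing the two reports would show whether the JVM version, bitness, default encoding, locale, or heap size differs.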


#4

@taranyuk

The API has no such limitation and does not conflict with other software. However, could you please share the download link for the Tesseract-OCR build installed on your system? We will test the scenario further in our environment and share our feedback with you.


#5

Here is my Tesseract version:
C:\Program Files (x86)\Tesseract-OCR>tesseract -v
tesseract 3.02
leptonica-1.68 (Mar 14 2011, 10:43:03) [MSC v.1500 LIB Release 32 bit]
libgif 4.1.6 : libjpeg 8c : libpng 1.4.3 : libtiff 3.9.4 : zlib 1.2.5

URL with install instructions: https://github.com/tesseract-ocr/tesseract/wiki#windows


#6

@taranyuk

We have tested the scenario again after installing Tesseract 3.02 from the given link and were unable to reproduce any issue. As requested earlier, could you please share a sample console application that reproduces the error in a Windows environment? We will test the scenario again in our environment and address it accordingly.


#7

Hi. I have created a test program for you: https://drive.google.com/open?id=1JBXtxCnorsM0nCSblv_cHz4Q2YPGVN8e
Fix the paths to the image and licence in the BAT file before starting it.
I ran it locally with these results:
1.png (3.5 KB)


#8

@taranyuk

We appreciate your cooperation. Could you please share an SSCCE (short, self-contained, correct example) application containing the code instead of a BAT file, so that we can address your concerns efficiently?


#9

import com.aspose.ocr.ImageStream;
import com.aspose.ocr.OcrEngine;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

public class Main {

    public static void main(String[] args) {
        String licencePath = args[0];
        String imagePath = args[1];
        System.out.println("licencePath=" + licencePath);
        System.out.println("imagePath=" + imagePath);
        InputStream stream1;
        try {
            stream1 = new FileInputStream(new File(licencePath));
            new com.aspose.ocr.License().setLicense(stream1);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        extractTextFromImage(imagePath);
    }

    public static String extractTextFromImage(String imagePath) {
        OcrEngine ocr = new OcrEngine();
        ocr.setImage(ImageStream.fromFile(imagePath));
        try {
            if (ocr.process()) {
                String text = ocr.getText().toString();
                System.out.println("Extracted text:\n" + text);
                return text;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}


#10

@taranyuk

Thanks for sharing the requested details.

We have logged an investigation ticket as OCR-701 in our issue tracking system. We will look further into the details of the ticket and keep you posted on the status of its resolution. Please be patient and allow us some time.

We are sorry for the inconvenience.