Improve accuracy of a scanned document

Please suggest a way to improve the accuracy of the text extracted from the attached document.

The pre-processing filters did not help.
a.jpg (696.9 KB) Scan1.jpg (173.6 KB)

@sudesh

We could not find any file attached to your post. Would you please share it so that we can test the scenario in our environment and address it accordingly?

Please find the attachment now. I did not have permission to attach files previously, and I did not realize that. Thank you.

@sudesh

We noticed that the API returned garbage values while performing OCR on the images you shared. We have logged an issue as OCR-774 in our issue tracking system. We will look into the details and keep you posted on the status of its correction. Please allow us some time.

We are sorry for the inconvenience.

@sudesh

With Aspose.OCR 23.7.0, we got better results.
The code we used:

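// Assumes the Aspose.OCR for .NET package is referenced (using Aspose.OCR;).
AsposeOcr api = new AsposeOcr();
OcrInput input = new OcrInput(InputType.SingleImage);
input.Add(@"a.jpg");
// Use the PHOTO area-detection mode for this scan.
var result = api.Recognize(input, new RecognitionSettings
{
    DetectAreasMode = DetectAreasMode.PHOTO
});
Console.WriteLine(result[0].RecognitionText);
// Save the recognized text to a file.
AsposeOcr.SaveMultipageDocument("D://a.txt", SaveFormat.Text, result);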

results.zip (1.1 KB)

@asad.ali Hello, may I ask how to accurately extract the information in an image using Java? If you need more information, please let me know. Thanks.
Input File 2.jpg (142.7 KB)

This is the code I am using:

AsposeOCR api = new AsposeOCR();

RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setLinesFiltration(true);
recognitionSettings.setAllowedCharacters(CharactersAllowedType.ALL);
recognitionSettings.setLanguage(Language.Chi);
RecognitionResult result = api.RecognizePage("D:\\java\\work\\test\\src\\main\\resources\\2.png", recognitionSettings);
System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");

@Mikeykiss

Can you please explain a bit more about your question? Are you unable to extract all text from this image? What type of image information do you need to get using the API?

I am unable to extract all of the information in the image.

@Mikeykiss

We are checking it and will get back to you shortly.

@sudesh

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): OCRJAVA-333

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

@Mikeykiss

Unfortunately, we can’t recognize the numbers in this image, only the Chinese characters.

RecognitionSettings set = new RecognitionSettings();
set.setLanguage(Language.Chi);
set.setDetectAreasMode(DetectAreasMode.PHOTO);

Our text detector does catch the numbers, but

  1. they are rotated
  2. they are crossed out

Even if we rotate the image, we can’t recognize them.


18T
(
副
型
警
/
。
l
322
.32
厂
"
9(
Q

90
:一一
忍
/


一
一
1
)

旦
i
出
含
"T
目
影
士

乡
密
樊
渺
)
H

l
单件
由由

岩
生
世P
a
排
—。
Y
目鹏
;

亡
夕
鸟
憾耸
肯
"
■
门S
叶
但又台
3
乡x

早
沙
32

]乡
32

80
山微乡
8棉



/
做
仍

4
C
1
2km
y

地名
国道
公路
河
流
6
%入
心)
~公
口
一
平
爸跳
完钻井
完钻井
设计井
设计井
期)
(上部)
(加密)
(加密)
(上部)

187[50

We plan to add the ability to recognize mixed vertical and horizontal text in an image. However, this particular image seems very hard to recognize, and we do not expect to get a good result in any case.
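
For anyone who wants to repeat the rotation experiment mentioned above on their own images, here is a minimal sketch that rotates an image 90 degrees clockwise using only the JDK. The class name RotateForOcr and the method rotate90Clockwise are hypothetical names chosen for illustration. They are not part of the Aspose.OCR API, and this is not necessarily how the support team rotated the image.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration only; not part of Aspose.OCR.
public class RotateForOcr {

    /** Returns a copy of src rotated 90 degrees clockwise. */
    public static BufferedImage rotate90Clockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // Width and height swap after a quarter turn.
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Source pixel (x, y) lands at (h - 1 - y, x) in the rotated image.
                dst.setRGB(h - 1 - y, x, src.getRGB(x, y));
            }
        }
        return dst;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Usage: java RotateForOcr <input-image> <output-png>");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported or unreadable image: " + args[0]);
        }
        // Write the rotated copy as PNG so no further compression loss is added.
        ImageIO.write(rotate90Clockwise(src), "png", new File(args[1]));
    }
}
```

The rotated file can then be passed to RecognizePage in the same way as in the Java code earlier in this thread. As noted above, rotation alone did not help with this particular image, so treat it as a way to test your own scans, not as a fix for crossed-out digits.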