PDF OCR Recognition Quality is very bad

EvgeniyMikhailov · April 2, 2024, 5:56am

I’m trying to implement conversion from a simple PDF to a searchable PDF. I use code from the Aspose samples:

                // Create the OCR engine.
                Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr();

                // Configure recognition: English text, table-oriented area detection.
                Aspose.OCR.DocumentRecognitionSettings recognitionSettings = new Aspose.OCR.DocumentRecognitionSettings();
                recognitionSettings.Language = Aspose.OCR.Language.Eng;
                recognitionSettings.DetectAreasMode = Aspose.OCR.DetectAreasMode.TABLE;

                // Recognize the input PDF with the settings above.
                var results = recognitionEngine.RecognizePdf(temporaryFileIn, recognitionSettings);

                // Save the recognition results as a single PDF document.
                Aspose.OCR.AsposeOcr.SaveMultipageDocument(temporaryFileOut, Aspose.OCR.SaveFormat.Pdf, results);

I tried different options in DocumentRecognitionSettings, but none helped. The recognition and the resulting file quality are very bad compared to the searchable PDF produced by Acrobat Pro: part of the text is not recognized, and a lot of garbage is detected.

Could you please help me? What am I doing wrong?

And is it possible to keep the pages that already contain text (pages 2 and 3 in the source file) as is in the resulting file?

I’m using licensed Aspose.PDF 22.10.0 on Windows.
The file I use for testing is attached. The bad detection quality is on pages 1 and 4.
testOCR.pdf (371.0 KB)

The best version of the recognized text I was able to get:

DRIVERS LICENSE
i NOT FOR REAL ID PURPOSES
s 3}, 99 999 999 Sups: [C HdDLN:
& (1/07|1973 DOA
2 ANDREW JASON
$P .
APT. 1 g HARRISBURG, PA 17101-0000 < p * *!. 4bEXF: 01/08/2026 * [ * 01/07/2022 **3Ax }*tt :; [I & $ Lex * *B * 4aISS: “i x L:Jxx [l Q 15SEX: M 18 EYES: BRO ktt t
E $ 16HGT: 5-11”
K $ 9CLASS: C
[ 9a END: NONE P
[ K 2 RESTR: NONE CDL s4n duur ;3 ; Snpole I
5 DD:1234$678901Z3 o RGAN DONOR

One more example: a very high-contrast image with simple text, but the results are awful.
image.png (38.1 KB)

asad.ali · April 2, 2024, 8:42pm

@EvgeniyMikhailov

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): OCRNET-821

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.