I’m trying to implement conversion from a simple PDF to a searchable PDF. I use code from the Aspose samples:
Aspose.OCR.AsposeOcr recognitionEngine = new Aspose.OCR.AsposeOcr();
Aspose.OCR.DocumentRecognitionSettings recognitionSettings = new Aspose.OCR.DocumentRecognitionSettings();
recognitionSettings.Language = Aspose.OCR.Language.Eng;
recognitionSettings.DetectAreasMode = Aspose.OCR.DetectAreasMode.TABLE;
var results = recognitionEngine.RecognizePdf(temporaryFileIn, recognitionSettings);
Aspose.OCR.AsposeOcr.SaveMultipageDocument(temporaryFileOut, Aspose.OCR.SaveFormat.Pdf, results);
I tried different options in DocumentRecognitionSettings, but none helped: recognition and the resulting file quality are very bad compared to Acrobat Pro searchable PDF results. Part of the text is not recognized, and a lot of garbage is detected.
Could you please help me? What am I doing wrong?
And is it possible to keep the pages with text (pages 2 and 3 in the source file) as is in the resulting file?
I’m using licensed Aspose.PDF 22.10.0 on Windows
The file I use for testing is attached. Detection quality is bad on pages 1 and 4.
testOCR.pdf (371.0 KB)
The best version of the recognized text I was able to get:
DRIVERS LICENSE
i NOT FOR REAL ID PURPOSES
s 3}, 99 999 999 Sups: [C HdDLN:
& (1/07|1973 DOA
2 ANDREW JASON
$P .
APT. 1 g HARRISBURG, PA 17101-0000 < p * *!. 4bEXF: 01/08/2026 * [ * 01/07/2022 **3Ax }*tt :; [I & $ Lex * *B * 4aISS: “i x L:Jxx [l Q 15SEX: M 18 EYES: BRO ktt t
E $ 16HGT: 5-11”
K $ 9CLASS: C
[ 9a END: NONE P
[ K 2 RESTR: NONE CDL s4n duur ;3 ; Snpole I
5 DD:1234$678901Z3 o RGAN DONOR
One more example: a very high-contrast image with simple text, but the results are awful.
image.png (38.1 KB)