Adobe Acrobat highlights incorrect text when searching OCR-converted PDF

jeff_grant · October 10, 2023, 8:52pm

I’m using Aspose.OCR 23.9.0.0 in a .NET Framework 4.8 project. After I run OCR on a PDF that contains an image, searching in Adobe Acrobat (Ctrl+F) finds the text I’m searching for but highlights the wrong text. In some cases the highlight is off by a few words. Also, when I copy and paste text from the PDF, the pasted text is not what I selected. Please see the attached files for an example. What could cause this behavior?

Code snippet:

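// Assumes: using System.Collections.Generic; using Aspose.OCR;
// serverFilePath holds the full path to the PDF being processed.
var asposeOcr = new Aspose.OCR.AsposeOcr();

// Load the PDF as OCR input.
OcrInput input = new Aspose.OCR.OcrInput(Aspose.OCR.InputType.PDF);
input.Add(serverFilePath);

// English recognition with document-level area detection.
var settings = new Aspose.OCR.RecognitionSettings();
settings.Language = Aspose.OCR.Language.Eng;
settings.DetectAreasMode = DetectAreasMode.DOCUMENT;

List<Aspose.OCR.RecognitionResult> results = asposeOcr.Recognize(input, settings);

// Save the results as a PDF; note the output is written to the same path as the input.
AsposeOcr.SaveMultipageDocument(serverFilePath, Aspose.OCR.SaveFormat.Pdf, results);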

HighlightIssue.jpg (61.3 KB)
OCR.pdf (588.6 KB)
source.pdf (26.0 KB)

asad.ali · October 11, 2023, 2:04am

@jeff_grant

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): OCRNET-746

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.