How to extract text from a scanned document saved as a PDF file

thiru1711 · December 18, 2019, 12:51pm

Hi,
We have documents that were scanned to PDF on a Xerox scanner, and we now
need to extract the text from them.
Is there any way to use either the OCR or the PDF API of Aspose to extract the text?

Please find a sample scanned PDF document attached for your reference.
Scan Doc.pdf (326.7 KB)

Adnan.Ahmad · December 19, 2019, 10:35am

@thiru1711,

Thanks for contacting support.

We have checked the file you shared and observed that it contains scanned images, so the text has to be recovered with OCR. You can extract the images from the PDF file using Aspose.Pdf for .NET and then perform OCR on those images using Aspose.OCR for .NET to extract the text. Please let us know if you still face any issue.
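
To illustrate what the image-extraction step produces, here is a rough, library-independent sketch in Python. It is not the Aspose.Pdf API and not a substitute for it. It assumes the scanner stored each page as a JPEG (DCTDecode) stream, which may not hold for this file (scanners also commonly use CCITT, JBIG2 or Flate encoding), and the function name `extract_jpeg_streams` is made up for the example.

```python
import re
from typing import List

# A PDF stream body sits between the "stream" and "endstream" keywords.
# The lookbehind stops the pattern from matching the tail of "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# Every JPEG file starts with the two-byte SOI marker.
JPEG_SOI = b"\xff\xd8"


def extract_jpeg_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return the raw bytes of every stream that looks like a JPEG image.

    Hypothetical helper: it only finds DCTDecode images stored without
    further encoding and ignores every other image filter.
    """
    images = []
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        if body.startswith(JPEG_SOI):
            images.append(body)
    return images
```

Each item returned could be written to a `.jpg` file and handed to an OCR engine such as Aspose.OCR for .NET. For real documents, the image extraction in Aspose.Pdf for .NET is the safer route, since it handles the other image encodings as well.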