How to OCR PDF files?

bpanchu · September 25, 2017, 4:08am

Hi,
We have the latest Aspose.Total API license. We have used all the other DLLs (PDF, cell, image, barcode), but not the Aspose OCR DLL. Before implementing it, we need to clarify the following points about the OCR API.

1. Does it support multiple languages, including Chinese (Traditional and Simplified), French, etc., and PDF files that mix English and Chinese words?
2. Punch-hole removal: when a file is OCR'd, can it detect punch-hole marks and remove them?
3. Is it possible to OCR only the pages that have not been OCR'd yet?
E.g., if a PDF has 10 pages, and pages 3, 5, 8, and 10 are already OCR'd while the other pages still need OCR, will the Aspose OCR API skip the already OCR'd pages (3, 5, 8, 10)?

Regards,
Aravind

ikram.haq · September 25, 2017, 6:11am

@bpanchu,

Thank you for your inquiry. Following are the details:

1. Aspose.OCR for .NET API currently supports the following languages:
   - English
   - Spanish
   - French
   - Portuguese
2. Punch-hole removal: there is no such functionality available in the API.
3. There is no functionality available in the API to skip pages; that will depend on your own logic and implementation. To perform OCR on a PDF file, you have to process it page by page, and while processing you can implement the logic to skip a page or pages.
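
As a rough illustration of that skip logic, here is a minimal, library-agnostic sketch in Python. It is not Aspose.OCR code: `pages_needing_ocr` and the `has_text_layer` callback are hypothetical names, and how you actually detect that a page is already OCR'd (for example, by checking whether text can be extracted from it) is left to your implementation.

```python
from typing import Callable, Iterable, List


def pages_needing_ocr(
    page_numbers: Iterable[int],
    has_text_layer: Callable[[int], bool],
) -> List[int]:
    """Return the pages that are not yet OCR'd and so still need processing.

    has_text_layer is a hypothetical callback supplied by the caller; it
    should return True when the given page already contains recognized text.
    """
    return [n for n in page_numbers if not has_text_layer(n)]


# Example from the question: a 10-page PDF where pages 3, 5, 8 and 10
# are already OCR'd.
already_ocrd = {3, 5, 8, 10}
todo = pages_needing_ocr(range(1, 11), lambda n: n in already_ocrd)
print(todo)  # [1, 2, 4, 6, 7, 9]
```

Only the pages returned in `todo` would then be passed to the OCR step, one page at a time.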

bpanchu · September 25, 2017, 6:24am

Thanks for the reply.

Regards,
Aravind

ikram.haq · September 25, 2017, 6:45am

@bpanchu,

You are welcome. Please feel free to contact us if you have any further queries or comments.