Read scanned PDF documents using Aspose.PDF

theGridlock · November 27, 2020, 6:01am

Hi,
After scanning a document to PDF, I want to read all of the text in it. I have tried some libraries, but with no success.
It looks like I will need to combine Aspose.OCR and Aspose.PDF, but I'm not sure about this, maybe because I have a big problem with font and language support when exporting.
You can see the attached file below.
file_dropbox
Basically, I need to extract the multi-line text in paragraphs I and II.
The other text is not needed.
Please check this!

asad.ali · November 29, 2020, 3:09pm

@theGridlock

Would you please confirm whether you only want to extract text from the PDF, or whether you want to convert the scanned PDF into a searchable PDF document? Please also share which platform you want to use the API on, i.e. .NET or Java. We will check further on our side and share our feedback with you accordingly.

theGridlock · December 1, 2020, 5:07am

Hi asad.ali,
I want to extract text, for example something like:
string txt = page.gettext(all, …).
I am using Aspose for .NET.
The language of the text to extract is Vietnamese.
Please check!

asad.ali · December 1, 2020, 6:12pm

@theGridlock

You need to convert the PDF pages to images using Aspose.PDF and then run OCR on the resulting images using Aspose.OCR. You can also check the API documentation for the languages and characters that our OCR API can currently recognize.
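
As a rough sketch of those two steps in C#, the snippet below renders each page to a PNG with Aspose.PDF and hands the image to an OCR step. The class name `ScannedPdfText`, the method `ExtractPageTexts`, the `workDir` folder, the 300 DPI resolution, and the `recognizeImage` hook are all assumptions made for illustration. The OCR call itself is left as a delegate because the Aspose.OCR recognition method and its language settings differ between releases.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Devices;

public static class ScannedPdfText
{
    // recognizeImage is a hypothetical hook for the OCR step: it receives the
    // path of a PNG file and returns the recognized text. With Aspose.OCR it
    // would wrap the library's recognition call for the installed version.
    public static List<string> ExtractPageTexts(
        string pdfPath, string workDir, Func<string, string> recognizeImage)
    {
        var pageTexts = new List<string>();
        Directory.CreateDirectory(workDir);

        using (var pdfDocument = new Document(pdfPath))
        {
            // 300 DPI is an assumed value, not a documented requirement.
            var pngDevice = new PngDevice(new Resolution(300));

            // Aspose.PDF page numbers start at 1, not 0.
            for (int pageNumber = 1; pageNumber <= pdfDocument.Pages.Count; pageNumber++)
            {
                string imagePath = Path.Combine(workDir, $"page_{pageNumber}.png");

                // Step 1: render the scanned page to an image file.
                using (var imageStream = new FileStream(imagePath, FileMode.Create))
                {
                    pngDevice.Process(pdfDocument.Pages[pageNumber], imageStream);
                }

                // Step 2: run OCR on the rendered image.
                pageTexts.Add(recognizeImage(imagePath));
            }
        }

        return pageTexts;
    }
}
```

With the Aspose.OCR releases from around the time of this thread, the hook could probably be something like `path => new AsposeOcr().RecognizeImage(path)`, but please treat that call as an assumption and check it, along with Vietnamese support, against the documentation for the version you install.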

It seems like the file has been deleted from the link which you provided earlier. Would you kindly re-upload it and share the link with us again? We will then log an investigation ticket and share the ID with you.