Free Support Forum - aspose.com

PDF/A format with searchable OCR text

I want to convert a PDF that contains images into PDF/A format with searchable OCR text. Do you have any solution for that? I want to use it in my own application.

Hi

I want a DLL that converts my PDF, which contains images, to an OCR-searchable PDF. Is this possible with Aspose?


This message was posted using Email2Forum by Imran Rafique. (private)

Hi Praveen,


Thanks for contacting support.

We have an API named Aspose.Pdf for .NET, which can create and manipulate PDF files and can also convert PDF files to PDF/A format. However, I am afraid it cannot directly convert a non-searchable PDF file (image PDF) to a searchable PDF document. To accomplish this requirement, you may consider using a combination of Aspose.Pdf for .NET and Aspose.OCR for .NET.

First, using Aspose.Pdf for .NET, you can either convert all pages of the PDF file to image format or extract the images from the PDF file. Then perform OCR on those images using Aspose.OCR for .NET.

Once the image contents have been recognized, you can place them inside a new PDF document, which will then be a searchable file. Before saving the final output, set the PDF compliance to PDF/A format. For further details, please visit
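The steps described above (render the pages to images, run OCR on each image, write the recognized text into a new PDF saved as PDF/A) can be outlined as a small pipeline. This is only a library-neutral sketch in Python, not Aspose code: `render_pages`, `recognize`, and `write_pdf` are hypothetical callables standing in for whatever the PDF and OCR libraries provide, and the `"PDF/A-1b"` default is an assumed compliance level that the thread does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OcrPage:
    """One source page: its rendered image and the text OCR found on it."""
    number: int
    image: bytes
    text: str


def make_searchable_pdfa(
    source_pdf: str,
    output_pdf: str,
    render_pages: Callable[[str], List[bytes]],            # hypothetical: PDF path -> page images
    recognize: Callable[[bytes], str],                     # hypothetical: image -> plain text
    write_pdf: Callable[[List[OcrPage], str, str], None],  # hypothetical: pages, path, compliance
    compliance: str = "PDF/A-1b",                          # assumed level, not stated in the thread
) -> List[OcrPage]:
    """Render each page, OCR it, then hand the results to the PDF writer."""
    pages = []
    for number, image in enumerate(render_pages(source_pdf), start=1):
        pages.append(OcrPage(number, image, recognize(image)))
    # The writer decides how the text (and optionally the image) is placed on each page.
    write_pdf(pages, output_pdf, compliance)
    return pages
```

Keeping the image and its text together per page makes it possible for the writer step to reproduce the original page appearance while still adding searchable text, if the chosen PDF library supports that.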

I also have another option. Please suggest how I can work this way:
I have a folder of Jpe2 images.
I need to generate an OCR-searchable PDF from the existing images.
After that, we convert the PDF to PDF/A.
All of the above work needs to be completed through ASP.NET MVC.

My requirement is that the image has a watermark, and the generated PDF should look the same as the image.

rajeev-2:
I also have another option. Please suggest how I can work this way: I have a folder of Jpe2 images. I need to generate an OCR-searchable PDF from the existing images. After that, we convert the PDF to PDF/A. All of the above work needs to be completed through ASP.NET MVC.
Hi Rajeev,

You can use the solution described earlier to accomplish this requirement. Please note that our APIs are built on top of the .NET Framework, so they should work in any .NET-based application.
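For the folder-of-images case, the only extra step is collecting the image files in page order before running the same OCR and PDF/A steps. A minimal sketch in Python, assuming the file names carry page numbers; the extension list (including `.jp2`) is a guess, since the thread does not say exactly which image format is used, and `list_page_images` is a hypothetical helper name.

```python
import re
from pathlib import Path
from typing import List

# Assumed extensions; adjust to the actual image format in the folder.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".jp2", ".png", ".tif", ".tiff"}


def natural_key(path: Path):
    """Split the file name into text and number chunks so 'page10' sorts after 'page2'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", path.name)]


def list_page_images(folder: str) -> List[Path]:
    """Return the image files directly inside `folder`, in natural (page) order."""
    files = [p for p in Path(folder).iterdir()
             if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS]
    return sorted(files, key=natural_key)
```

Sorting naturally rather than alphabetically matters here because a plain string sort would put `page10` before `page2`, and the pages of the generated PDF would come out in the wrong order.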

praveen@rechnerinfo.com:
My requirement is that the image has a watermark, and the generated PDF should look the same as the image.
Hi Praveen,

If a text stamp is added to the PDF file, it will be recognized during the OCR process but, as shared earlier, any custom formatting of the text will be lost because the API will recognize the contents as plain text. If you face any issue while using our APIs, please share your resource files so that we can further investigate the scenario in our environment.