Looking for OCR API

ChangShin · January 17, 2014, 6:17pm

I am looking for an OCR API for Java.
Can you recommend one?

Thanks,
Chang Shin

babar.raza · January 19, 2014, 6:08am

Hi Chang,

Thank you for considering Aspose products.

The Aspose.OCR for Java API currently supports the Arial, Times New Roman, Courier New, Tahoma, Calibri and Verdana fonts in Regular, Bold and Italic styles. Supported languages are English, French, Spanish and Cyrillic, and supported image formats are BMP and TIF.
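Because only BMP and TIF images are accepted, it can help to filter input files before handing them to the OCR engine. The helper below is a minimal sketch that checks only the file extension, not the actual file contents; the class and method names are hypothetical and not part of Aspose.OCR, and accepting the ".tiff" spelling is an assumption.

```java
import java.util.Locale;
import java.util.Set;

public class OcrInputCheck {
    // Image formats listed as supported in the reply (BMP and TIF).
    // Accepting the ".tiff" spelling as well is an assumption.
    private static final Set<String> SUPPORTED_EXTENSIONS = Set.of("bmp", "tif", "tiff");

    // Returns true when the file name ends in a supported extension (case-insensitive).
    // Only the name is inspected; a mislabeled file would still pass.
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension to check
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED_EXTENSIONS.contains(ext);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"scan.BMP", "invoice.tif", "report.pdf", "photo.jpg"}) {
            System.out.println(name + " -> " + isSupported(name));
        }
    }
}
```

Files such as "report.pdf" are rejected here, which matches the later point in this thread that PDFs have to be converted to images first.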

Before making a decision, I would suggest trying the API on your end. Please download the latest version, Aspose.OCR for Java 1.7.0, from the download section. OCR can be performed with the simple steps below; a detailed article and source code snippets are available in the Aspose.OCR for Java documentation.

1. Create an instance of OcrEngine.
2. Set the path of the resource file or resource folder using the OcrEngine.setResource method.
3. Set the image file on which OCR is to be performed using the OcrEngine.setImage method.
4. Add one or more languages using the OcrEngine.getLanguages().addLanguage() method.
5. Call the OcrEngine.process method to perform OCR on the whole image.
6. If OcrEngine.process returns true, retrieve the recognized text with the OcrEngine.getText method.
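The order of the steps above can be sketched in plain Java. This is not the real Aspose.OCR API: OcrEngineLike, LanguageList and RecordingEngine are hypothetical stand-ins whose method names follow the steps, while the String parameter types and the "resources.zip" file name are assumptions. Please use the actual OcrEngine class and the documentation for real code.

```java
import java.util.ArrayList;
import java.util.List;

public class OcrSteps {

    // Hypothetical stand-in for whatever getLanguages() returns in the real API.
    interface LanguageList {
        void addLanguage(String language);
    }

    // Hypothetical engine interface. Method names follow the listed steps;
    // the String parameters are assumptions, not the real Aspose.OCR signatures.
    interface OcrEngineLike {
        void setResource(String resourcePath);
        void setImage(String imagePath);
        LanguageList getLanguages();
        boolean process();
        String getText();
    }

    // Applies the steps in order and returns the recognized text,
    // or null when process() reports failure.
    static String recognize(OcrEngineLike engine, String resourcePath,
                            String imagePath, List<String> languages) {
        engine.setResource(resourcePath);          // resource file/folder matching the API version
        engine.setImage(imagePath);                // image to recognize (BMP or TIF)
        for (String language : languages) {
            engine.getLanguages().addLanguage(language);
        }
        if (engine.process()) {                    // OCR the whole image
            return engine.getText();               // read text only after a successful run
        }
        return null;
    }

    // Hypothetical fake engine that records the calls it receives; it performs no recognition.
    static class RecordingEngine implements OcrEngineLike {
        final List<String> calls = new ArrayList<>();
        private final boolean succeed;
        private final String text;

        RecordingEngine(boolean succeed, String text) {
            this.succeed = succeed;
            this.text = text;
        }

        public void setResource(String resourcePath) { calls.add("setResource:" + resourcePath); }
        public void setImage(String imagePath) { calls.add("setImage:" + imagePath); }
        public LanguageList getLanguages() { return language -> calls.add("addLanguage:" + language); }
        public boolean process() { calls.add("process"); return succeed; }
        public String getText() { calls.add("getText"); return text; }
    }

    public static void main(String[] args) {
        RecordingEngine engine = new RecordingEngine(true, "Hello"); // create the engine
        String result = recognize(engine, "resources.zip", "page1.bmp", List.of("english"));
        System.out.println(result);        // Hello
        System.out.println(engine.calls);
    }
}
```

The key design point is that getText is only called after process() returns true, so a failed recognition is reported as null instead of returning stale or empty text.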

In case you face any difficulty, please feel free to write back anytime.

ChangShin · January 20, 2014, 12:39pm

I checked the documentation URL, but the example is not enough for me.
Could you email me some code?
Some PDF files cannot be searched for a keyword.
So, I want to add OCR text to existing PDF files.

Thanks,
Chang Shin

ChangShin · January 20, 2014, 12:52pm

Can we use the 1.5.0 resource file with the latest version, 1.7.0?

Thanks,
Chang Shin

babar.raza · January 21, 2014, 2:55am

Hi Chang,

Thank you for writing back.

Please note that each version of the Aspose.OCR for Java API uses its own specific resource file, so you may get an exception or undesired results if you use a resource file from a different version of the API.

Regarding your other requirement of extracting text from a PDF file, I am afraid the Aspose.OCR for Java API can only process images at the moment. You can use the Aspose.Pdf for Java API to convert PDF files to images, and then process those images with Aspose.OCR for Java to extract the text. Please check the technical articles linked below on converting PDF files to TIFF and BMP images.
https://reference.aspose.com/pdf/java/pdf-image-manipulation/convert-pdf-pages-to-tiff-image-using-java/
https://reference.aspose.com/pdf/java/pdf-image-manipulation/convert-pdf-pages-to-bmp-image-using-java/
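If a page ends up as a java.awt.image.BufferedImage in your code (whatever tool rendered it), the JDK's own ImageIO can save it as BMP for the OCR step. This is a minimal sketch using only standard JDK classes, not the Aspose.Pdf conversion described in the articles; the class and method names are hypothetical. ImageIO's BMP writer typically refuses images with an alpha channel, so the page is first copied onto an opaque RGB image with a white background.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PageToBmp {
    // Saves a rendered page image as BMP, flattening any transparency onto white first.
    static void saveAsBmp(BufferedImage page, File target) throws IOException {
        BufferedImage rgb = new BufferedImage(page.getWidth(), page.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, page.getWidth(), page.getHeight()); // background for transparent areas
            g.drawImage(page, 0, 0, null);
        } finally {
            g.dispose();
        }
        // ImageIO.write returns false instead of throwing when no suitable writer is found.
        if (!ImageIO.write(rgb, "bmp", target)) {
            throw new IOException("No BMP writer could encode the image");
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a page image produced by a PDF-to-image conversion.
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_ARGB);
        File out = File.createTempFile("page1", ".bmp");
        out.deleteOnExit();
        saveAsBmp(page, out);
        BufferedImage back = ImageIO.read(out);
        System.out.println(back.getWidth() + "x" + back.getHeight()); // 200x100
    }
}
```

The resulting BMP file can then be passed to the OCR engine as described in the steps of the earlier reply.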

The rest of the process (performing the OCR operation) remains the same as described in my previous reply. Please let us know if you need any further assistance in this regard.

ChangShin · January 21, 2014, 11:29am

How do I create the PDF file after extracting the text?

Thanks,
Chang Shin

babar.raza · January 22, 2014, 1:49am

Hi Chang,

I believe you meant creating or modifying a PDF file with the extracted text. You can achieve this with the Aspose.Pdf for Java API. Please check the technical articles below for your reference.

If you face any difficulties with the Aspose.Pdf for Java API, I would suggest posting your question in the Aspose.Pdf support forum. For any OCR/OMR-related technical questions, please post here in the Aspose.OCR support forum.

Thank you for your understanding and cooperation.