OCR of TIFF image to Searchable PDF

Hello,

We have downloaded Aspose.Ocr.Java 22.6 and are attempting to OCR TIFF images and convert them into searchable PDFs. If this small POC works, we have a customer who is willing to purchase licenses and paid support.

Can you please provide some samples showing how to do this conversion?

-Nitin

@nitinupasani

Please check the following documentation article(s) for recognizing TIFF images:

Once you get the results of OCR, you can save them into a file e.g. PDF:

We are getting the following exception:

Exception in thread "main" com.aspose.ocr.tiff.e0cd0c6d55: Sample bit-width of 1 is not supported
at com.aspose.ocr.tiff.e0cd0c6d18.e0cd0c1d77(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.e0cd0c6d66(Unknown Source)
at com.aspose.ocr.tiff.TiffReader.read(Unknown Source)
at com.aspose.ocr.AsposeOCR.RecognizeTiff(Unknown Source)
at com.aspose.ocr.examples.OcrFeatures.OCRRecognizeTiff.main(OCRRecognizeTiff.java:35)
Screenshot.png (43.8 KB)

@rbhoriwal

Are you using the same files shared in this post?

Yes, we are both working on the same issue and have two cases open. If you want, please combine them into one.

We are trying to run the sample code provided for OCR of PDF and TIFF files with Aspose.Ocr.Java 22.6. We are getting the following errors:

OCRRecognizePdf.java

Exception in thread "main" java.lang.NoClassDefFoundError: ai/onnxruntime/OrtEnvironment
at com.aspose.ocr.e0cd0c6d14.f(Unknown Source)
at com.aspose.ocr.e0cd0c7d77.ac8a(Unknown Source)
at com.aspose.ocr.e0cd0c7d77.f(Unknown Source)
at com.aspose.ocr.AsposeOCR.RecognizePage(Unknown Source)
at com.aspose.ocr.pdf.AsposeOCRPdf.RecognizePdf(Unknown Source)
at OCRRecognizePdf.main(OCRRecognizePdf.java:24)
Caused by: java.lang.ClassNotFoundException: ai.onnxruntime.OrtEnvironment
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

OCRRecognizeTiff.java

Exception in thread "main" com.aspose.ocr.tiff.e0cd0c6d55: Sample bit-width of 1 is not supported
at com.aspose.ocr.tiff.e0cd0c6d18.e0cd0c1d77(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.e0cd0c6d66(Unknown Source)
at com.aspose.ocr.tiff.TiffReader.read(Unknown Source)
at com.aspose.ocr.AsposeOCR.RecognizeTiff(Unknown Source)
at OCRRecognizeTiff.main(OCRRecognizeTiff.java:29)

@nitinupasani

Could you please confirm that the onnxruntime library is installed and referenced properly in your project? Aspose.OCR has a dependency on it. Also, please share the sample TIFF and PDF files for our reference if you are still unable to produce any results. We will test the scenario in our environment and address it accordingly.
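
For anyone hitting the same NoClassDefFoundError: a quick way to confirm whether the ONNX Runtime classes are visible to the JVM is to try loading the class named in the stack trace. The sketch below is only a diagnostic; the class name `OnnxRuntimeCheck` and the helper `isOnClasspath` are made up for illustration and are not part of Aspose.OCR. The ONNX Runtime Java binding is published on Maven Central under the group `com.microsoft.onnxruntime`, artifact `onnxruntime`; which version Aspose.OCR 22.6 expects is not stated in this thread, so please check the release notes.

```java
class OnnxRuntimeCheck {
    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, OnnxRuntimeCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoClassDefFoundError in the stack trace above
        String cls = "ai.onnxruntime.OrtEnvironment";
        System.out.println(cls + (isOnClasspath(cls) ? " found" : " NOT found on classpath"));
    }
}
```

Run it with the same classpath as the failing sample. If it reports the class as not found, the onnxruntime JAR is missing from the project or not referenced at run time.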

Hello Asad,

Where can I get the onnxruntime library? Also, the sample TIFF and PDF files have been uploaded.

-Nitin

sample_images.zip (299.5 KB)

Hello Asad,

I have uploaded the sample image files in a zip archive. What we need help with is taking these samples, running OCR on them, and converting them into searchable PDFs.

If you can do that and tell us how it was achieved, that would be great. We have a customer lined up who could purchase a license if the searchable PDF shows the results they are interested in seeing.

-Nitin

@nitinupasani

We are testing the case in our environment and will get back to you shortly.

Hello Asad,

Any update?

-Nitin

@rbhoriwal, @nitinupasani

We have tested the scenario in our environment using both of your files, i.e. the TIFF image and the PDF.

We were able to reproduce this exception while processing the TIFF image. Therefore, an issue has been logged as OCRJAVA-265 in our issue tracking system for further investigation.

While recognizing your scanned PDF document, we noticed that the API generated a PDF with a black image as output. This issue has been logged as OCRJAVA-266 in our issue tracking system.

We will look into the details of the logged tickets and let you know as soon as they are resolved. Please be patient and spare us some time.

We are sorry for the inconvenience.

Hello Asad,

Thanks for the update. Please note that we have a customer lined up who is eagerly waiting for the solution we propose. We trust Aspose.

-Nitin

@nitinupasani

We have recorded your concerns under the logged tickets and will consider them during the investigation. We will let you know as soon as we have some updates. Please spare us a little time.

We are sorry for the inconvenience.

@nitinupasani

We have added support for the CCITT (Huffman) encoding type, and it will be available in release version 22.8 (August 2022).
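
For readers who cannot upgrade yet, one possible workaround (not confirmed by Aspose in this thread) is to re-encode the 1-bit image as 8-bit grayscale before passing it to the OCR engine, since the exception complains about a sample bit-width of 1. The sketch below uses only the JDK; the class `BilevelToGray` and method `toGray8` are illustrative names. Note the assumptions: the JDK's built-in `ImageIO` can read TIFF only on Java 9 and later (the stack traces above look like Java 8, where a third-party TIFF plugin would be needed), and `ImageIO.read` returns only the first page of a multi-page TIFF.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class BilevelToGray {
    /** Redraws any image into a new 8-bit-per-sample grayscale image. */
    static BufferedImage toGray8(BufferedImage src) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        try {
            // Java 2D converts from the source color model (e.g. 1-bit indexed) to gray
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BilevelToGray <input image> <output.png>");
            return;
        }
        // Needs an ImageIO reader for the input format (TIFF: Java 9+ or a plugin)
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("No ImageIO reader could decode " + args[0]);
        }
        // PNG is written here because every JDK ships a PNG writer
        ImageIO.write(toGray8(src), "png", new File(args[1]));
    }
}
```

Whether the converted file then recognizes correctly with 22.6 has not been verified here; it only sidesteps the 1-bit sample format that the TIFF reader rejects.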