OCR of TIFF image to Searchable PDF

nitinupasani · August 3, 2022, 10:04pm

Hello,

We have downloaded Aspose.Ocr.Java version 22.6 and are attempting to OCR TIFF images and convert them into searchable PDFs. If this small POC works, we have a customer who is willing to purchase licenses and paid support.

Could you please provide some samples showing how to do the conversion?

-Nitin

asad.ali · August 4, 2022, 10:53am

@nitinupasani

Please check the following documentation article(s) for recognizing TIFF images:

Recognize TIFF images

Once you get the results of OCR, you can save them into a file e.g. PDF:

Get OCR Result as file

rbhoriwal · August 4, 2022, 3:53pm

We are getting the following exception:

Exception in thread “main” com.aspose.ocr.tiff.e0cd0c6d55: Sample bit-width of 1 is not supported
at com.aspose.ocr.tiff.e0cd0c6d18.e0cd0c1d77(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.e0cd0c6d66(Unknown Source)
at com.aspose.ocr.tiff.TiffReader.read(Unknown Source)
at com.aspose.ocr.AsposeOCR.RecognizeTiff(Unknown Source)
at com.aspose.ocr.examples.OcrFeatures.OCRRecognizeTiff.main(OCRRecognizeTiff.java:35)
Screenshot.png (43.8 KB)
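As an interim workaround for the "Sample bit-width of 1 is not supported" error, one option is to expand the bilevel (1-bit) page to 8-bit grayscale before handing it to the OCR engine. The sketch below shows only that conversion with plain Java2D; the class and method names are hypothetical, and it assumes the page has already been decoded into a `BufferedImage` (for example with `javax.imageio.ImageIO.read`, which reads TIFF out of the box only on Java 9 and later; the stack trace suggests Java 8, which would need a TIFF plugin). Whether Aspose.OCR 22.6 then accepts the converted page, e.g. after saving it as PNG with `ImageIO.write(gray, "png", file)`, is not confirmed in this thread.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/**
 * Hypothetical workaround sketch (not an Aspose API): copy a 1-bit image into an
 * 8-bit grayscale image so downstream code receives 8-bit samples.
 */
public class BilevelToGray {

    /** Returns an 8-bit TYPE_BYTE_GRAY copy; returns the source if it is already 8-bit gray. */
    static BufferedImage toGray8(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_BYTE_GRAY) {
            return src;
        }
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            // Java2D performs the color conversion: black stays 0, white becomes 255.
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return gray;
    }

    public static void main(String[] args) {
        // Tiny in-memory 1-bit image: all black except one white pixel.
        BufferedImage bilevel = new BufferedImage(4, 4, BufferedImage.TYPE_BYTE_BINARY);
        bilevel.setRGB(1, 2, 0xFFFFFFFF);

        BufferedImage gray = toGray8(bilevel);
        System.out.println("bits per pixel: " + gray.getColorModel().getPixelSize()); // 8 for TYPE_BYTE_GRAY
        System.out.println("sample at (1,2): " + gray.getRaster().getSample(1, 2, 0));
        System.out.println("sample at (0,0): " + gray.getRaster().getSample(0, 0, 0));
    }
}
```

Drawing through `Graphics2D` lets Java2D handle the palette lookup, so the same method also works for other source types, not only `TYPE_BYTE_BINARY`.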

asad.ali · August 4, 2022, 11:02pm

@rbhoriwal

Are you using the same files shared in this post?

nitinupasani · August 4, 2022, 11:37pm

Yes, we both work on the same issue and have opened two cases. If you want, please combine them into one.

nitinupasani · August 3, 2022, 10:16pm

We are trying to run the provided sample code to OCR PDF and TIFF files with Aspose.Ocr.Java version 22.6 and are getting the following errors:

OCRRecognizePdf.java

Exception in thread “main” java.lang.NoClassDefFoundError: ai/onnxruntime/OrtEnvironment
at com.aspose.ocr.e0cd0c6d14.f(Unknown Source)
at com.aspose.ocr.e0cd0c7d77.ac8a(Unknown Source)
at com.aspose.ocr.e0cd0c7d77.f(Unknown Source)
at com.aspose.ocr.AsposeOCR.RecognizePage(Unknown Source)
at com.aspose.ocr.pdf.AsposeOCRPdf.RecognizePdf(Unknown Source)
at OCRRecognizePdf.main(OCRRecognizePdf.java:24)
Caused by: java.lang.ClassNotFoundException: ai.onnxruntime.OrtEnvironment
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

OCRRecognizeTiff.java

Exception in thread “main” com.aspose.ocr.tiff.e0cd0c6d55: Sample bit-width of 1 is not supported
at com.aspose.ocr.tiff.e0cd0c6d18.e0cd0c1d77(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.f(Unknown Source)
at com.aspose.ocr.tiff.e0cd0c6d18.e0cd0c6d66(Unknown Source)
at com.aspose.ocr.tiff.TiffReader.read(Unknown Source)
at com.aspose.ocr.AsposeOCR.RecognizeTiff(Unknown Source)
at OCRRecognizeTiff.main(OCRRecognizeTiff.java:29)

asad.ali · August 4, 2022, 10:56am

@nitinupasani

Could you please confirm that the onnxruntime library is installed and referenced properly in your project? Aspose.OCR depends on it. Also, please share the sample TIFF and PDF files for our reference if you are still unable to produce any results. We will test the scenario in our environment and address it accordingly.
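A `NoClassDefFoundError` for `ai/onnxruntime/OrtEnvironment` means the ONNX Runtime Java classes were not on the runtime classpath. A quick way to check this before calling Aspose.OCR is a small diagnostic like the one below; the class and method names are hypothetical. It only tests class visibility: it does not verify that the native ONNX Runtime library loads, nor that the version matches what Aspose.OCR 22.6 expects. The ONNX Runtime Java package is published on Maven Central as `com.microsoft.onnxruntime:onnxruntime`.

```java
/**
 * Hypothetical diagnostic: reports whether a class can be resolved on the runtime classpath.
 */
public class OnnxRuntimeCheck {

    /** Returns true if the named class can be loaded (without running its static initializer). */
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: resolve the class only, so no native libraries are loaded here.
            Class.forName(className, false, OnnxRuntimeCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. when a dependency of the class is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String ortClass = "ai.onnxruntime.OrtEnvironment";
        if (isOnClasspath(ortClass)) {
            System.out.println("ONNX Runtime classes found: " + ortClass);
        } else {
            System.out.println("ONNX Runtime classes missing: add the onnxruntime jar to the classpath");
        }
    }
}
```

Running it with the same classpath as `OCRRecognizePdf` shows whether the jar is actually being picked up.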

nitinupasani · August 4, 2022, 12:01pm

Hello Asad,

Where can I get the onnxruntime library? Also, the sample TIFF and PDF images are uploaded.

-Nitin

nitinupasani · August 4, 2022, 12:03pm

sample_images.zip (299.5 KB)

nitinupasani · August 4, 2022, 10:48pm

Hello Asad,

I have uploaded the sample image files in a zip folder. We need help taking these samples, running OCR on them, and converting them into searchable PDFs.

It would be great if you could do that and tell us how it was achieved. We have a customer lined up who could purchase a license if the searchable PDF shows the results they are interested in.

-Nitin

asad.ali · August 4, 2022, 11:00pm

@nitinupasani

We are testing the case in our environment and will get back to you shortly.

nitinupasani · August 8, 2022, 1:04pm

Hello Asad,

Any update?

-Nitin

asad.ali · August 8, 2022, 8:55pm

@rbhoriwal, @nitinupasani

We have tested the scenario in our environment using both of your files, i.e. the TIFF image and the PDF.

We were able to reproduce this exception while processing the TIFF image. Therefore, an issue has been logged as OCRJAVA-265 in our issue tracking system for further investigation.

While recognizing your scanned PDF document, we also noticed that the API generated a PDF with a black image as output. This issue has been logged as OCRJAVA-266 in our issue tracking system.

We will look into the details of the logged tickets and let you know as soon as they are resolved. Please be patient and allow us some time.

We are sorry for the inconvenience.

nitinupasani · August 8, 2022, 9:34pm

Hello Asad,

Thanks for the update. Please note that we have a customer lined up who is eagerly waiting for the solution we propose. We trust Aspose.

-Nitin

asad.ali · August 8, 2022, 11:01pm

@nitinupasani

We have recorded your concerns under the logged tickets and will certainly consider them during the investigation. We will let you know as soon as we have an update. Please allow us a little time.

We are sorry for the inconvenience.

asad.ali · August 15, 2022, 6:13pm

@nitinupasani

We have added support for the CCITT (Huffman) encoding type, and it will be available in release version 22.8 (August 2022).
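Until 22.8 is available, it can help to know in advance which input files are affected. The sketch below is a hypothetical pre-flight check, not part of Aspose.OCR, that reads the `BitsPerSample` (tag 258) and `Compression` (tag 259) fields from the first page of a TIFF, following the TIFF 6.0 file layout. It assumes a classic (non-BigTIFF) file and only decodes single-valued entries. Compression values 2, 3 and 4 are the CCITT schemes; which of them the 22.8 change covers is not stated in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical TIFF pre-flight check: BitsPerSample and Compression of the first IFD. */
public class TiffInfo {

    static final int TAG_BITS_PER_SAMPLE = 258;
    static final int TAG_COMPRESSION = 259;

    /** Reads a single inline SHORT (type 3) or LONG (type 4) value; -1 for other types. */
    static int inlineValue(ByteBuffer buf, int entry, int type) {
        if (type == 3) return buf.getShort(entry + 8) & 0xFFFF;
        if (type == 4) return buf.getInt(entry + 8);
        return -1;
    }

    /** Returns {bitsPerSample, compression} for the first page; -1 where not readable inline. */
    static int[] readFirstIfd(byte[] tiff) {
        if (tiff.length < 8) throw new IllegalArgumentException("too short for a TIFF header");
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        if (tiff[0] == 'I' && tiff[1] == 'I') buf.order(ByteOrder.LITTLE_ENDIAN);
        else if (tiff[0] == 'M' && tiff[1] == 'M') buf.order(ByteOrder.BIG_ENDIAN);
        else throw new IllegalArgumentException("bad byte-order mark");
        if (buf.getShort(2) != 42) throw new IllegalArgumentException("not a classic TIFF");

        int ifd = buf.getInt(4);
        int entries = buf.getShort(ifd) & 0xFFFF;
        int bits = 1;        // TIFF 6.0 default when BitsPerSample is absent
        int compression = 1; // TIFF 6.0 default when Compression is absent (no compression)
        for (int i = 0; i < entries; i++) {
            int e = ifd + 2 + 12 * i; // each IFD entry is 12 bytes: tag, type, count, value/offset
            int tag = buf.getShort(e) & 0xFFFF;
            int type = buf.getShort(e + 2) & 0xFFFF;
            int count = buf.getInt(e + 4);
            if (tag == TAG_BITS_PER_SAMPLE) {
                // One value per sample; multi-sample pages (e.g. RGB) are not decoded here.
                bits = (count == 1) ? inlineValue(buf, e, type) : -1;
            } else if (tag == TAG_COMPRESSION) {
                compression = inlineValue(buf, e, type);
            }
        }
        return new int[] {bits, compression};
    }

    static String describe(int compression) {
        switch (compression) {
            case 1: return "none";
            case 2: return "CCITT modified Huffman RLE";
            case 3: return "CCITT Group 3 (T.4)";
            case 4: return "CCITT Group 4 (T.6)";
            case 5: return "LZW";
            case 7: return "JPEG";
            case 32773: return "PackBits";
            default: return "other";
        }
    }

    /** Builds a tiny little-endian TIFF header plus one IFD (no pixel data) for testing. */
    static byte[] minimalTiff(int bits, int compression) {
        ByteBuffer b = ByteBuffer.allocate(38).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8); // header; IFD at offset 8
        b.putShort((short) 2);                                             // two IFD entries
        b.putShort((short) TAG_BITS_PER_SAMPLE).putShort((short) 3).putInt(1)
         .putShort((short) bits).putShort((short) 0);
        b.putShort((short) TAG_COMPRESSION).putShort((short) 3).putInt(1)
         .putShort((short) compression).putShort((short) 0);
        b.putInt(0);                                                       // no next IFD
        return b.array();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = args.length > 0 ? Files.readAllBytes(Paths.get(args[0])) : minimalTiff(1, 4);
        int[] info = readFirstIfd(data);
        System.out.println("BitsPerSample=" + info[0] + ", Compression=" + info[1]
                + " (" + describe(info[1]) + ")");
        if (info[0] == 1 && info[1] >= 2 && info[1] <= 4) {
            System.out.println("1-bit CCITT page: the kind of input 22.6 rejected in this thread");
        }
    }
}
```

Run it as `java TiffInfo page.tif`; without an argument it parses a small in-memory header instead.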