Converting PDF to tiff, then trying to OCR it fails with "Unknown image type 0"

brissonp · February 19, 2024, 9:35pm

Hi, I am converting a PDF to a TIFF so that I can run OCR text extraction on it.
The conversion itself seems to work properly using the TiffDevice approach.
But when I pass the resulting image to Aspose.OCR via
asposeOCR.Recognize
the call throws
java.lang.IllegalArgumentException: Unknown image type 0
at java.desktop/java.awt.image.BufferedImage.<init>(BufferedImage.java:501)
at com.aspose.ocr.j.e0cd0c6d14(Unknown Source)
at com.aspose.ocr.j.f(Unknown Source)
at com.aspose.ocr.x.f(Unknown Source)
at com.aspose.ocr.y.f(Unknown Source)
…

Any help would be appreciated.
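
A note on the message itself: java.awt.image.BufferedImage throws this exception when its constructor is given an image type it does not recognize, and type 0 is the constant BufferedImage.TYPE_CUSTOM. The stack trace shows the constructor being called from inside com.aspose.ocr, so the value is passed from there; why it ends up as 0 is not established in this thread. A minimal sketch that reproduces the same message with the JDK alone (the class and method names are made up for illustration):

```java
import java.awt.image.BufferedImage;

class UnknownImageTypeDemo {
    // Tries to construct a 1x1 BufferedImage with the given type constant.
    // Returns "ok" on success, or the exception message on failure.
    static String constructWithType(int imageType) {
        try {
            new BufferedImage(1, 1, imageType);
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // TYPE_CUSTOM == 0 is not accepted by this constructor.
        System.out.println(constructWithType(BufferedImage.TYPE_CUSTOM));  // Unknown image type 0
        // A predefined type such as TYPE_INT_RGB is accepted.
        System.out.println(constructWithType(BufferedImage.TYPE_INT_RGB)); // ok
    }
}
```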

amjad.sahi · February 20, 2024, 8:39am

@brissonp,

Could you please zip and attach the PDF document and the TIFF image for reference? Also, please share your complete sample code using the Aspose (Java) APIs so we can reproduce the issue on our end. We will check your issue and assist you accordingly.

brissonp · February 20, 2024, 2:35pm

Hi,

Attached are a sample PDF, the converted TIFF, and the sample code.

dummy.pdf (13 KB)

(Attachment dummy.tiff is missing)

brissonp · February 20, 2024, 4:35pm

I am resending because the first email did not reach you. The TIFF and PDF are in the included zip.

dummy.zip (16.6 KB)

amjad.sahi · February 20, 2024, 5:26pm

@brissonp,

Thanks for the PDF document and TIFF image.

I checked, and the output TIFF seems OK, so the error might be on the Aspose.OCR end. I am moving your thread to the respective forum, where a colleague from the Aspose.OCR team will evaluate your issue and assist you accordingly.

asad.ali · February 20, 2024, 11:18pm

@brissonp

Would you please share the code snippet you are using to perform OCR on the .tiff? We will test the scenario in our environment and address it accordingly.

brissonp · February 21, 2024, 3:12pm

Here it is:

try {
    String inputFile = "path/dummy.pdf";
    // convertPDFtoTiffAllPages is my own helper (uses TiffDevice, not shown here)
    byte[] imageBytes = convertPDFtoTiffAllPages(inputFile);

    // Initialize OCR engine
    AsposeOCR ocrApi = new AsposeOCR();
    StringBuilder sb = new StringBuilder();

    // Wrap the TIFF bytes and register them as OCR input
    InputStream is = new ByteArrayInputStream(imageBytes);
    OcrInput input = new OcrInput(com.aspose.ocr.InputType.SingleImage);
    input.add(is);

    RecognitionSettings settings = new RecognitionSettings();
    settings.setLanguage(Language.Eng);

    // This is the call that throws "Unknown image type 0"
    ArrayList<RecognitionResult> result = ocrApi.Recognize(input, settings);

    // Collect the recognized text, one result per line
    result.forEach(line -> sb.append(line.recognitionText).append(System.lineSeparator()));
    log.info(sb.toString());
} catch (Exception e) {
    e.printStackTrace();
}

asad.ali · February 21, 2024, 10:59pm

@brissonp

Are you using the latest available version of the API? Also, please check the example below and try to use the new code given there. In case you still notice any issues, please let us know.

OCR Recognizing TIFF Images in Aspose.OCR for Java

brissonp · February 23, 2024, 4:05pm

I am using the latest version of the API, which is 24.2.0.

The link to the code you provided refers to API version 23.7.1.

asad.ali · February 24, 2024, 12:36am

@brissonp

We are sorry for the confusion. It looks like the example was not updated after the new release. Nevertheless, we tested the scenario in our environment using the latest version and noticed that the API generates a similar exception.

Therefore, we have opened the following new ticket(s) in our internal issue tracking system and will deliver the fixes according to the terms mentioned in the Free Support Policies.

Issue ID(s): OCRJAVA-360

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
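
Until a fix for OCRJAVA-360 is delivered, one thing that may be worth trying is to normalize the image to a standard pixel format before handing it to the OCR engine. This is an untested assumption, not a confirmed workaround: nothing in this thread establishes that the pixel format of the TIFF is what triggers the exception. The sketch below uses only the JDK (decoding TIFF through ImageIO needs Java 9 or later), and TiffNormalizer and toRgbPng are hypothetical names. Note that ImageIO.read decodes only the first page of a multi-page TIFF.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.imageio.ImageIO;

class TiffNormalizer {
    // Decodes the first page of the given image bytes and re-encodes it
    // as a PNG backed by a plain TYPE_INT_RGB raster.
    static byte[] toRgbPng(byte[] imageBytes) throws IOException {
        BufferedImage source = ImageIO.read(new ByteArrayInputStream(imageBytes));
        if (source == null) {
            throw new IOException("No ImageIO reader could decode the input");
        }
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // Paint a white background first so transparent areas do not turn black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(rgb, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }

    // Usage: java TiffNormalizer.java input.tiff output.png
    public static void main(String[] args) throws IOException {
        byte[] png = toRgbPng(Files.readAllBytes(Paths.get(args[0])));
        Files.write(Paths.get(args[1]), png);
    }
}
```

The resulting bytes could then be wrapped in a ByteArrayInputStream and passed to input.add(...) in the same way as the TIFF bytes in the snippet posted earlier in the thread.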