Converting a PDF to TIFF and then running OCR on it fails with "Unknown image type 0"

Hi, I am converting a PDF to TIFF so that I can extract its text with OCR.
Aspose seems to convert it properly using the TiffDevice approach, but when I process the resulting image with Aspose.OCR (asposeOCR.Recognize), the call throws:
java.lang.IllegalArgumentException: Unknown image type 0
at java.desktop/java.awt.image.BufferedImage.<init>(BufferedImage.java:501)
at com.aspose.ocr.j.e0cd0c6d14(Unknown Source)
at com.aspose.ocr.j.f(Unknown Source)
at com.aspose.ocr.x.f(Unknown Source)
at com.aspose.ocr.y.f(Unknown Source)
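The message itself is produced by the JDK, not by Aspose: the constructor new BufferedImage(width, height, imageType) only accepts the predefined TYPE_* constants and throws IllegalArgumentException("Unknown image type " + imageType) for anything else. The value 0 is BufferedImage.TYPE_CUSTOM, which is what getType() reports for decoded images with a non-standard pixel layout (16-bit RGB, for example). The minimal sketch below reproduces the message and shows one hypothetical pattern that triggers it, copying an image by its own getType(). Whether Aspose.OCR does something like this internally is an assumption; its obfuscated frames do not say. The class and method names are illustrative only.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Illustrative only: reproduces the JDK message seen in the stack trace.
public class UnknownImageTypeDemo {

    // Asking the JDK for TYPE_CUSTOM (0) directly fails with the same message.
    public static String reproduce() {
        try {
            new BufferedImage(100, 100, BufferedImage.TYPE_CUSTOM);
            return null; // not reached on a standard JDK
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    // Hypothetical pattern (an assumption, not Aspose's actual code): copying an
    // image using src.getType(). If src is a TYPE_CUSTOM image, getType() is 0 and
    // the constructor throws IllegalArgumentException("Unknown image type 0").
    public static BufferedImage naiveCopy(BufferedImage src) {
        BufferedImage copy = new BufferedImage(src.getWidth(), src.getHeight(), src.getType());
        Graphics2D g = copy.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return copy;
    }
}
```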

Any help would be appreciated.

@brissonp,

Could you please zip and attach the PDF document and the TIFF image for reference? Also, please share your complete sample code using the Aspose (Java) APIs so that we can reproduce the issue on our end. We will check it and assist you accordingly.

Hi,

Attached are a sample PDF, the converted TIFF, and the sample code.

dummy.pdf (13 KB)

(Attachment dummy.tiff is missing)

I am resending because the first attempt did not reach you. The TIFF and PDF are in the attached zip.

dummy.zip (16.6 KB)

@brissonp,

Thanks for the PDF document and TIFF image.

I checked, and the output TIFF seems to be OK, so the error is likely on the Aspose.OCR side. I am moving your thread to the Aspose.OCR forum, where one of our colleagues from the Aspose.OCR team will evaluate your issue and assist you accordingly.
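A TIFF can be perfectly valid and still decode, page by page, to TYPE_CUSTOM (0), the value named in the stack trace. One way to see this from the Java side is to ask ImageIO which BufferedImage type each page would be decoded to by default. This is a minimal sketch, assuming JDK 9 or later (which ships a TIFF ImageIO plugin); the class and method names are hypothetical, and it is not the check the Aspose team performed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.stream.ImageInputStream;

// Hypothetical diagnostic: lists the default BufferedImage type of every TIFF page.
// A 0 (BufferedImage.TYPE_CUSTOM) marks a page that code copying images by type cannot handle.
public class TiffPageTypes {

    public static List<Integer> pageTypes(byte[] tiffBytes) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(tiffBytes))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader found (TIFF needs JDK 9+)");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                int pages = reader.getNumImages(true); // true: scan the file to count all pages
                List<Integer> types = new ArrayList<>();
                for (int i = 0; i < pages; i++) {
                    // The first specifier is the reader's default destination type for read(i)
                    ImageTypeSpecifier spec = reader.getImageTypes(i).next();
                    types.add(spec.getBufferedImageType());
                }
                return types;
            } finally {
                reader.dispose();
            }
        }
    }
}
```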

@brissonp

Would you please share the code snippet you are using to perform OCR on the .tiff file? We will test the scenario in our environment and address it accordingly.

Here it is:

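import com.aspose.ocr.*;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;

try {
    String inputFile = "path/dummy.pdf";
    // My own TiffDevice-based helper (not shown) that renders all PDF pages into one TIFF
    byte[] imageBytes = convertPDFtoTiffAllPages(inputFile);

    // Initialize OCR engine
    AsposeOCR ocrApi = new AsposeOCR();
    StringBuilder sb = new StringBuilder();

    InputStream is = new ByteArrayInputStream(imageBytes);
    OcrInput input = new OcrInput(com.aspose.ocr.InputType.SingleImage);
    input.add(is);

    RecognitionSettings settings = new RecognitionSettings();
    settings.setLanguage(Language.Eng);

    // This call throws "IllegalArgumentException: Unknown image type 0"
    ArrayList<RecognitionResult> result = ocrApi.Recognize(input, settings);

    // One line of recognized text per result
    result.forEach(line -> sb.append(line.recognitionText).append(System.lineSeparator()));

    log.info(sb.toString()); // application logger (not shown)
} catch (Exception e) {
    e.printStackTrace();
}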
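A possible interim workaround, untested against Aspose.OCR 24.2.0 and assuming the failure is tied to how individual TIFF pages are decoded, is to avoid handing the engine the multi-page TIFF at all: split it into pages with ImageIO, redraw each page onto a standard TYPE_INT_RGB canvas, and pass one PNG per page. This is a generic Java imaging technique, not an Aspose-documented fix; the class and method names are hypothetical, and JDK 9 or later is assumed for TIFF decoding.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical workaround sketch: multi-page TIFF bytes -> one TYPE_INT_RGB PNG per page.
public class TiffPageNormalizer {

    public static List<byte[]> toRgbPngPages(byte[] tiffBytes) throws IOException {
        List<byte[]> pages = new ArrayList<>();
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(tiffBytes))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader found (TIFF needs JDK 9+)");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                int count = reader.getNumImages(true);
                for (int i = 0; i < count; i++) {
                    BufferedImage rgb = toIntRgb(reader.read(i));
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    if (!ImageIO.write(rgb, "png", out)) {
                        throw new IOException("No PNG writer available");
                    }
                    pages.add(out.toByteArray());
                }
            } finally {
                reader.dispose();
            }
        }
        return pages;
    }

    // Draws any decoded page (including TYPE_CUSTOM ones) onto a white TYPE_INT_RGB canvas.
    public static BufferedImage toIntRgb(BufferedImage src) {
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, dst.getWidth(), dst.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

Each returned byte array could then be wrapped in a ByteArrayInputStream and added to an OcrInput created with InputType.SingleImage, as in the snippet above, one page at a time. Whether this sidesteps the exception depends on Aspose.OCR internals that the thread does not reveal.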

@brissonp

Are you using the latest available version of the API? Also, please check the linked example and try the newer code shown there. If you still notice any issues, please let us know.

I am using the latest version of the API, which is 24.2.0.

The code at the link you provided refers to API version 23.7.1.

@brissonp

We are sorry for the confusion. It looks like the example was not updated after the new release. Nevertheless, we tested the scenario in our environment using the latest version and found that the API throws a similar exception.

Therefore, we have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms described in our Free Support Policies.

Issue ID(s): OCRJAVA-360

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.