Perform OCR operation on an image using Aspose.OCR for Java - each line is cut off at the end

hansg · December 22, 2020, 4:13pm

Do you have a complete worked example that converts a PDF to text using the latest version of Aspose.OCR for Java? The code below (using Apache PDFBox) used to work, but since upgrading to the latest release, 20.11, each line is cut off at the end, i.e. each line of text generated by OCR is missing its last 10 or so characters:

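import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import com.aspose.ocr.AsposeOCR;
import com.aspose.ocr.RecognitionResult;
import com.aspose.ocr.RecognitionSettings;

// p_stream, t_text and p_rq are defined elsewhere in the calling code.
PDDocument pdDoc = PDDocument.load(p_stream);
PDFRenderer pdfRenderer = new PDFRenderer(pdDoc);
AsposeOCR ocr = new AsposeOCR();
RecognitionSettings settings = new RecognitionSettings();
for (int i = 0; i < pdDoc.getPages().getCount(); i++) {
    // Render each page at 300 DPI and run OCR on the resulting image.
    RecognitionResult result = ocr.RecognizePage(pdfRenderer.renderImageWithDPI(i, 300), settings);
    for (String text : result.recognitionAreasText) {
        t_text.append(text);
        t_text.append(p_rq.t.linefeed);
    }
}
pdDoc.close();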

asad.ali · December 22, 2020, 11:08pm

@hansg

Would you please provide your sample image(s) for our reference, along with a screenshot of the text extracted by the API? We will test the scenario in our environment and address it accordingly.

hansg · January 4, 2021, 11:19am

I can’t share the internal files we are working on, but I have attached a PDF downloaded from the web that has the same issue. Thanks for investigating!

ocr output.png (24.6 KB)
PublicWaterMassMailing.pdf (2.6 MB)

asad.ali · January 4, 2021, 10:00pm

@hansg

We were able to reproduce the issue in our environment and have logged it as OCRJAVA-99 in our issue tracking system. We will look into it further and keep you posted on the status of the fix. Please allow us some time.

We are sorry for the inconvenience.

hansg · February 4, 2021, 4:21pm

Hi there. Can you please confirm whether this has been resolved and, if so, which release contains the fix?

asad.ali · February 5, 2021, 2:27am

@hansg

We are pleased to inform you that the earlier logged issue has been resolved. Please use Aspose.OCR for Java 21.1 and let us know if you face any issues.

hansg · February 8, 2021, 5:24pm

appears to be working, thanks!