Free Support Forum - aspose.com

Perform OCR operation on an image using Aspose.OCR for Java - each line is cut off at the end

Do you have a full worked example of converting a PDF to text using the latest version of Aspose.OCR for Java? The code below (which uses Apache PDFBox to render the pages) used to work, but since upgrading to the latest release, 20.11, each line is cut off at the end, i.e. each line of text generated by OCR is missing its last 10 or so characters:

PDDocument pdDoc = PDDocument.load(p_stream);
PDFRenderer pdfRenderer = new PDFRenderer(pdDoc);
// OCR each page, rendered to an image at 300 DPI
for (int i = 0; i < pdDoc.getPages().getCount(); i++) {
    AsposeOCR ocr = new AsposeOCR();
    RecognitionSettings settings = new RecognitionSettings();
    RecognitionResult result = ocr.RecognizePage(pdfRenderer.renderImageWithDPI(i, 300), settings);
    // Append each recognized area, followed by a line feed
    for (String text : result.recognitionAreasText) {
        t_text.append(text);
        t_text.append(p_rq.t.linefeed);
    }
}
pdDoc.close();

@hansg

Would you please provide your sample image(s) for our reference along with the screenshot of the extracted text by the API? We will test the scenario in our environment and address it accordingly.
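If the source documents themselves cannot be shared, one way to produce a sample image is to save the rendered page that is passed to the OCR call. The following is only a minimal sketch using the standard `javax.imageio` API; the class name `PageImageDump`, the helper `dumpPage`, and the `page-N.png` file naming are hypothetical, and the blank image in `main` merely stands in for the result of `pdfRenderer.renderImageWithDPI(i, 300)`.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PageImageDump {

    // Hypothetical helper: writes one rendered page to <dir>/page-<index>.png
    // and returns the file that was written.
    public static File dumpPage(BufferedImage page, File dir, int pageIndex) throws IOException {
        File out = new File(dir, "page-" + pageIndex + ".png");
        // ImageIO.write returns false when no suitable writer is found
        if (!ImageIO.write(page, "png", out)) {
            throw new IOException("No PNG writer available for this image type");
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for pdfRenderer.renderImageWithDPI(i, 300) (assumption:
        // a blank image is used here so the sketch runs without PDFBox)
        BufferedImage page = new BufferedImage(2550, 3300, BufferedImage.TYPE_INT_RGB);
        File dir = new File(System.getProperty("java.io.tmpdir"));
        File out = dumpPage(page, dir, 0);

        // Read the file back to confirm the full page width was saved
        BufferedImage reloaded = ImageIO.read(out);
        System.out.println(out + " " + reloaded.getWidth() + "x" + reloaded.getHeight());
    }
}
```

Opening the saved PNG also shows whether the rendered page already lacks the right-hand part of each line, or whether the text is only lost during recognition.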

I can’t share the internal files we are working on, but I have attached a PDF downloaded from the web that shows the same issue. Thanks for investigating!

ocr output.png (24.6 KB)
PublicWaterMassMailing.pdf (2.6 MB)

@hansg

We were able to reproduce the issue in our environment and have logged it as OCRJAVA-99 in our issue tracking system. We will look into it further and keep you posted on the status of the fix. Please bear with us while we investigate.

We are sorry for the inconvenience.