Do you have a full worked example of converting a PDF to text with the latest version of Aspose.OCR for Java? The code below (using Apache PDFBox) used to work, but since upgrading to the latest release, 20.11, each line is cut off at the end: every line of text produced by OCR is missing its last 10 or so characters.
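import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import com.aspose.ocr.AsposeOCR;
import com.aspose.ocr.RecognitionResult;
import com.aspose.ocr.RecognitionSettings;
public class PdfToText {
    static String pdfToText(InputStream p_stream) throws Exception {
        StringBuilder t_text = new StringBuilder();
        PDDocument pdDoc = PDDocument.load(p_stream);
        PDFRenderer pdfRenderer = new PDFRenderer(pdDoc);
        AsposeOCR ocr = new AsposeOCR(); // one OCR engine reused for every page
        RecognitionSettings settings = new RecognitionSettings();
        for (int i = 0; i < pdDoc.getPages().getCount(); i++) { // render page i at 300 DPI, then OCR it
            RecognitionResult result = ocr.RecognizePage(pdfRenderer.renderImageWithDPI(i, 300), settings);
            for (String text : result.recognitionAreasText) {
                t_text.append(text);
                t_text.append(System.lineSeparator()); // stands in for the app-specific p_rq.t.linefeed
            }
        }
        pdDoc.close();
        return t_text.toString();
    }
}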
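Since the reported symptom is that each recognized line loses its last few characters, one way to check whether a given release is affected is to compare the OCR output for a test page against that page's known text. The following is a minimal, stdlib-only sketch. The class `TruncationCheck` and the method `findTruncatedLines` are hypothetical names, and the sketch assumes you already have the expected lines and the OCR lines (for example, the strings from `recognitionAreasText`) in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: flags OCR lines that are a strict prefix of the expected line,
// i.e. lines that lost characters at the end, as described in the question.
public class TruncationCheck {

    /** One truncated line: its index and how many trailing characters are missing. */
    public static final class Truncation {
        public final int lineIndex;
        public final int missingChars;

        Truncation(int lineIndex, int missingChars) {
            this.lineIndex = lineIndex;
            this.missingChars = missingChars;
        }
    }

    public static List<Truncation> findTruncatedLines(List<String> expected, List<String> ocr) {
        List<Truncation> result = new ArrayList<>();
        int n = Math.min(expected.size(), ocr.size());
        for (int i = 0; i < n; i++) {
            String want = expected.get(i).trim();
            String got = ocr.get(i).trim();
            // Only a shorter OCR line that matches the start of the expected line counts as truncated
            if (got.length() < want.length() && want.startsWith(got)) {
                result.add(new Truncation(i, want.length() - got.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("The quick brown fox jumps", "over the lazy dog");
        List<String> ocr = Arrays.asList("The quick brown f", "over the lazy dog");
        for (Truncation t : findTruncatedLines(expected, ocr)) {
            // prints "line 0: missing 8 chars"
            System.out.println("line " + t.lineIndex + ": missing " + t.missingChars + " chars");
        }
    }
}
```

The check is deliberately strict: it only reports lines whose OCR text is an exact prefix of the expected line, so a line that is both truncated and misrecognized will not be flagged. A fuzzier comparison would be needed for that case.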
Would you please share your sample image(s) for our reference, along with a screenshot of the text extracted by the API? We will test the scenario in our environment and address it accordingly.
We were able to reproduce the issue in our environment and have logged it as OCRJAVA-99 in our issue tracking system. We will look into it further and keep you posted on the status of the fix. We appreciate your patience.
We are pleased to inform you that the issue logged earlier (OCRJAVA-99) has been resolved. Please use Aspose.OCR for Java 21.1 and let us know if you face any further issues.