OCR problems

Hello Aspose Team,


I’m trying to extract text from a PDF but having no success whatsoever.

The resulting text looks like this:

C //][ ] ’ [ i`` ’ i [’’ ’ ‘’'' `-[]:[]]]]]];[][]][]][]]]]][]]+[]];;[]]]]-,/ []]+[]];;:[] ][]] [-, i
//…])…-…a…[…,…,…S/]…T/[:]::]:]:.[][]][][]]]]]][]]]]]]]//j/? /] [, [][]][][]]]]]][]]]]]]]//j/? ve
f][]][]]]]]]]]]]]]]]]]][]]][]]]]]]]]]]]]][]]]]]]]][]]]]]]]]]][]]]]]]][]]]]][]][]]]]]]]]j]]]]]]]]]]]]]]]]]]]]]]]]][]]]]]]}[][]][]][]]]]]]]]]]][]]]]]]][]]]]]W]]]]]]]][]]]][]]]]]]]]][\ []][]] ][I[[ l][]]]]]]]]]]][]]][]]]]]]]]]]]]][]]]]]]]][]]]]]]]]]][]]]]]]][]]]]][]][]]]]]]]]j]]]]]]]]]]]]]]]]]]]]]]]]][]]]]]]}[][]][]][]]]]]]]]][ Ali][]]]][]I[ ilifil []]]]]IIIIIIIIIIIIIIIIIIIII []]]]] /W]]]]]]]]*[ []]] [] []]][]]]]]]]]]]][]]] / i


The file being used is attached, and the code is below:

// Create an instance of Document to load the PDF
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(dataDir + "norma.pdf");

// Note: ocrEngine is created before this snippet; its declaration is not shown in this post

// Iterate over the pages of the PDF (page numbers start at 1)
for (int pageCount = 1; pageCount <= pdfDocument.getPages().size(); pageCount++) {
    // Create Resolution object with DPI value
    com.aspose.pdf.devices.Resolution resolution = new com.aspose.pdf.devices.Resolution(300);

    // Create JPEG device with the specified attributes (Resolution, Quality),
    // where Quality is in [0-100] and 100 is the maximum
    com.aspose.pdf.devices.JpegDevice jpegDevice = new com.aspose.pdf.devices.JpegDevice(resolution, 100);

    // Create stream object to save the output image
    java.io.OutputStream imageStream = new java.io.FileOutputStream(
            dataDir + "Converted_Image" + pageCount + ".jpg");

    // Convert a particular page and save the image to the stream
    jpegDevice.process(pdfDocument.getPages().get_Item(pageCount), imageStream);

    // Set Image property of OcrEngine to the image file saved in the previous step
    ocrEngine.setImage(ImageStream.fromFile(dataDir + "Converted_Image" + pageCount + ".jpg"));

    // Perform OCR operation on one page at a time
    if (ocrEngine.process()) {
        System.out.println(ocrEngine.getText());
    }

    // Close the stream
    imageStream.close();
}
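
A side note on the loop above: the JPEG file is read back for OCR while the output stream that wrote it is still open. This is unlikely to explain the garbled text, but closing the stream before reopening the file is the safer order. The sketch below shows that ordering with try-with-resources, using only the standard library. `renderPage` is a hypothetical stand-in for `jpegDevice.process(...)`, and `RenderThenRead` and `renderAndReadBack` are illustrative names, not part of any Aspose API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class RenderThenRead {

    // Hypothetical stand-in for jpegDevice.process(page, stream):
    // it simply writes the given image bytes to the stream.
    public static void renderPage(byte[] imageBytes, OutputStream out) throws IOException {
        out.write(imageBytes);
    }

    // Writes the rendered bytes to target, closes the stream,
    // and only then reads the file back (where the OCR step would open it).
    public static byte[] renderAndReadBack(byte[] imageBytes, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            renderPage(imageBytes, out);
        } // the stream is flushed and closed here, before the file is reopened
        return Files.readAllBytes(target);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("Converted_Image", ".jpg");
        try {
            // Placeholder bytes, not a real image
            byte[] fakeImage = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xD9};
            byte[] readBack = renderAndReadBack(fakeImage, tmp);
            System.out.println("bytes read back: " + readBack.length);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

In the original loop, the equivalent change would be to wrap the `FileOutputStream` in a try-with-resources block around the `jpegDevice.process(...)` call and run the OCR step after that block.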

Hi Mario,

Thank you for your inquiry and for sharing the sample.

We have investigated the issue on our end, and our initial investigation confirms that the problem persists. It has been logged in our system as OCRJAVA-738. Our product team will look into it further, and we will post their feedback in this thread as soon as it is available.

The issues you found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.