Hello,
I'm trying to use the OCR APIs, and I successfully extracted text from a scanned PDF on Windows.
When I run the same operation in a Linux (CentOS 7) environment, the APIs return empty text.
Could this be related to the installed fonts?
What would you suggest to solve this problem?
Thank you.
@rmarelli65
If possible, could you please share your sample file and the code snippet you used? We will log an investigation ticket in our issue management system and share its ID with you.
import java.awt.Rectangle;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;

import org.apache.log4j.Logger;

import com.aspose.ocr.AsposeOCR;
import com.aspose.ocr.DetectAreasMode;
import com.aspose.ocr.InputType;
import com.aspose.ocr.License;
import com.aspose.ocr.OcrInput;
import com.aspose.ocr.PreprocessingFilter;
import com.aspose.ocr.RecognitionResult;
import com.aspose.ocr.RecognitionSettings;

public class AsposeOCRUtility {

    public static Logger logger = Logger.getLogger(AsposeOCRUtility.class);

    public static String ocr(File inFile, int startPage, int pagesCount) throws IOException {
        logger.info("Started OCR on " + inFile.getName());
        StringBuilder sb = new StringBuilder();
        AsposeOCR api;
        PreprocessingFilter filters;
        RecognitionSettings recognitionSettings;
        try {
            com.aspose.pdf.License license = new com.aspose.pdf.License();
            license.setLicense("... License file ...");
            logger.info("License isValid: " + License.isValid());

            // Create instance of OCR API
            api = new AsposeOCR();

            // Specify recognition settings
            recognitionSettings = new RecognitionSettings();
            //recognitionSettings.setAllowedCharacters(CharactersAllowedType.LATIN_ALPHABET);
            recognitionSettings.setLanguage(com.aspose.ocr.Language.Eng);
            recognitionSettings.setDetectAreasMode(DetectAreasMode.NONE);
            recognitionSettings.setUpscaleSmallFont(true);
            ArrayList<Rectangle> rectangles = new ArrayList<Rectangle>();
            Rectangle rectangle = new Rectangle(10, 10, 1200, 400);
            rectangles.add(rectangle);
            recognitionSettings.setRecognitionAreas(rectangles);
            recognitionSettings.setThreadsCount(10);

            // Add images to the recognition batch
            filters = new PreprocessingFilter();
            OcrInput ocrInput = new OcrInput(InputType.PDF, filters);
            ocrInput.add(inFile.getPath(), startPage, pagesCount);

            // Recognize images
            ArrayList<RecognitionResult> results = api.Recognize(ocrInput, recognitionSettings);
            results.forEach((result) -> {
                logger.info(result.recognitionText);
                logger.info("----------------------------------------------------------");
            });
            if (!results.isEmpty()) {
                return results.get(0).recognitionText;
            }
        } catch (Exception e) {
            logger.error("Error in Aspose Recognize", e);
        }
        return sb.toString();
    }
}
Calling AsposeOCRUtility.ocr( … ) on a file returns an empty string on CentOS.
@rmarelli65
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): OCRJAVA-387
You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.
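A side note on the posted snippet, which may help narrow things down while the ticket is open: the ocr method returns an empty string in three different situations. These are when recognition produces no text, when the result list is empty, and when an exception is caught and only logged (the StringBuilder is never appended to). In addition, `catch (Exception e)` does not catch subclasses of `Error` such as `UnsatisfiedLinkError`. Whether any of this is what actually happens on CentOS is not known from the thread. The sketch below is only a hypothetical diagnostic wrapper, not part of Aspose.OCR: the class `OcrOutcomeSketch`, the `Outcome` and `Status` types, and the `run` method are all invented names, and the recognition call is stood in for by a plain `Callable` so the example runs without the library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

public class OcrOutcomeSketch {

    /** The three ways a recognition call can end. */
    enum Status { TEXT_FOUND, NO_TEXT, FAILED }

    /** Result holder; every name here is hypothetical, not an Aspose.OCR type. */
    static final class Outcome {
        final Status status;
        final String text;     // recognized text; empty unless status is TEXT_FOUND
        final Throwable error; // non-null only when status is FAILED

        Outcome(Status status, String text, Throwable error) {
            this.status = status;
            this.text = text;
            this.error = error;
        }
    }

    /**
     * Runs a recognition call and reports how it ended.
     * The Callable is a stand-in for the real OCR call and is assumed
     * to return one string per recognized page.
     */
    static Outcome run(Callable<List<String>> recognizer) {
        try {
            List<String> pages = recognizer.call();
            StringBuilder sb = new StringBuilder();
            if (pages != null) {
                for (String page : pages) {
                    // Skip null and whitespace-only pages
                    if (page != null && !page.trim().isEmpty()) {
                        if (sb.length() > 0) {
                            sb.append('\n');
                        }
                        sb.append(page.trim());
                    }
                }
            }
            return sb.length() == 0
                    ? new Outcome(Status.NO_TEXT, "", null)
                    : new Outcome(Status.TEXT_FOUND, sb.toString(), null);
        } catch (Throwable t) {
            // Throwable rather than Exception, for diagnostics only:
            // Errors such as UnsatisfiedLinkError would bypass catch (Exception e)
            return new Outcome(Status.FAILED, "", t);
        }
    }

    public static void main(String[] args) {
        Outcome ok = run(() -> Arrays.asList("page one", "page two"));
        Outcome blank = run(() -> Arrays.asList("", "  "));
        Outcome failed = run(() -> { throw new UnsatisfiedLinkError("simulated"); });
        System.out.println(ok.status + " / " + blank.status + " / " + failed.status);
    }
}
```

In the real utility, the `Callable` body would wrap the existing `api.Recognize(ocrInput, recognitionSettings)` call and map each result to its `recognitionText`. Logging `status` and `error` on both Windows and CentOS would show whether the Linux run fails outright or completes with no text.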