Aspose OCR Text Extraction Issue


Hi Aspose,

I have an issue extracting text from the attached PNG image. The extracted text is shown below (you can compare it with the image to see the differences):

answer -> Adobe Acrobar PDF Files
Adobeig) Por-table Docyument Format (PDF) isa univer-sal tile tormat that pr-eservesall
o tythe tont s,torma tting,cyo lour-san dgr-aphicyso tyany sour-cye do cyum ent,r-egar-dl e sso ty
the applicyation andplattyorm usedto cyl-eate it.
Adobe PDF isan ide al tyormat tor-el ecytr-onicydo cyument distr-ibution asit over-cyome sthe
pr-obl e mschommonly encyounter-e dwith el e cytr-onicytil e sh ar-ing .
o AFIyoFzey ayzywleelee cyan open a PDF tile. All you needisthe tyl-ee Adobe Acml-obat
Reader-. Recyipientsotyother-tile tormatssometimeschan't open tilesbecyause they
don't have the applicyationsusedto cyl-eate the docyuments.
o PDF tyilesalwaysnpleirzt coleveectly on any pr-inting devicye.
Pdf_Text.png (56.2 KB)
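
For anyone who wants to put a number on the differences rather than compare by eye, here is a minimal sketch in plain Java. It assumes the heading in the image reads "Adobe Acrobat PDF Files" (my reading of the image, not something the OCR engine reports). The class `OcrDiff` and its methods are hypothetical helpers written for this post and are not part of Aspose.OCR.

```java
public class OcrDiff {
    // Levenshtein edit distance: minimum number of single-character
    // insertions, deletions, and substitutions that turn a into b.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            // Swap the two rows so prev always holds the last completed row
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Character error rate: edit distance divided by the expected text length.
    public static double errorRate(String expected, String actual) {
        if (expected.isEmpty()) {
            return actual.isEmpty() ? 0.0 : 1.0;
        }
        return (double) levenshtein(expected, actual) / expected.length();
    }

    public static void main(String[] args) {
        String expected = "Adobe Acrobat PDF Files"; // assumed text of the heading
        String actual = "Adobe Acrobar PDF Files";   // first line of the OCR output
        System.out.println("edit distance = " + levenshtein(expected, actual));
        System.out.println("error rate    = " + errorRate(expected, actual));
    }
}
```

The heading differs by a single substituted character, so the edit distance for that line is 1. The same helper can be run over each body line once the expected text is typed in.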

The source code is as simple as below:

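// Requires: import com.aspose.ocr.ImageStream; import com.aspose.ocr.OcrEngine;
System.out.println("Start OCR test");
// Backslashes must be escaped inside a Java string literal
String imagePath = "D:\\tmp\\Tuyen\\aspose\\ocr\\Pdf_Text.png";
// Create an instance of OcrEngine
OcrEngine ocr = new OcrEngine();
// Set image file
ocr.setImage(ImageStream.fromFile(imagePath));
// Perform OCR and get extracted text
try {
    if (ocr.process()) {
        System.out.println("\ranswer -> " + ocr.getText());
    }
} catch (Exception e) {
    e.printStackTrace();
}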

Can you please let me know what the issue is? If you're concerned about the quality of the attached PNG file, you can refer to the original attached PDF file too.
pdf-sample.pdf (8.9 KB)

Thank you.




We have investigated the issue on our end, and our initial investigation confirms that it can be reproduced. It has been logged in our system with ID OCRJAVA-771 for further investigation. We will update you here as soon as more information or a fixed version is available.



Do you have any updates on this? Thank you.



We have asked our product team to provide an update on this issue. We will share the information with you as soon as it is available.