We have an Aspose Total license (Order Date: 4/16/13, SHI Order: Y4JTU, PO#: PO00081536, User Name: xeroxsoft) for our project at Buck Consultants, and we have a requirement to implement Aspose.Ocr in a real-time scenario.
We tried the "Extract text from images" feature of OCR for .NET with a simple image, but the extracted text turned out to be junk data.
A sample of the extract is given below.
-*^*^*a - ewe* \\\\\\\\\\\\\j]][\j[jj)J[(})]]7i}rj([}}(j{[[{])[]jj[v)?]])]\)j}[5{jj)[(jj/()$)]\\[jj}[?)}(j]\j]7(([\]\{{]j([(}()]{(/}jj[j7ujjw}[j]+jj)}u=IjL!([{5#)J>jv)[/{($$j{((?]j}\jjj{i/0}{{(j({i(5jf{[w}!)[\f))(j}!ij\(/L71!T[oll!f\\f\]!}{L{/-~{{7<-~\\\\\\j[\({)}!\[-]jiif\[(j{]{({{)(\]]]t)$)][]j{@(jj(([{\jj/\![?j\]\!]})[j{]j){[}#\\+j/)}j}4{-!j]j]]{-][]})j{[((+\jl][[j))Tj\{)(f)][/j}]{{c]1(]![{]])}{{j)!]]L{#!))7j){!5]4!(\{j-jjl\(i)[(l/]\}))}{))]j})i)\ij]?j}-[!}>!<[4-[5}}}ciIn-\{[<^~<\+\--\\\ rob - - c - - - ii - - i - t - - - - ii - - - - -
We can’t upload the original file due to security reasons, but we have tried this feature with various simple image files and everything so far has returned just junk text.
We can’t go back to the client about the limitations of the engine with respect to the font size it expects. Moreover, it takes a long time to extract text from larger image files.
There are a couple of freely available OCR engines (e.g. Tesseract) with better accuracy, which work for smaller font sizes and take less execution time.
Is there any chance of an update or fix soon to get us out of this situation?
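For anyone triaging similar results in bulk, a rough way to flag output like the sample above as junk is to measure how much of it is punctuation rather than letters or digits. The sketch below is only an illustration and is not part of Aspose.OCR or Tesseract: the function names `junk_ratio` and `looks_like_junk` and the 0.4 threshold are assumptions chosen for this example and would need tuning on real documents.

```python
# Heuristic junk detector for OCR output (illustrative sketch only).
# Assumption: readable text is mostly letters and digits, while failed
# recognition yields mostly brackets, slashes and other symbols.

# A short fragment copied from the junk extract quoted in this post.
SAMPLE = r"j]][\j[jj)J[(})]]7i}rj([}}(j{[[{])[]jj[v)?]])]"


def junk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are
    neither letters nor digits. Empty output counts as fully junk."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 1.0
    noise = sum(1 for ch in visible if not ch.isalnum())
    return noise / len(visible)


def looks_like_junk(text: str, threshold: float = 0.4) -> bool:
    """Flag text whose symbol ratio exceeds the (assumed) threshold."""
    return junk_ratio(text) > threshold
```

Calling `looks_like_junk(SAMPLE)` flags the fragment, while an ordinary sentence such as "Extract text from images." passes. A check like this only says that recognition failed; it does not say why, so it is a screening step rather than a fix.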
Hi Don,
Thanks for your inquiry. Our development team is already working on the issue of small font recognition, OCR-29048. I’ve shared your concerns with the development team and also requested an update on the issue. I’ll update you as soon as I get feedback from the team.
Sorry for the inconvenience faced.
Best Regards,
Can we have an update on this?
Regards
Don Thottakath
Hi Don,
Sorry for the inconvenience faced. I’m afraid it’s still not resolved. Our development team is looking into the issue and, due to its complexity, we can’t share any ETA at the moment. However, we will notify you via this forum thread as soon as it is fixed.
Thanks for your patience and cooperation.
Best Regards,
Hi Ahmad,
Has the issue been resolved?
Regards
Don Thottakath
Hi Don,
Sorry for the inconvenience faced. I’m afraid we haven’t had any significant success in resolving the issue. Currently our development team is working hard on investigating algorithms to support small fonts and to improve product performance. We will update you as soon as we get some substantial improvement.
Thanks for your patience and cooperation.
Best Regards,
Hi Tilal,
Is there an update yet? I have a few million TIFs I need to process and am interested to see what Aspose.OCR makes of them. We have a document management system that OCRs using ScanSoft DLLs, which is OK, but I want to create my own, more efficient database. I have tried a couple of samples using v1.5.0.0. The process is very slow, and although I did get it to recognise some of the text, it was not enough to be useful. Any update would be appreciated.
Cheers,
John.
Hi John,
Thanks for your interest in Aspose.OCR.
I’m afraid we still have not succeeded in resolving the small font recognition accuracy and performance issues. We are working hard to fix them and have involved some more resources for the purpose. We will update you as soon as we get some significant improvement.
Thanks for your patience and cooperation.
Best Regards,
The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.