Poor recognition and very slow

Hello guys,

I am using the latest version, 3.5.0, of the OCR .NET assembly. In the software that I am writing, I currently use Puma.NET, as it works relatively fast and gives quite reliable results most of the time. We bought the Aspose .NET OCR version because Puma.NET is not supported and we sometimes get hard crashes with some documents.

I have been trying to implement the Aspose version for a few weeks now, but it only recognizes the easiest documents (Word docs saved as JPG, for example), and it requires at least a minute per page, even for only a few lines of text, which is way too long. I have probably forgotten something, but I load the proper language file and I get a result, so I'm quite puzzled.

Here is my code. Please assist and TIA! :)

PROCEDURE plf_Import_OCR()

blm_Result is boolean

sgf_File_OCR = CompleteDir(fExeDir()) + "Pain.jpg" // This is a plain Word document converted to JPG and containing 12 lines of text. It takes more than one minute to recognize and the recognition is 90% Ok

// is a commented line
//sgf_File_OCR = CompleteDir(fExeDir()) + "SpanishOCR.bmp"

IF NOT fFileExist(sgf_File_OCR) THEN
STOP
RETURN
END

ogf_Engine_OCR = new "Aspose.OCR".OcrEngine

ogf_Engine_OCR.Image = ImageStream.FromFile(sgf_File_OCR)

// Language
// Clear the default language (English)
ogf_Engine_OCR.LanguageContainer.Clear()
// Load the resources of the language from file path location or an instance of Stream
ogf_Engine_OCR.LanguageContainer.AddLanguage(LanguageFactory.Load("French_language_resource_file_for_Aspose.OCR_for_.NET_3.2.0.zip"))

// Added the following out of despair I guess...
ogf_Engine_OCR.Config.RemoveNonText=True
ogf_Engine_OCR.Config.DetectReadingOrder=True
ogf_Engine_OCR.Config.DetectTextRegions=True
ogf_Engine_OCR.Config.DoSpellingCorrection=True

// Filters (tried them but with no improvement so I "commented" them)
//ogf_Filters = new CorrectionFilters()

//blm_Filter1 is Medianfilter(5)
//ogf_filters.add(blm_Filter1)

//blm_Filter2 is GaussBlurFilter()
//ogf_Filters.add(blm_Filter2)

//blm_Filter3 is RemoveNoiseFilter()
//ogf_Filters.add(blm_Filter3)


blm_Result = ogf_Engine_OCR.Process()

IF NOT blm_Result THEN
STOP
RETURN
END

sgf_Text_OCR = ogf_Engine_OCR.Text.ToString()

Info("Lecture OCR : " + CR + sgf_Text_OCR)

ogf_Engine_OCR.Dispose()

Hi Benoit,

Thank you for your inquiry.

Please forward us the sample file that you have mentioned in the post (SpanishOCR.bmp). We will test it at our end and update you about our findings.

Hi!

I am trying to tweak the recognition process to adapt it to a highly variable environment, where the input documents are not homogeneous and are sometimes old (architect plans, for example).

The document is not SpanishOcr.jpg; that was just another test we tried, and that file is commented out (//) in the code. For now, the real code that we are working with is attached:

How can we tune it better, if that is possible?

Thank you!

JD

Hi Benoit,

Thank you for providing sample files.

We have tested the scenario using the files you provided and found that the issue persists. It has been logged in our issue tracking system with ID OCR-36000. Our product team will look into it, and we will post updates on the progress in this thread.

Furthermore, if you need specific content from a portion of the image, you can use custom recognition blocks to get better accuracy.
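
For anyone following along: a recognition block boils down to a rectangle on the page, so the engine works on that region instead of the whole image. This thread does not show the Aspose call that registers such a block, so the sketch below covers only the geometry, in plain Python. The `Block` type, the `clamp_block` helper, and the page size are hypothetical names and values made up for illustration. They are not part of Aspose.OCR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical pixel rectangle: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


def clamp_block(block: Block, image_width: int, image_height: int) -> Block:
    """Clip a block so that it lies entirely inside the image."""
    left = max(0, min(block.x, image_width))
    top = max(0, min(block.y, image_height))
    right = max(left, min(block.x + block.width, image_width))
    bottom = max(top, min(block.y + block.height, image_height))
    return Block(left, top, right - left, bottom - top)


# Assumed page size: an A4 scan at 300 dpi.
PAGE_WIDTH, PAGE_HEIGHT = 2480, 3508

# A block already inside the page is returned unchanged.
header = clamp_block(Block(200, 300, 2000, 400), PAGE_WIDTH, PAGE_HEIGHT)

# A block that overflows the bottom-right corner is cut back to the page.
footer = clamp_block(Block(2000, 3400, 1000, 500), PAGE_WIDTH, PAGE_HEIGHT)
```

The clipped coordinates would then be passed to whatever block API the OCR library exposes. Clipping first avoids asking the engine for pixels that lie outside the image.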

Hi Benoit,

Thank you for your patience.

This is to update you that ticket OCR-36000 is not yet resolved. Our product team is working on French language support for simple documents with black text on a white background. The ETA for this feature is the end of the current year or early next year. Furthermore, OCR on images like "Prix.jpg" will be supported through the user-defined text blocks functionality only, and we cannot share an ETA for this feature.

We are sorry for the inconvenience.

The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.