Accuracy issue in performing OCR on italic fonts

sudesh · August 21, 2019, 11:06am

Hi, we are processing images that contain common italic fonts which Aspose supports, such as Arial Italic and Times New Roman Italic. Below is the code we use to extract text from these images (attached to the topic):

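// Create the OCR engine and point it at the image to recognize
Aspose.OCR.OcrEngine ocrEngine = new Aspose.OCR.OcrEngine();
ocrEngine.Image = ImageStream.FromFile(filePath);

// Process() returns true when recognition succeeds
if (ocrEngine.Process())
{
    // Read the recognized text as a plain string
    string imageText = ocrEngine.Text.ToString();
}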

Both documents were created at 300 dpi with a font size of 12.
The output we get is very inaccurate. Is there currently a way to get reasonably accurate output? Please guide us.

asad.ali · August 21, 2019, 10:42pm

@sudesh

Would you kindly share your sample images with us? We will test the scenario in our environment and address it accordingly.

sudesh · August 22, 2019, 6:49am

times it.zip (822.8 KB)
ai.zip (2.6 MB)
Here are the attachments for the 2 images, one in Times New Roman Italic and one in Arial Italic.

asad.ali · August 22, 2019, 7:26pm

@sudesh

We tried extracting the text in our environment and noticed that the level of inaccuracy was very low; only some specific characters/symbols were misidentified. OCRed.png (22.6 KB) Would you kindly share a screenshot highlighting the errors you are concerned with? We will then proceed to assist you accordingly.
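
Since the two posts above disagree on how inaccurate the output is, it may help to put a number on it. The following is a minimal sketch, not part of Aspose.OCR: it assumes you have a hand-typed reference transcript of the image and compares it with the recognized text using a character error rate based on edit distance. The class and method names (OcrAccuracy, Levenshtein, CharacterErrorRate) are hypothetical and chosen only for illustration.

```csharp
using System;

// Hypothetical helper, not part of Aspose.OCR: measures how far OCR output
// is from a reference transcript that you type by hand.
public static class OcrAccuracy
{
    // Edit distance where insertion, deletion and substitution each cost 1.
    public static int Levenshtein(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1),
                    prev[j - 1] + cost);
            }
            // Reuse the two row buffers for the next iteration
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Character error rate: edits needed, divided by the reference length.
    public static double CharacterErrorRate(string expected, string recognized)
    {
        if (expected.Length == 0) return recognized.Length == 0 ? 0.0 : 1.0;
        return (double)Levenshtein(expected, recognized) / expected.Length;
    }
}
```

With the snippet from the first post, this could be called as OcrAccuracy.CharacterErrorRate(expectedText, imageText), where expectedText is the assumed reference transcript. Sharing the resulting figure together with a screenshot makes it easier to tell whether both sides are seeing the same results.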