OCR text is not coming out correctly

svangeti · April 19, 2018, 7:57pm

Hello,
We have licenses for Aspose.Words, Aspose.PDF and Aspose.Cells, and we are evaluating Aspose.OCR for .NET.
The results are always wrong. We convert each document to PDF, render each page to an image, and then apply OCR, but the output is always incorrect.

Here is the code I am using:

var pdfDocument = new Aspose.Pdf.Document(outputStream);
var ocrEngine = new Aspose.OCR.OcrEngine();
var sb = new StringBuilder();

// Aspose.Pdf page indexes are 1-based.
for (int pageCnt = 1; pageCnt <= pdfDocument.Pages.Count; pageCnt++)
{
    var page = pdfDocument.Pages[pageCnt];
    var imagePath = Path.Combine(_targetFolder, "image_" + pageCnt + ".jpg");

    // Render the page to a JPEG file (resolution 300, quality 100).
    using (var imageStream = new FileStream(imagePath, FileMode.Create))
    {
        var resolution = new Aspose.Pdf.Devices.Resolution(300);
        var jpegDevice = new Aspose.Pdf.Devices.JpegDevice(
            Convert.ToInt32(page.PageInfo.Width),
            Convert.ToInt32(page.PageInfo.Height),
            resolution, 100);
        jpegDevice.Process(page, imageStream);
    }

    // Run OCR on the saved image and collect the recognized text.
    ocrEngine.Image = Aspose.OCR.ImageStream.FromFile(imagePath);
    if (ocrEngine.Process())
    {
        sb.Append(ocrEngine.Text);
        sb.Append(Environment.NewLine);
    }
}

output: {t'GGilGlGl,‘GGGliGGGil’'FG;`G’G,Gl nn’GGG’iln’nG<Rest of the text is trimmed due to evaluation restriction!>}

Can you tell me why this is not working?

svangeti · April 19, 2018, 9:19pm

image_1.jpg (232.5 KB)

ikram.haq · April 20, 2018, 7:58am

@svangeti,

Thank you for sharing the sample with us. We investigated the issue on our end, and our initial investigation shows that the issue persists. It has been logged in our system with ID OCR-65 for further investigation. We will update you here once more information or a fixed version is available.

svangeti · May 3, 2018, 1:45pm

Hello,
Do you have any update on the issue, or a timeline for fixing it?

ikram.haq · May 3, 2018, 5:11pm

@svangeti,

This issue is still pending investigation. We will update you here once more information or a fixed version is available.

svangeti · May 3, 2018, 7:05pm

Hi, thanks for the response. Do you have a timeline for when this can be done?

ikram.haq · May 4, 2018, 3:51am

@svangeti,

We are not in a position to share a reliable ETA for this issue, as it is in the queue for investigation among other issues. We are sorry for the inconvenience.

svangeti · June 18, 2018, 9:36pm

Hello,
Do you guys have any update on this?

ikram.haq · June 19, 2018, 6:26am

@svangeti,

No update is currently available on this issue. Development on Aspose.OCR for .NET is temporarily suspended, and we are currently working on Aspose.OCR for Cloud. We are sorry for the inconvenience.