Hello,
We have licenses for Aspose.Words, Aspose.PDF and Aspose.Cells, and we are evaluating Aspose.OCR for .NET.
The results are always wrong. We convert any document type to PDF, render each page to an image, and then apply OCR, but the output is always incorrect.
Here is the code I am using:
var pdfDocument = new Aspose.Pdf.Document(outputStream);
var ocrEngine = new Aspose.OCR.OcrEngine();
var sb = new StringBuilder();
// Pages are 1-based in Aspose.Pdf.
for (int pageCnt = 1; pageCnt <= pdfDocument.Pages.Count; pageCnt++)
{
    var imagePath = Path.Combine(_targetFolder, "image_" + pageCnt.ToString() + ".jpg");
    using (var imageStream = new FileStream(imagePath, FileMode.Create))
    {
        // Render the page to a JPEG at 300 DPI, quality 100.
        var resolution = new Aspose.Pdf.Devices.Resolution(300);
        var jpegDevice = new Aspose.Pdf.Devices.JpegDevice(
            Convert.ToInt32(pdfDocument.Pages[pageCnt].PageInfo.Width),
            Convert.ToInt32(pdfDocument.Pages[pageCnt].PageInfo.Height),
            resolution, 100);
        jpegDevice.Process(pdfDocument.Pages[pageCnt], imageStream);
        // Close the stream so the file is flushed before the OCR engine reads it.
        imageStream.Close();
        ocrEngine.Image = Aspose.OCR.ImageStream.FromFile(imagePath);
        if (ocrEngine.Process())
        {
            sb.Append(ocrEngine.Text);
            sb.Append(Environment.NewLine);
        }
    }
}
output: {t'G
GilGlGl,‘GGGliGGGil’'FG;`G’G,Gl nn’GGG’iln’nG<Rest of the text is trimmed due to evaluation restriction!>}
Can you tell me why this is not working?
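One thing that may be worth checking is the image size passed to JpegDevice. The sketch below only does the point-to-pixel arithmetic and does not call any Aspose API. It assumes that PageInfo.Width and PageInfo.Height are expressed in PDF points (1/72 inch) and that the width and height arguments of JpegDevice are pixel dimensions. Neither assumption is confirmed here. The PageRaster class, the PointsToPixels helper and the 595 x 842 page size are hypothetical, chosen for illustration.

```csharp
using System;

public static class PageRaster
{
    // Converts a length in PDF points (1/72 inch) to pixels at the given DPI.
    // Hypothetical helper for illustration, not part of any Aspose API.
    public static int PointsToPixels(double points, int dpi)
    {
        return (int)Math.Round(points / 72.0 * dpi);
    }

    public static void Main()
    {
        // Assumed example page size: roughly A4, 595 x 842 points.
        Console.WriteLine(PointsToPixels(595, 300)); // prints 2479
        Console.WriteLine(PointsToPixels(842, 300)); // prints 3508
    }
}
```

If those assumptions hold, the code above asks for an image of about 595 x 842 pixels for such a page, while a 300 DPI rendering would be closer to 2479 x 3508 pixels. An image that small might be too coarse for reliable OCR.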