Hi,
I have an image-only PDF file. Can I make that PDF searchable (image above text) using Aspose OCR products?
Regards,
Shama
You can perform OCR on a PDF document by following the code snippet in the following article of the API documentation:
Hi,
Thank you for your response. I have already seen this example. It does not make the PDF searchable; it only extracts the text from it (Console.WriteLine(ocrEngine.Text)). Please share an example in which the PDF is made searchable.
Regards,
Shama
Regretfully, Aspose.OCR does not provide functionality to create searchable PDF documents. However, you can convert a non-searchable PDF into a searchable PDF document by using the following code snippet with Aspose.PDF for .NET.
using System.Diagnostics;
using System.IO;
using Aspose.Pdf;

private static void CreateSearchablePDF(string dataDir)
{
    // Load the image-only PDF; Convert() calls the callback for each page image
    // and places the recognized text under the image.
    Document doc = new Document(@"C:\Users\Home\Downloads\test.pdf");
    doc.Convert(CallBackGetHocr);
    doc.Save("E:/Data/test_searchable.pdf");
}

static string CallBackGetHocr(System.Drawing.Image img)
{
    // Save the page image where Tesseract can read it.
    string dir = @"E:\Data\";
    string imagePath = Path.Combine(dir, "ocrtest.jpg");
    string outputBase = Path.Combine(dir, "out");
    img.Save(imagePath);

    // Tesseract V3.02: run the console application with the "hocr" config.
    ProcessStartInfo info = new ProcessStartInfo(@"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe");
    info.WindowStyle = ProcessWindowStyle.Hidden;
    info.Arguments = "\"" + imagePath + "\" \"" + outputBase + "\" hocr";
    using (Process p = Process.Start(info))
    {
        p.WaitForExit();
    }

    // V3.02 writes the hOCR result to <outputBase>.html; return it to Aspose.PDF.
    return File.ReadAllText(outputBase + ".html");
}
The above logic recognizes the text in the PDF's images. For recognition, you may use any external OCR engine that supports the hOCR standard (http://en.wikipedia.org/wiki/HOCR ). We have used the free Google Tesseract OCR in the above code snippet. Please install it on your computer from http://code.google.com/p/tesseract-ocr/downloads/list ; after that you will have the tesseract.exe console application.
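The callback above reads out.html, which matches Tesseract 3.02. Newer Tesseract releases write the hOCR result with a .hocr extension instead, so the callback may fail to find its file after an upgrade. The following is only a sketch of one way to handle both cases; the HocrOutput class and its Read method are hypothetical helper names, not part of Aspose.PDF or Tesseract.

```csharp
using System;
using System.IO;

static class HocrOutput
{
    // Hypothetical helper: returns the hOCR text Tesseract wrote for the given
    // output base path (e.g. @"E:\Data\out"), whichever extension was used.
    public static string Read(string outputBase)
    {
        string[] candidates = { outputBase + ".hocr", outputBase + ".html" };
        string newest = null;

        // Prefer the most recently written file so a stale result left over
        // from an older Tesseract version is not picked up by mistake.
        foreach (string path in candidates)
        {
            if (!File.Exists(path))
            {
                continue;
            }
            if (newest == null ||
                File.GetLastWriteTimeUtc(path) > File.GetLastWriteTimeUtc(newest))
            {
                newest = path;
            }
        }

        if (newest == null)
        {
            throw new FileNotFoundException("No hOCR output found for " + outputBase);
        }
        return File.ReadAllText(newest);
    }
}
```

With such a helper in place, the last step of the callback could return HocrOutput.Read(@"E:\Data\out") instead of opening out.html directly.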
Asad, does this mean that the online apps do not use Aspose.OCR exclusively, but in combination with other third-party software?
We are collecting information related to the Aspose.OCR online app and will get back to you shortly.