Free Support Forum - aspose.com

Convert scanned PDF to editable PDF in Aspose.PDF

How can I convert a scanned PDF to an editable PDF in Aspose.PDF?


You can convert a non-searchable PDF into a searchable PDF document with Aspose.PDF for .NET by using the following code snippet.

// Requires: using System.IO; using Aspose.Pdf;
private static void CreateSearchablePDF(string dataDir)
{
    Document doc = new Document(@"C:\Users\Home\Downloads\test.pdf");
    // Run OCR on the page images; the callback returns hOCR markup for each image.
    doc.Convert(CallBackGetHocr);
    // The output file name here is illustrative.
    doc.Save(dataDir + "searchable.pdf");
}

static string CallBackGetHocr(System.Drawing.Image img)
{
    string dir = @"E:\Data\";
    img.Save(dir + "ocrtest.jpg");
    System.Diagnostics.ProcessStartInfo info = new System.Diagnostics.ProcessStartInfo(@"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe");
    info.WindowStyle = System.Diagnostics.ProcessWindowStyle.Hidden;
    info.Arguments = @"E:\data\ocrtest.jpg E:\data\out hocr";
    System.Diagnostics.Process p = new System.Diagnostics.Process();
    p.StartInfo = info;
    p.Start();
    // Wait for Tesseract to finish writing its output before reading it.
    p.WaitForExit();
    using (StreamReader streamReader = new StreamReader(@"E:\data\out.html"))
    {
        return streamReader.ReadToEnd();
    }
}

The code above recognizes the text in the images of the PDF. For recognition, you can use any external OCR engine that supports the hOCR standard (http://en.wikipedia.org/wiki/HOCR ). We used the free Google Tesseract OCR in the code snippet above. Please install it on your computer from http://code.google.com/p/tesseract-ocr/downloads/list ; after that you will have the tesseract.exe console application.
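If the hard-coded paths in the callback are a problem, the Tesseract call can be pulled into a small helper. The following is only a sketch under some assumptions: `TesseractHocr`, `BuildArguments` and `Run` are hypothetical names (they are not part of Aspose.PDF or Tesseract), and it assumes that the hOCR file is written as either `<outputBase>.hocr` or `<outputBase>.html`, depending on the installed Tesseract version.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; not part of Aspose.PDF or Tesseract.
static class TesseractHocr
{
    // Assumption: the output extension depends on the Tesseract version.
    static readonly string[] Extensions = { ".hocr", ".html" };

    // Builds "<image> <outputBase> hocr", quoting both paths so that spaces survive.
    public static string BuildArguments(string imagePath, string outputBase)
    {
        return "\"" + imagePath + "\" \"" + outputBase + "\" hocr";
    }

    // Runs tesseract.exe on one image and returns the hOCR markup it wrote.
    public static string Run(string tesseractExe, string imagePath, string outputBase)
    {
        // Remove output left over from an earlier run so it is not read by mistake.
        foreach (string ext in Extensions)
            File.Delete(outputBase + ext);

        ProcessStartInfo info = new ProcessStartInfo(tesseractExe, BuildArguments(imagePath, outputBase));
        info.UseShellExecute = false;  // start the executable directly
        info.CreateNoWindow = true;    // do not show a console window

        using (Process p = Process.Start(info))
        {
            p.WaitForExit();           // the output file is complete only after exit
            if (p.ExitCode != 0)
                throw new InvalidOperationException("tesseract exited with code " + p.ExitCode);
        }

        foreach (string ext in Extensions)
        {
            string path = outputBase + ext;
            if (File.Exists(path))
                return File.ReadAllText(path);
        }
        throw new FileNotFoundException("No hOCR output found for " + outputBase);
    }
}
```

Inside CallBackGetHocr, after saving the image, you could then return `TesseractHocr.Run(@"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe", dir + "ocrtest.jpg", dir + "out")`.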