Displaying scanned PDFs, images & other scanned files as text


Hello Guys,

Will the licensed version of Aspose support scanned PDFs, i.e. will their images be displayed as text?




Thanks for contacting support.

From your statement above, do you mean that if you load a scanned PDF document using Aspose.Pdf for .NET, the images inside the document will appear as searchable content?

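If so, please note that you need to explicitly convert scanned PDF documents to searchable PDF files. Please take a look at the following code snippet. It passes each image to an external OCR engine (Tesseract, installed separately) and hands the resulting hOCR output back to the document.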


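using System.Diagnostics;
using System.IO;
using Aspose.Pdf;

Document doc = new Document(@"C:\pdftest\Code\input.pdf");
doc.Convert(CallBackGetHocr);              // OCR each image through the callback below
doc.Save(@"C:\pdftest\Code\output.pdf");   // save the searchable result

// Called for each image: saves it, runs Tesseract on it and returns the hOCR markup
static string CallBackGetHocr(System.Drawing.Image img)
{
    string dir = @"C:\pdftest\Code\";
    img.Save(dir + "ocrtest.jpg");
    ProcessStartInfo info = new ProcessStartInfo(@"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe");
    info.WindowStyle = ProcessWindowStyle.Hidden;
    // Arguments: <input image> <output base name> hocr (quoted because paths may contain spaces)
    info.Arguments = "\"" + dir + "ocrtest.jpg\" \"" + dir + "out\" hocr";
    using (Process p = Process.Start(info))
        p.WaitForExit();                       // wait until Tesseract has written its output
    // Older Tesseract releases write out.html; newer ones write out.hocr instead
    using (StreamReader streamReader = new StreamReader(dir + "out.html"))
    {
        return streamReader.ReadToEnd();
    }
}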
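Once a scanned document has been converted and saved as a searchable PDF, its text can be read back with Aspose.Pdf's `TextAbsorber`. The following is only a minimal sketch. It assumes the Aspose.Pdf for .NET package is referenced and that you pass in the path of the already converted file. The class name `SearchablePdfText` is hypothetical.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

public static class SearchablePdfText
{
    // Hypothetical helper: returns the text of every page of an already
    // searchable (OCR-converted) PDF in a single string.
    public static string ReadAll(string searchablePdfPath)
    {
        using (Document doc = new Document(searchablePdfPath))
        {
            TextAbsorber absorber = new TextAbsorber();
            doc.Pages.Accept(absorber); // visit all pages in one pass
            return absorber.Text;
        }
    }
}
```

For example, `SearchablePdfText.ReadAll(@"C:\pdftest\Code\output.pdf")` would return the recognized text, assuming the conversion step saved its result there. Note that in trial mode the evaluation limits still apply to the extraction.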


Thanks for your reply. I think the trial version extracts text from only 4 pages of a PDF.
Does the licensed copy have any page limitation?

For example, if I have a 100-page PDF, can we read the text from all 100 pages?



When using the API in trial mode, you can only manipulate 4 elements of a specific type (for example, 4 pages) in a document. This restriction does not apply once a license is set, so text can be read from all pages of a 100-page PDF. To test the API without any limitations, please request a 30-day temporary license.
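To lift the trial limits, the license has to be applied in code before any document is processed. Below is a minimal sketch. It assumes the temporary license has been saved as a `.lic` file whose path you pass in. The helper name `LicensedPdfText` and the convention that `null` means "run in evaluation mode" are hypothetical. It reads each page separately, so you can check that all 100 pages come back.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

public static class LicensedPdfText
{
    // Hypothetical helper: applies a (temporary or permanent) license, then
    // returns the text of each page as a separate array entry.
    // Pass null as licensePath to run in evaluation mode (4-element limit applies).
    public static string[] ReadPages(string pdfPath, string licensePath)
    {
        if (licensePath != null)
        {
            // Set the license once, before loading any Document
            License license = new License();
            license.SetLicense(licensePath);
        }

        using (Document doc = new Document(pdfPath))
        {
            string[] pages = new string[doc.Pages.Count];
            for (int i = 1; i <= doc.Pages.Count; i++) // Aspose.Pdf page indices start at 1
            {
                TextAbsorber absorber = new TextAbsorber();
                doc.Pages[i].Accept(absorber);
                pages[i - 1] = absorber.Text;
            }
            return pages;
        }
    }
}
```

With a valid license file, `LicensedPdfText.ReadPages(path, licensePath).Length` should equal the document's page count, which is a quick way to confirm that the trial limit no longer applies.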

If you still face any issue or have any further query, please feel free to contact us.