Text version display - PDF, Images & other scanned files

Hello Guys,

Does the licensed version of Aspose support scanned PDFs and images, so that they can be displayed as text versions of the files?

Thanks
San

@Santosh.gundla,

Thanks for contacting support.

From your statement above, do you mean that when you load a scanned PDF document using Aspose.Pdf for .NET, the images inside the document should appear as searchable content?

If so, please note that you need to explicitly convert scanned PDF documents to searchable PDF files. Please take a look at the following code snippet, which passes each page image to the Tesseract OCR engine and returns the resulting hOCR text.

[C#]

using System.Diagnostics;
using System.IO;
using Aspose.Pdf;

Document doc = new Document(@"C:\pdftest\Code\input.pdf");
// Convert() invokes the callback for the scanned images and uses the returned hOCR text.
doc.Convert(CallBackGetHocr);
doc.Save(@"C:\pdftest\Code\input_searchable.pdf");

static string CallBackGetHocr(System.Drawing.Image img)
{
    string dir = @"C:\pdftest\Code\";
    // Save the image to disk so that Tesseract can read it.
    img.Save(dir + "ocrtest.jpg");
    // Tesseract V3.02
    ProcessStartInfo info = new ProcessStartInfo(@"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe");
    info.WindowStyle = ProcessWindowStyle.Hidden;
    // Arguments: <input image> <output base name> hocr
    info.Arguments = @"C:\pdftest\Code\ocrtest.jpg C:\pdftest\Code\out hocr";
    using (Process p = new Process())
    {
        p.StartInfo = info;
        p.Start();
        p.WaitForExit();
    }
    // V3.02 writes the hOCR output to out.html; newer Tesseract releases may write out.hocr instead.
    using (StreamReader streamReader = new StreamReader(dir + "out.html"))
    {
        return streamReader.ReadToEnd();
    }
}

Thanks for your reply. I think the trial version only extracts text from 4 pages of a PDF.
Does the licensed copy have any page limitations?

For example, suppose I have a PDF with 100 pages. Can we read the text from all 100 pages of the PDF file?

@Santosh.gundla,

When using the API in trial mode, you can manipulate only 4 elements of any specific type in a document. To test the API without any limitations, please request a 30-day temporary license.
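
As a rough illustration, once a license file is available, applying it and reading the text of every page might look like the sketch below. The license file name "Aspose.Pdf.lic" is only a placeholder, the input path is simply the output of the snippet above, and the class name is hypothetical. Please treat it as a minimal sketch and adjust it to your environment.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ExtractAllPages // hypothetical class name, for illustration only
{
    static void Main()
    {
        // Apply the license before any other Aspose.Pdf call.
        // "Aspose.Pdf.lic" is a placeholder path (assumption).
        License license = new License();
        license.SetLicense("Aspose.Pdf.lic");

        // Assumed input: the searchable PDF saved by the earlier snippet.
        Document doc = new Document(@"C:\pdftest\Code\input_searchable.pdf");

        // TextAbsorber collects the text of every page it visits.
        TextAbsorber absorber = new TextAbsorber();
        doc.Pages.Accept(absorber);

        Console.WriteLine("Pages: " + doc.Pages.Count);
        Console.WriteLine(absorber.Text);
    }
}
```

If the license is not applied, the same code still runs, but the trial-mode restriction described above applies to the extracted content.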

In case you still face any issue or have any further queries, please feel free to contact us.