Hello Guys,
Does the licensed version of Aspose support scanned PDFs? That is, will the scanned images be available as text?
Thanks
San
Thanks for contacting support.
From your statement above, do you mean that when you load a scanned PDF document with Aspose.Pdf for .NET, the images inside the document should appear as searchable text?
If so, please note that you need to explicitly convert the scanned PDF into a searchable PDF. Please take a look at the following code snippet, which passes each image to the Tesseract OCR engine and returns the recognized text in hOCR format.
[C#]
using System.Diagnostics;
using System.Drawing.Imaging;
using System.IO;
using Aspose.Pdf;

class Program
{
    static void Main()
    {
        // Load the scanned PDF, OCR its images through the callback, and save a searchable copy
        Document doc = new Document(@"C:\pdftest\Code\input.pdf");
        doc.Convert(CallBackGetHocr);
        doc.Save(@"C:\pdftest\Code\input_searchable.pdf");
    }

    // Aspose.Pdf passes each scanned image to this callback and expects hOCR (HTML) markup back
    static string CallBackGetHocr(System.Drawing.Image img)
    {
        string dir = @"C:\pdftest\Code\";
        string imagePath = Path.Combine(dir, "ocrtest.jpg");
        img.Save(imagePath, ImageFormat.Jpeg);

        // Run Tesseract (tested with V3.02) in hOCR mode; output base name is "out"
        ProcessStartInfo info = new ProcessStartInfo(@"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe");
        info.WindowStyle = ProcessWindowStyle.Hidden;
        info.Arguments = "\"" + imagePath + "\" \"" + Path.Combine(dir, "out") + "\" hocr";
        using (Process p = Process.Start(info))
        {
            p.WaitForExit();
        }

        // Tesseract 3.02 writes hOCR to out.html (later versions write out.hocr instead)
        return File.ReadAllText(Path.Combine(dir, "out.html"));
    }
}
Thanks for your reply. I think that in the trial version, text can be extracted from only 4 pages of a PDF.
Does the licensed copy have any page limitations?
For example, if I have a 100-page PDF, can we read the text from all 100 pages?
When the API is used in trial (evaluation) mode, it can manipulate only 4 elements of a given type in a document. To test the API without any limitations, please request a 30-day temporary license.
If you still face any issue or have any further query, please feel free to contact us.
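As a rough sketch (not an official sample), a licensed application might apply its license before loading the document and then extract text from every page, so that a 100-page file is read in full. The license file name `Aspose.Pdf.lic` and the input path `input.pdf` are placeholders, not names from this thread.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ExtractAllPages
{
    static void Main()
    {
        // Apply the license first; without it the API runs in evaluation mode
        // with the 4-element limitation. "Aspose.Pdf.lic" is a placeholder name.
        License license = new License();
        license.SetLicense("Aspose.Pdf.lic");

        // "input.pdf" is a placeholder path.
        Document doc = new Document("input.pdf");

        // Extract text page by page (Aspose.Pdf page indexes start at 1).
        for (int i = 1; i <= doc.Pages.Count; i++)
        {
            TextAbsorber absorber = new TextAbsorber();
            doc.Pages[i].Accept(absorber);
            Console.WriteLine("--- Page " + i + " ---");
            Console.WriteLine(absorber.Text);
        }
    }
}
```

For a scanned PDF, this returns meaningful text only after the document has been made searchable, for example with the `Convert` approach using Tesseract.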