I wanted to understand if the Aspose tool would meet the requirements of our team. I have attached them to this post. If anyone could help provide this information it would be greatly appreciated!
Hi Tyler,
Thanks for contacting support.
I have gone through your requirements list; details regarding these requirements are given below.
I have used the free Google Tesseract OCR engine (http://en.wikipedia.org/wiki/Tesseract_(software)). Please install it on your computer from the tesseract-ocr project on GitHub; after that you will have the tesseract.exe console application.
Below is a usage example:
[C#]
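using System.Diagnostics;
using System.IO;
using Aspose.Pdf;

// Callback invoked for each page image: runs tesseract and returns the HOCR markup.
private string CallBackGetHocr(System.Drawing.Image img)
{
    string dir = @"c:\PdfTest";
    // Path.Combine adds the separator that plain concatenation would drop.
    string imagePath = Path.Combine(dir, "test.jpg");
    string outputBase = Path.Combine(dir, "out");
    img.Save(imagePath);

    // tesseract must be on the PATH; the "hocr" config makes it emit HOCR.
    ProcessStartInfo info = new ProcessStartInfo(@"tesseract");
    info.WindowStyle = ProcessWindowStyle.Hidden;
    info.Arguments = string.Format("\"{0}\" \"{1}\" hocr", imagePath, outputBase);
    using (Process p = new Process())
    {
        p.StartInfo = info;
        p.Start();
        p.WaitForExit();
    }

    // Read the HOCR file that tesseract wrote (out.html here).
    return File.ReadAllText(outputBase + ".html");
}

public void Main(string[] args)
{
    Document doc = new Document("Input.pdf");
    // Adds a searchable text layer using the HOCR returned by the callback.
    doc.Convert(CallBackGetHocr);
    doc.Save("output.pdf");
}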
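One detail that may need adjusting: depending on the installed tesseract version, the HOCR output may be written as out.hocr rather than out.html. The following is a minimal sketch of a helper that checks for both names. The class HocrOutput and the method ReadHocrOutput are hypothetical names for illustration only; they are not part of Aspose.Pdf or tesseract.

```csharp
using System.IO;

static class HocrOutput
{
    // Hypothetical helper: returns the HOCR text that tesseract wrote for the
    // given output base path (for example c:\PdfTest\out), trying both the
    // ".hocr" and ".html" extensions.
    public static string ReadHocrOutput(string outputBase)
    {
        foreach (string extension in new[] { ".hocr", ".html" })
        {
            string candidate = outputBase + extension;
            if (File.Exists(candidate))
            {
                return File.ReadAllText(candidate);
            }
        }

        // Neither file exists, so tesseract most likely failed or is not installed.
        throw new FileNotFoundException(
            "tesseract did not produce an HOCR file for " + outputBase);
    }
}
```

The callback could then return HocrOutput.ReadHocrOutput(@"c:\PdfTest\out") in place of reading out.html directly. Deleting any stale output file before each tesseract run would keep a result from a previous page from being picked up by mistake.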