Hi Team,
I have converted a TIFF file to PDF using the Aspose.Pdf DLL, but when I open the converted PDF and search for some text using "Ctrl+F", I get a "text not found" message even though the text is present in the file. This means the generated PDF is not searchable. I want to convert the TIFF file to PDF with OCR functionality, so could I get C# code that converts a TIFF file to a searchable PDF?
I have also tried the code below, which was posted by the Aspose team for creating a searchable PDF, but it does not work for me.
[C#]
using System.Diagnostics;
using System.IO;
using Aspose.Pdf;

public void Main()
{
    // Input.pdf is the image-only PDF created from the TIFF file
    Document doc = new Document("Input.pdf");
    // Aspose.Pdf calls CallBackGetHocr for each page image and adds the returned hOCR text
    doc.Convert(CallBackGetHocr);
    doc.Save("output.pdf");
}

private string CallBackGetHocr(System.Drawing.Image img)
{
    // Trailing backslash so the image is saved inside the folder Tesseract reads from
    string dir = @"c:\PdfTest\";
    img.Save(dir + "test.jpg");
    ProcessStartInfo info = new ProcessStartInfo(@"tesseract");
    info.WindowStyle = ProcessWindowStyle.Hidden;
    // Arguments: input image, output base name, and the "hocr" config
    info.Arguments = @"c:\PdfTest\test.jpg c:\PdfTest\out hocr";
    Process p = new Process();
    p.StartInfo = info;
    p.Start();
    p.WaitForExit();
    // Some Tesseract versions name the result out.hocr instead of out.html
    StreamReader streamReader = new StreamReader(@"c:\PdfTest\out.html");
    string text = streamReader.ReadToEnd();
    streamReader.Close();
    return text;
}
I have also downloaded the Tesseract application.
Thanks,
Atul kadam
Hi Atul,
Thanks for your feedback. I am afraid I was unable to reproduce any issue when creating a searchable PDF document from your shared TIFF image. Please find a sample project for the purpose. Hopefully it will help you accomplish the task.
Moreover, I am afraid currently Aspose.OCR is not mature enough to serve the purpose. Our development team is working hard to improve Aspose.OCR. As soon as issues are fixed in Aspose.OCR, we will be able to create searchable PDF document independent of any third party tool. We are sorry for the inconvenience.
Best Regards,
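
For readers following along without the sample project, here is a minimal sketch of the Tesseract step that the callback approach in this thread relies on. It is not the code from the sample project. `HocrHelper`, `FindHocrOutput`, and `RunTesseract` are hypothetical names introduced for illustration, and the sketch assumes that `tesseract` is on the PATH and that, depending on the Tesseract version, the hOCR result is written as either `<outputBase>.hocr` or `<outputBase>.html`.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper class, not part of Aspose.Pdf or Tesseract.
public static class HocrHelper
{
    // Returns the hOCR file Tesseract wrote for the given output base name,
    // checking the ".hocr" extension first and then ".html".
    public static string FindHocrOutput(string outputBase)
    {
        foreach (string ext in new[] { ".hocr", ".html" })
        {
            string candidate = outputBase + ext;
            if (File.Exists(candidate))
            {
                return candidate;
            }
        }
        throw new FileNotFoundException(
            "Tesseract produced no hOCR output for " + outputBase);
    }

    // Runs "tesseract <image> <outputBase> hocr" and returns the hOCR markup.
    // Assumes the tesseract executable can be found on the PATH.
    public static string RunTesseract(string imagePath, string outputBase)
    {
        var info = new ProcessStartInfo("tesseract")
        {
            Arguments = "\"" + imagePath + "\" \"" + outputBase + "\" hocr",
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process p = Process.Start(info))
        {
            p.WaitForExit();
            if (p.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "tesseract exited with code " + p.ExitCode);
            }
        }
        return File.ReadAllText(FindHocrOutput(outputBase));
    }
}
```

Inside a `CallBackGetHocr` method, one would save the page image to disk and then return the result of `HocrHelper.RunTesseract` for that image. Checking the exit code and both file extensions makes a failed or differently named Tesseract run surface as an exception, so it is easier to tell it apart from a PDF that silently stays unsearchable.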
Hello Tilal,
Thanks, your solution is working for me, but I will consider it a workaround because it relies on a third-party tool. I am still waiting for Aspose.OCR to fix the searchable PDF issue so that we can drop the third-party tool and use Aspose.OCR directly to convert TIFF to searchable PDF.
Thanks
Atul kadam
Hi Atul,
We have logged an enhancement ticket in the Aspose.OCR for .NET issue tracking system as OCR-33801, to perform OCR over TIFF or other image files and return an HTML/XHTML result so that the formatting of the image contents is preserved. Once the HTML/XHTML is generated, you may use either Aspose.Pdf for .NET or Aspose.Words for .NET to convert the HTML/XHTML file to PDF format. The respective team is working hard on supporting this feature, and as soon as we have definite news regarding its implementation, we will let you know. Please be patient and spare us a little time.