Scanned PDFs: Add text behind for searching

Hi,


I have an application that takes in scanned PDF documents. Each page has one image that represents the scanned page.

I would like to generate OCR text for each page and then place it behind the image, positionally accurate, so that the document can later be loaded into a good PDF tool and searched, highlighted and commented on.

Can I do this easily with the Aspose tool set?

Thanks in advance.

Hi Paul,

Thanks for your request. Currently, the Aspose.OCR API supports the BMP and TIFF image formats and the English, Spanish and French languages. It recognizes the Arial, Times New Roman and Courier New fonts at large sizes, i.e. 16 pt and above. Our development team is working on a major revamp of the Aspose.OCR API to improve performance and to add support for smaller font sizes, new fonts and more languages.

Regarding your question, I'm afraid I did not fully understand your requirement. Could you please elaborate on it a bit more, so that we can advise you accordingly?

As far as recognizing a character and placing it at the same position on the image is concerned, this can be achieved with a combination of the Aspose.OCR and Aspose.Imaging APIs. A rectangle is associated with each character recognized by the OCR engine, and we can take the coordinates from that rectangle to write the recognized character at those coordinates on the image using the Aspose.Imaging API.
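
To illustrate the coordinate step only, here is a minimal sketch in plain Python. It is not Aspose code and does not call any Aspose API. It assumes the OCR rectangle is given in image pixels with the origin at the top-left corner, that the scanned image fills the whole PDF page, and that the scan resolution (DPI) is known. The names `PixelBox`, `PdfRect` and `pixel_box_to_pdf_rect` are hypothetical and chosen just for this example. PDF user space uses points (72 per inch) with the origin at the bottom-left corner, so the vertical axis has to be flipped.

```python
from dataclasses import dataclass


@dataclass
class PixelBox:
    """Hypothetical OCR rectangle in image pixels, origin at top-left."""
    left: int
    top: int
    width: int
    height: int


@dataclass
class PdfRect:
    """Rectangle in PDF points, origin at bottom-left of the page."""
    llx: float  # lower-left x
    lly: float  # lower-left y
    urx: float  # upper-right x
    ury: float  # upper-right y


def pixel_box_to_pdf_rect(box: PixelBox, image_height_px: int, dpi: float) -> PdfRect:
    """Convert a top-left-origin pixel box to a bottom-left-origin PDF rectangle.

    Assumes the image covers the full page, so pixels map to points
    by the factor 72 / dpi.
    """
    def to_points(pixels: float) -> float:
        return pixels * 72.0 / dpi

    return PdfRect(
        llx=to_points(box.left),
        # Flip the y axis: distance from the bottom edge of the image.
        lly=to_points(image_height_px - box.top - box.height),
        urx=to_points(box.left + box.width),
        ury=to_points(image_height_px - box.top),
    )


# Example: a 300 DPI scan that is 3300 pixels tall (11 inches).
rect = pixel_box_to_pdf_rect(PixelBox(left=300, top=600, width=150, height=60),
                             image_height_px=3300, dpi=300)
print(rect)
```

The resulting rectangle is where the recognized text would be written on the PDF page. To keep the text searchable without showing through the scan, PDF text can be drawn with text rendering mode 3 (invisible); how that is exposed depends on the PDF library you use.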

Best Regards,

The issues you have found earlier (filed as PDFNEWNET-29755) have been fixed in Aspose.Pdf for .NET 8.1.0.


