Free Support Forum - aspose.com

Create PDF Doc from OCR Image

I can’t find anything in the documentation about this, so I don’t know if it’s available. This may also belong under OCR.

Is there a way to take an image that was scanned as a PDF and OCR it, so that the text in the PDF is selectable?

Or can we scan an image as a BMP, and convert the image with text into a PDF?
Fortinbra:
I can't find anything in the documentation about this, so I don't know if it's available. This may also belong under OCR.

Is there a way to take an image that was scanned as a PDF and OCR it, so that the text in the PDF is selectable?


Hi Darren,

Thanks for your interest in our products. Aspose.Pdf for .NET can extract text from PDF documents, but if the source PDF file was generated from a scanned image, I am afraid Aspose.Pdf for .NET might not be able to extract text from it. For this particular requirement, you first need to extract the images from the PDF document, and then you may try using Aspose.OCR to extract text from the image files. For more information, please visit the following links


Fortinbra:
Or can we scan an image as a BMP, and convert the image with text into a PDF?


To convert an image file into PDF format, please try using Aspose.Pdf for .NET. For more information, please see the following link: How to - Convert an Image to PDF

In the event of any further query, please feel free to contact

Hi Darren,

Thank you for considering Aspose.

To add to Nayyer’s reply, you may check the following blog post on using Aspose.Pdf for .NET together with Aspose.OCR for .NET to extract text from PDF and image documents, and see whether these products together fit your requirements.

https://blog.aspose.com/2011/07/20/extract-text-from-pdf-including-images-combine-aspose.pdf-and-aspose.ocr

Thank You & Best Regards,

The two big portions of this are already in place: pulling images from the PDF, and pulling text from the images (OCR). What I need is a way to add the text back to the PDF so that it lines up with the corresponding parts of the image and is selectable by the end user.


We have existing software that does exactly this: it OCRs the image in the PDF and places the text over the corresponding part of the image, so that it appears you are selecting text from the image. This software is part of a 3rd party DLL. We’d like to move away from that DLL and use Aspose for as much as we can.

Hi Darren,

As per the requirement you have stated above, I think you can place the text on top of the image in the form of a watermark. Please check the following link for more details on Adding Text Stamp in the PDF File.
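
Whichever stamping API is used, the text only lines up with the image if the OCR word positions are converted from image pixels to page coordinates first. Below is a minimal, library-independent sketch of that arithmetic in Python. It is not Aspose code: `WordBox`, `page_size_points` and `to_pdf_position` are hypothetical names used for illustration. It assumes that the OCR step reports pixel boxes with a top-left origin, that the scan resolution (DPI) is known, and that the image fills the whole page.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # default PDF user space unit is 1/72 inch


@dataclass
class WordBox:
    """Hypothetical OCR result: a pixel box measured from the image's top-left corner."""
    text: str
    left: int
    top: int
    width: int
    height: int


def page_size_points(width_px, height_px, dpi):
    """Page size in points for an image placed at its scanned resolution."""
    return (width_px * POINTS_PER_INCH / dpi,
            height_px * POINTS_PER_INCH / dpi)


def to_pdf_position(box, image_height_px, dpi):
    """Map a pixel box to (x, y, font_size) in points.

    The default PDF origin is the bottom-left corner of the page, so the
    vertical axis is flipped. Using the box height as the font size is only
    a rough starting guess (an assumption, not a rule).
    """
    x = box.left * POINTS_PER_INCH / dpi
    # Bottom edge of the box, measured up from the bottom of the page.
    y = (image_height_px - (box.top + box.height)) * POINTS_PER_INCH / dpi
    font_size = box.height * POINTS_PER_INCH / dpi
    return x, y, font_size


if __name__ == "__main__":
    # A 2550 x 3300 pixel scan at 300 DPI corresponds to a US Letter page.
    word = WordBox("Invoice", left=300, top=150, width=420, height=60)
    print(page_size_points(2550, 3300, 300))
    print(to_pdf_position(word, image_height_px=3300, dpi=300))
```

The resulting x and y values would then be passed to whatever positioning properties the stamp offers, and the stamped text made invisible or placed behind the image if the API allows it, so that only the scan is seen while the text remains selectable.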