Converting Images/PDFs to searchable PDFs?

Hello,

we need to generate searchable PDFs from images/PDFs. However, it seems that Aspose lacks this kind of functionality, and we were told to use Google's Tesseract library instead. That certainly works, but since we own an Aspose Total license, we wanted to ask whether you are planning to support this functionality in the future.
It would be great to have this common scenario handled by one of your libraries. Is there anything on the roadmap?

Kind regards,
Oliver

@multisupport,

Our product team is working on this feature. Work on hOCR for creating searchable PDFs is in progress. It is a very complex feature to develop and will take time. At the moment, we are not in a position to share a reliable ETA; however, we will update you once our product team adds this feature to their roadmap. We are sorry for the inconvenience.

@ikram.haq
Is there any news about a release or pre-release date?
If a pre-release is available, that would be good enough for me for initial testing.
This is very important and urgent for me.

@mike1986,

As already shared, our product team is working on this feature. It is a very complex feature to develop and will take time. At the moment, we are not in a position to share a reliable ETA; however, we will post in this forum thread as soon as an update is available.

Hi, we also need to create multipage PDF files with searchable and selectable text from multipage TIFFs. We need to process both the text and the image from the source file, so that the output PDF looks exactly the same as the source TIFF file.

I can see that we can enumerate the source file pages and get hold of the text (value, positions, etc.), but we don’t want to do that. Ideally, we would just make a one-line call that takes the input file and produces the PDF file described above. Is this possible yet?

@Gleedo,

Thank you for your inquiry. The Aspose.OCR API does not currently support creating searchable PDFs. However, you can use a combination of Aspose.OCR for .NET and Aspose.PDF for .NET for this purpose. Aspose.OCR for .NET can be used to extract the text from an image, and Aspose.PDF for .NET can be used to insert the image and the extracted text (as searchable and selectable text) into the output PDF.
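
For anyone wondering what "inserting the extracted text" involves in practice: a searchable PDF is usually the page image with an invisible text layer drawn over it. The sketch below is not Aspose code and does not use any Aspose API; it is a library-independent illustration in plain Python. It assumes the OCR step returns word boxes in image pixels with a top-left origin, and that the PDF page is sized to the image at the given DPI. The names `Word`, `escape_pdf_string` and `invisible_text_ops` are hypothetical, as is the font resource name `/F1`. The function emits PDF content-stream operators that use text rendering mode 3 (invisible text), so the text can be searched and selected while only the image is visible.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Word:
    """One OCR result (hypothetical shape): text plus its box in image pixels."""
    text: str
    left: float    # pixels from the left edge of the image
    top: float     # pixels from the top edge of the image
    height: float  # height of the word box in pixels


def escape_pdf_string(s: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(words: Iterable[Word], image_height_px: float, dpi: float) -> str:
    """Build content-stream operators for an invisible text layer.

    Assumes the page is image_height_px * 72 / dpi points tall and that a
    font is registered under the (assumed) resource name /F1.
    """
    scale = 72.0 / dpi             # pixels -> PDF points
    ops = ["BT", "3 Tr"]           # begin text, rendering mode 3 = invisible
    for w in words:
        font_size = w.height * scale
        x = w.left * scale
        # PDF origin is bottom-left; use the box bottom as a rough baseline.
        y = (image_height_px - (w.top + w.height)) * scale
        ops.append(f"/F1 {font_size:.2f} Tf")
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")
        ops.append(f"({escape_pdf_string(w.text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)
```

Calling `invisible_text_ops([Word("Hello", 300, 150, 50)], 3300, 300)` gives the operators for one word on a 300 DPI scan. A real implementation would also stretch each word horizontally to match its box width and write the image and the font resource into the page; those steps depend on the PDF library in use and are left out here.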