We are looking at a product to batch OCR PDF files. Unfortunately, as part of the OCR process, the original PDF file (which may be a vector image) is rasterized. This is fine for generating the text, but I would like to preserve the original vector PDF.
Therefore I would like to copy the hidden text from the OCR'd “raster” PDF back over the top of the original “vector” PDF.
I believe that I need to use FormattedText objects, but I cannot find a way in the API to extract a list of FormattedText objects from an existing PDF document.
Is this possible in the API?
Thanks for considering Aspose.
I am sorry to inform you that extracting content from an OCR'd PDF is not yet supported, and we cannot support it in the short term.
We apologize for the inconvenience.