How to copy hidden text from one PDF to another

jheight · November 26, 2008, 4:58pm

Hello,

We are looking at a product to batch OCR PDF files. Unfortunately, as part of the OCR process, the original PDF file (which may be a vector image) is rasterized. This is fine for the OCR process, to generate the text, but I would like to preserve the original vector PDF.

Therefore I would like to copy the hidden text from the OCR'd “raster” PDF back over the top of the original “vector” PDF.

I believe that I need to use FormattedText objects, but I cannot find a way in the API to extract a list of FormattedText objects from an existing PDF document.

Is this possible in the API?

Thanks

Jason

codewarior · November 27, 2008, 3:23am

Hello Jason,

Thanks for considering Aspose.

I am sorry to inform you that extracting content from an OCR'd PDF is not yet supported, and we cannot support it in the short term.

We apologize for the inconvenience.