Extract Text from a PDF file

MunendraKumar · October 22, 2008, 9:02am

Hi,

I am using the Aspose.Total product.

I want to extract text from a PDF file. My code works only with searchable PDFs, not with other types of PDF such as scanned PDFs.

I am using the following code:

// Create the extractor and bind it to the input PDF.
Aspose.Pdf.Kit.PdfExtractor extractor = new Aspose.Pdf.Kit.PdfExtractor();
//extractor.BindPdf(@"C:\test PDF\SKMBT_C55008101620440.pdf");
extractor.BindPdf(@"C:\test PDF\check1.pdf");
// Extract the text, then write it to a text file.
extractor.ExtractText();
extractor.GetText(@"C:\abc.txt");

I am sending you two PDF files on which I tested this code. It works fine for "check1.pdf" because it is a searchable-text PDF, but not for "SKMBT_C55008101620440.pdf".

Please help me solve this problem, because all the PDF files I receive for text extraction are of the second type, from which my code cannot extract any text.

Reply ASAP.

codewarior · October 22, 2008, 12:34pm

Hello Munendra,

I am sorry to inform you that extracting text from image-based PDF files is not yet supported. We plan to support this feature, but I am afraid we cannot deliver it in the short term.

We apologize for the inconvenience.
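
In the meantime, one possible way to tell the two kinds of file apart is to check whether the extraction output is empty. The following is only a minimal sketch under stated assumptions: it assumes that an image-based PDF yields an empty (or whitespace-only) text file, which has not been verified against the attached files. The class and method names (ExtractionCheck, LooksImageBased) are hypothetical helpers, not part of any Aspose API, and the sketch uses only System.IO.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of any Aspose API.
static class ExtractionCheck
{
    // Returns true when the extracted text file is missing or contains
    // only whitespace. Assumption: an image-based (scanned) PDF produces
    // no extractable text, so its output file ends up empty.
    public static bool LooksImageBased(string extractedTextPath)
    {
        if (!File.Exists(extractedTextPath))
        {
            return true;
        }

        string text = File.ReadAllText(extractedTextPath);
        return text.Trim().Length == 0;
    }

    static void Main()
    {
        // Path taken from the GetText call in the original post.
        string outputPath = @"C:\abc.txt";

        if (LooksImageBased(outputPath))
        {
            Console.WriteLine("No text extracted; the PDF is probably image-based.");
        }
        else
        {
            Console.WriteLine("Text extracted; the PDF appears to be searchable.");
        }
    }
}
```

Files flagged this way could then be set aside for a separate OCR step, since the extractor cannot read them.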