No text being extracted from PDF document

jmurphy · June 29, 2009, 4:34am

Hello, I am using Aspose.Pdf.Kit.PdfExtractor version 2009.3.24 to extract text from the attached PDF document as follows:

Aspose.Pdf.Kit.PdfExtractor extractor = new Aspose.Pdf.Kit.PdfExtractor();
extractor.BindPdf(filePath);
extractor.ExtractText();

// Write the extracted text to a temporary file, then read it back.
string tmpFilename = Path.GetTempFileName();
extractor.GetText(tmpFilename);

StreamReader reader = new StreamReader(tmpFilename);
string text = reader.ReadToEnd();
reader.Close();

No text is returned, and no errors are thrown either. Is this document not in the required format?

codewarior · June 29, 2009, 1:18pm

Hello James,

Thanks for considering Aspose.

I’ve tested the scenario using the latest hotfix for Aspose.Pdf.Kit and I am unable to reproduce the problem; the text is extracted correctly from the attached PDF document. Please try the attached hotfix, and if you still face any problem, feel free to contact us.

I used the following code snippet; the text extracted from the PDF document is in the attached file “Extracgted_Text.txt”.

[C#]

Aspose.Pdf.Kit.PdfExtractor extractor = new Aspose.Pdf.Kit.PdfExtractor();
extractor.BindPdf(@"d:/pdftest/Lucene+Query+Parser+Syntax.pdf");
extractor.ExtractText();
string tmpFilename = @"d:/pdftest/Extracgted_Text.txt";
extractor.GetText(tmpFilename);

jmurphy · June 30, 2009, 2:46am

Hi, yes, the latest hotfix release fixed this. Thanks.