Aspose.Pdf.Kit.PdfExtractor not timing out when extracting text from invalid PDF

Hello,

I am using Aspose.Pdf.Kit to extract text from PDF files. Some of these files are not actual PDFs at all (e.g. they are text files renamed to .pdf). For these files, the call to extractor.ExtractText() neither throws an exception nor times out.

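using System.IO;
using System.Reflection;

// Load the license file that sits next to the executing assembly.
Aspose.Pdf.Kit.License l = new Aspose.Pdf.Kit.License();
l.SetLicense(Path.Combine(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), "Aspose.Total.lic"));
// Bind the input file and extract its text; for the invalid file this call neither throws nor times out.
Aspose.Pdf.Kit.PdfExtractor extractor = new Aspose.Pdf.Kit.PdfExtractor();
extractor.BindPdf("test.pdf");
extractor.ExtractText();
// Reserve a unique temporary file name, then write the extracted text to it.
string tmpFilename = Path.GetTempFileName();
File.Delete(tmpFilename);
extractor.GetText(tmpFilename);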

I am using Aspose.Pdf.Kit 3.4.4.1 for .NET. Would it be possible to get a fix for this, or alternatively to be able to set a timeout on the PdfExtractor? Please see the attached invalid PDF.
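
Until a fix or a built-in timeout is available, a caller-side timeout can approximate the behaviour requested above. The following is only a minimal sketch: the ExtractionTimeout class and its TryRun method are hypothetical helpers (not part of Aspose.Pdf.Kit), and it assumes .NET 4.5 or later for Task.Run. Note that the work is not cancelled on timeout; a call that hangs keeps occupying a thread-pool thread, so running the extraction in a separate process may be the safer choice for untrusted files.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Aspose.Pdf.Kit.
public static class ExtractionTimeout
{
    // Runs 'work' on a thread-pool thread and waits at most 'timeout' for it.
    // Returns true if the work finished in time, false if the wait timed out.
    // If the work throws, the original exception is rethrown to the caller.
    public static bool TryRun(Action work, TimeSpan timeout)
    {
        Task task = Task.Run(work);
        try
        {
            return task.Wait(timeout);
        }
        catch (AggregateException ex)
        {
            // Task.Wait wraps failures in AggregateException; surface the inner one.
            throw ex.InnerException ?? ex;
        }
    }
}
```

With the extractor from the snippet above, the call would look like ExtractionTimeout.TryRun(() => extractor.ExtractText(), TimeSpan.FromSeconds(30)); a return value of false means the extraction did not complete within 30 seconds.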

Hi James,

I have tested the file with Aspose.Pdf.Kit 3.5.0.0 and found that ExtractText throws an exception after about 2 seconds with the following message:

Wrong text extracting, please check your pdf.: v3.5.0.0

Please try Aspose.Pdf.Kit 3.5.0.0, and if the problem persists, do let us know.

We're sorry for the inconvenience.

Regards,

Thanks, version 3.5.0.0 fixed the issue.

Hi,


I am using Aspose.Pdf.Kit 3.5.0.0 and I am still getting the error:

Wrong text extracting, please check your pdf.: v3.5.0.0

Thanks
lakshmi Narasimhulu

Hi Lakshmi,


Thanks for contacting support and sorry for the delayed response.

Can you please share the source PDF file and the code snippet that you are using, so that we can test the scenario at our end? We are really sorry for this inconvenience.


PS: Since July 2011, Aspose.Pdf.Kit for .NET has been merged into Aspose.Pdf for .NET, and all the classes and enumerations of the legacy Aspose.Pdf.Kit for .NET have moved to the Aspose.Pdf.Facades namespace of Aspose.Pdf for .NET. The version you are using is quite old, so I suggest trying the latest release, Aspose.Pdf for .NET 8.2.0. To extract text, you may follow the instructions in "Extract Text from all the Pages using Text Device".

Hi Nayyer Shahbaz,

I am using Aspose.Pdf.Kit 3.5.0.0.

I am getting the following error:
Wrong text extracting, please check your pdf.: v3.5.0.0
My OS is 64-bit.
Can you please help resolve this issue?
Best Regards,
khaleel karnal

Hi Khaleel,


Thanks for your inquiry. This exception is raised when the PDF file is invalid. Can you please share your sample PDF document and code snippet here, so that we can look into the issue and provide you with more information?
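
Since the exception points to an invalid input file, a cheap pre-check before calling the extractor can filter out files that are obviously not PDFs, such as renamed text files. The sketch below is an assumption-laden illustration: PdfSniffer and LooksLikePdf are hypothetical names (not an Aspose API), and the check only looks for the "%PDF-" header marker within the first 1024 bytes, which is the tolerance many readers apply. Passing this check does not guarantee that the file is a well-formed PDF.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any Aspose library.
public static class PdfSniffer
{
    // Returns true if the "%PDF-" header marker appears in the first 1024 bytes.
    public static bool LooksLikePdf(string path)
    {
        byte[] buffer = new byte[1024];
        int total = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            int n;
            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the file is reached.
            while (total < buffer.Length &&
                   (n = stream.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += n;
            }
        }
        // The marker is plain ASCII; non-ASCII bytes decode to '?' and cannot match.
        string head = Encoding.ASCII.GetString(buffer, 0, total);
        return head.IndexOf("%PDF-", StringComparison.Ordinal) >= 0;
    }
}
```

A caller could skip BindPdf and ExtractText entirely when PdfSniffer.LooksLikePdf(path) returns false, and report the file as invalid instead.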

Moreover, as mentioned above, Aspose.Pdf.Kit for .NET has been merged into Aspose.Pdf for .NET, and all the classes and enumerations of the legacy Aspose.Pdf.Kit for .NET have moved to the Aspose.Pdf.Facades namespace of Aspose.Pdf for .NET. You are using a rather old version, so we strongly recommend that you download and try the latest version of Aspose.Pdf for .NET from the download area and refer to the migration article.



We are sorry for the inconvenience caused.

Best Regards,