Text extraction is slow and consumes a lot of memory

m.pfeifer · June 15, 2012, 3:38am

Hi,

I’m using Aspose.Pdf.dll 7.0.0.0 to extract text from the attached PDF document.

The extraction process delivers a correct result, but it’s very slow and consumes a lot of memory. It seems that it doesn’t ignore the embedded pictures, which slows down the whole process.

I know the PDF document isn’t well built. It was created by a GIS system and could be constructed more efficiently. But our customers will produce a lot of documents in this style in the near future.

So maybe you can tune the pdf extraction routine a little bit.

Best regards, Martin

nausherwan.aslam · June 15, 2012, 6:10am

Hi Martin,

Thank you for sharing the template file.

I have tested your scenario and you are right: it is taking some time to extract the text from the PDF file. I have registered an issue in our issue tracking system with issue id: PDFNEWNET-33809 for our development team to investigate further. I will post any updates in this forum thread.

Sorry for the inconvenience,

gipasoft · October 23, 2015, 3:00am

Hi, I have the same problem. Do you have any news?

Regards

Giorgio

tilal.ahmad · October 26, 2015, 2:06am

Hi Giorgio,

Thanks for your interest in Aspose. Issues normally vary from file to file, so we would appreciate it if you could share your sample code and file. We will look into it and guide you accordingly.

We are sorry for the inconvenience caused.

Best Regards,

aspose.notifier · August 19, 2024, 2:55pm

The issues you have found earlier (filed as PDFNET-33809) have been fixed in Aspose.PDF for .NET 24.8.