Text extraction is slow and consumes a lot of memory


I’m using Aspose.Pdf.dll to extract text from the attached PDF document.

The extraction process delivers a correct result, but it’s very slow and consumes a lot of memory. It seems that it doesn’t ignore the embedded pictures, which slows down the whole process.

I know the PDF document isn’t well built. It’s created by a GIS system and could be constructed more efficiently. But our customers will produce a lot of documents in this style in the near future.

So maybe you could tune the PDF text extraction routine a little.

Best regards, Martin

Hi Martin,

Thank you for sharing the template file.

I have tested your scenario, and you are right: extracting the text from this PDF file takes a long time. I have registered the issue in our issue tracking system with issue ID PDFNEWNET-33809 so that our development team can investigate it further. I will keep you updated on its progress in this forum thread.

Sorry for the inconvenience,

Hi, I have the same problem. Is there any news?


Hi Giorgio,

Thanks for your interest in Aspose. Issues usually vary from file to file, so we would appreciate it if you could share your sample code and file. We will look into it and guide you accordingly.

We are sorry for the inconvenience caused.

Best Regards,