Aspose.PDF (19.7 for .NET, and earlier) has problems using TextAbsorber on some vector-based PDFs. I am seeing serious memory leaks - running it on a 50 MB PDF uses 24 GB of RAM before failing. It is the same problem reported 4 years ago here: Out of Memory Exceptions
The vector PDFs can be very complex, so I understand that extraction might require a lot of memory, but how can we stop it from using ALL the available memory and failing with an out-of-memory exception? E.g.:
Can we detect if a page has vector images? This was requested a year ago at Check if PDF page have vector images - is there any update on that?
Can there be a timeout, interruption, or memory limit for TextAbsorber, so that text extraction aborts once the timeout/limit is reached?
Can there be an option to skip / ignore vector images when using TextAbsorber?
Is there an alternative to TextAbsorber for extracting all text from a PDF that might not have the same problems?
Any information would be appreciated, thank you!