[BUG] Certain PDF files trap TextAbsorber in an infinite loop that consumes all available memory and eventually crashes the process

We are using the TextAbsorber to extract text from and index a large number of PDF files. A couple of files cause the whole process to hang because TextAbsorber gets stuck in an infinite loop.

Apparently it tries to read past the end of the stream, catches the exception, and tries again, forever.

Since the exception is handled internally and never thrown, I can’t handle it myself either.

The file itself seems to be fine; no PDF application complains when opening it.

Can I somehow provide you with the PDF in question privately?

@WGAZTgmbh

It seems you are experiencing an issue with the TextAbsorber in Aspose.PDF, where it gets stuck in an infinite loop while processing certain PDF files. This can indeed be frustrating, especially when it leads to excessive memory consumption and crashes.

Unfortunately, the internal handling of exceptions in the TextAbsorber may prevent you from managing this situation directly in your code. However, here are a few suggestions that might help you troubleshoot or mitigate the issue:

  1. Update Aspose.PDF: Ensure that you are using the latest version of the Aspose.PDF library. Sometimes, bugs are fixed in newer releases, and updating might resolve the issue.

  2. Test with Different Options: If you haven’t already, try using different constructors of the TextAbsorber, or try the related TextFragmentAbsorber instead. For example, you can initialize the absorber with specific text search options or edit options to see if that changes the behavior.

  3. Isolate the Problematic PDF: If possible, try to isolate the specific PDF files that cause the issue. This can help in reporting the bug to Aspose support, as they may need to analyze the file to understand why the TextAbsorber is failing.

  4. Contact Support: Since you mentioned the possibility of providing the PDF file privately, consider reaching out to Aspose’s support team directly. They may have more tools at their disposal to diagnose the issue and provide a fix or workaround.

  5. Review Documentation: Check the official documentation for any notes on known issues or limitations with the TextAbsorber and TextFragmentAbsorber. This might provide insights into how to avoid such situations.
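
Until a fix is available, one possible stopgap for suggestion 3 (and for the hang itself) is to run each extraction in a separate child process with a time limit, so that a stuck file is killed instead of exhausting memory in the indexer. This is a generic watchdog pattern, not an Aspose.PDF feature. The sketch below uses Python only for brevity; `run_isolated`, `find_hanging_files`, and `worker_argv` are hypothetical names, and `worker_argv` is assumed to be a command line of your own that extracts text from the single PDF path appended to it.

```python
import subprocess


def run_isolated(argv, timeout_s):
    """Run argv in a child process and return (status, stdout).

    status is "ok", "error" (non-zero exit code), or "timeout".
    """
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so a looping
        # extraction cannot keep consuming memory after the deadline.
        return "timeout", ""
    return ("ok" if done.returncode == 0 else "error"), done.stdout


def find_hanging_files(pdf_paths, worker_argv, timeout_s):
    """Return the paths whose extraction did not finish within timeout_s seconds."""
    hanging = []
    for path in pdf_paths:
        # worker_argv is an assumption: your own extractor command, which
        # receives the PDF path as its last argument.
        status, _text = run_isolated([*worker_argv, str(path)], timeout_s)
        if status == "timeout":
            hanging.append(path)
    return hanging
```

The list returned by `find_hanging_files` gives you the exact files to attach to a support ticket, and the rest of the batch can continue indexing in the meantime. The time limit is a judgment call: it should be well above the slowest legitimate file in your collection.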

If you continue to face this issue, I recommend sharing your findings with the Aspose community or support team, as they may have encountered similar reports and could offer a more tailored solution.

If you have any further questions or need additional assistance, feel free to ask!