Extract text from PDF in C# using Aspose.PDF - TextAbsorber uses 100% processor and memory

Hello,

The following code with attached PDF causes 100% processor and runaway memory usage with Aspose.PDF for .NET 19.11, and never finishes (left it for 30 minutes): input.pdf (7.0 MB)

 var pdf = new Aspose.Pdf.Document("input.pdf");
 var absorber = new Aspose.Pdf.Text.TextAbsorber();
 // This call never returns for the attached file; CPU and memory usage keep climbing.
 pdf.Pages.Accept(absorber);

The PDF was generated by Aspose.CAD from a DWG and seems to be a “bad” PDF. Even so, it should not cause Aspose.PDF to hang.

Could you please advise whether this can be fixed, or how we can detect such a file before calling Accept so that we can skip it?

Thank you

@ast3

Thank you for contacting support.

We have been able to reproduce the issue in our environment. A ticket with ID PDFNET-47312 has been logged in our issue management system for further investigation. We will let you know as soon as an update is available.

We are sorry for the inconvenience.

Hello, is there any update on this?

@ast3

Regretfully, the issue has not yet been resolved because of other high-priority issues and implementations in the queue. It was logged under the free support model and will be resolved on a first-come, first-served basis. Furthermore, performance-related issues are complex and take a certain amount of time to fix. We will inform you as soon as we have an update on its resolution. Please allow us some time.

We are sorry for the inconvenience.

Hello, it is now almost a year since reporting this confirmed serious problem in Aspose.PDF.

It still happens with the latest version and causes the whole system to hang, which surely makes it a high-priority issue.

If this will not or cannot be fixed soon, and a timeout/interrupt option (also long awaited) cannot at least be provided, could you please let us know, as we will unfortunately then need an alternative solution.

Thank you

@ast3

We are afraid that we cannot share an alternative solution because the investigation of the issue is not yet complete. However, we have raised the issue's priority to expedite the analysis and will inform you as soon as we have news about a fix. We greatly appreciate your patience and understanding. Please give us some time.

We apologize for the inconvenience.

It is now 18 months since you “raised the issue priority” of this bug which hangs the library. It is still occurring with the latest version of Aspose.PDF for .NET.

Is this bug unfixable?

@ast3

Please accept our apologies for the inconvenience and the delay. We have not been able to fix the issue yet because of its complexity, which is why you still see it in the latest version. We have recorded your concerns and raised the issue's priority to the next level, and we will let you know as soon as we have an ETA for a fix.

Hello,

This bug is still happening with the latest version of the library (22.12). Is there any ETA for a fix or workaround? Or at least, a way to manually interrupt the TextAbsorber?

Thank you

@ast3

We are afraid that the earlier logged ticket has not yet been resolved. Your concerns have been recorded under the ticket and we will let you know as soon as there is an update. We apologize for the inconvenience and the delay.
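
For readers hitting the same hang: this thread does not provide a fix or an interrupt option for TextAbsorber, so what follows is only a sketch of one generic workaround and is not something Aspose has confirmed. The idea is to move the three-line extraction from the first post into a separate worker executable and let the main application kill that process if it runs past a time limit. ExtractionGuard, RunWithTimeout, PdfTextWorker.exe and the output.txt argument are hypothetical names introduced for illustration; the helper itself uses only standard System.Diagnostics calls.

```csharp
using System;
using System.Diagnostics;

public static class ExtractionGuard
{
    // Starts a process and waits up to `timeout` for it to finish.
    // Returns true only if it exited on its own with exit code 0.
    // If the timeout elapses, the process is killed and false is returned.
    public static bool RunWithTimeout(string fileName, string arguments, TimeSpan timeout)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            if (process == null)
            {
                return false;
            }

            if (process.WaitForExit((int)timeout.TotalMilliseconds))
            {
                return process.ExitCode == 0;
            }

            try
            {
                process.Kill();
            }
            catch (InvalidOperationException)
            {
                // The process exited between the timeout and the Kill call.
            }

            process.WaitForExit();
            return false;
        }
    }
}

// Assumed usage: PdfTextWorker.exe is a hypothetical console app whose Main
// runs the Document / TextAbsorber / Accept lines from the first post and
// writes the extracted text to the path given as its second argument.
//
// bool ok = ExtractionGuard.RunWithTimeout(
//     "PdfTextWorker.exe", "\"input.pdf\" \"output.txt\"", TimeSpan.FromMinutes(2));
// if (!ok) { /* treat input.pdf as unprocessable and skip it */ }
```

Because the worker runs in its own process, killing it also returns the memory it had consumed to the system, which cancelling an in-process thread cannot guarantee. The two-minute limit is arbitrary and would need tuning for the documents involved.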