Interrupt not possible when using ImageAbsorber or TextAbsorber

GRein · August 6, 2022, 2:04pm

Hello,
I am using Aspose PDF 22.1. I have spent a lot of time analyzing some buggy behaviour in our application.
One daemon thread in our application calls page.accept(ImageAbsorber) and also page.accept(TextAbsorber). I do this to check every PDF before we run OCR processing on it.
When this pre-check thread takes too long, I try to interrupt it. The interrupt never takes effect while the page.accept method is running: the thread stays alive and never stops. Over a long period, more and more of these threads accumulate, CPU usage reaches 100%, and the application stops responding until it crashes. Here is an example stack trace:
"Thread-176" #215 daemon prio=5 os_prio=0 tid=0x0000000064716800 nid=0xd30 runnable [0x000000007efee000]
java.lang.Thread.State: RUNNABLE
at com.aspose.pdf.internal.ms.System.Collections.Generic.lf.lI(Unknown Source)
at com.aspose.pdf.internal.ms.System.Collections.Generic.lf.set_Item(Unknown Source)
at com.aspose.pdf.internal.l2j.lI.lI(Unknown Source)
at com.aspose.pdf.internal.l2v.lh.lI(Unknown Source)
at com.aspose.pdf.internal.l7t.l1j.(Unknown Source)
at com.aspose.pdf.internal.l7h.lf.lI(Unknown Source)
at com.aspose.pdf.internal.l4v.lj.lI(Unknown Source)
at com.aspose.pdf.internal.l4v.lj.lI(Unknown Source)
at com.aspose.pdf.ImagePlacementAbsorber.lf(Unknown Source)
at com.aspose.pdf.ImagePlacementAbsorber.lI(Unknown Source)
at com.aspose.pdf.ImagePlacementAbsorber.visit(Unknown Source)
at com.aspose.pdf.Page.accept(Unknown Source)
at de.lorenz.convert.util.pdf.PDFPreCheckOCR.checkImagesAndText(PDFPreCheckOCR.java:227)
at de.lorenz.convert.util.pdf.PDFPreCheckOCR.checkPage(PDFPreCheckOCR.java:81)
at de.lorenz.convert.util.pdf.PDFPreCheckOCR.preCheckPDF(PDFPreCheckOCR.java:56)
at de.lorenz.convert.util.pdf.ExtendedPreCheckPDF.analyzeWithCancel(ExtendedPreCheckPDF.java:199)
at de.lorenz.convert.util.pdf.ExtendedPreCheckPDF.checkPDF(ExtendedPreCheckPDF.java:75)
at de.lorenz.convert.converter.finals.Converter_Final_OCR_Tess4J.convertFile(Converter_Final_OCR_Tess4J.java:129)
at de.lorenz.convert.application.Thread_CreateOCRPDF.doRun(Thread_CreateOCRPDF.java:47)
at de.lorenz.convert.application.NotifyingThread.run(NotifyingThread.java:94)
at java.lang.Thread.run(Thread.java:748)

The only way to stop this thread is to call the (deprecated) Thread.stop(). After that the thread is gone but, as expected, some resources are not freed.
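
For readers following along: Thread.interrupt() only sets a flag on the target thread. Blocking calls such as sleep, wait and join react to it, but CPU-bound code stops only if it polls the flag itself. The trace above shows the thread RUNNABLE inside a collection operation, which would be consistent with code that never polls, although nothing in this thread confirms that. The sketch below reproduces the effect with plain JDK classes and no Aspose code; every name in it (InterruptDemo, ignoresInterrupt, honoursInterrupt, stopsOnInterrupt, release) is made up for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class InterruptDemo {

    // Escape hatch so the demo can end its "stuck" worker; real library code has none.
    static final AtomicBoolean release = new AtomicBoolean(false);

    /** Busy loop that never checks the interrupt flag (stand-in for a non-cooperative call). */
    static void ignoresInterrupt() {
        while (!release.get()) {
            // spin
        }
    }

    /** Busy loop that polls the interrupt flag and returns once it is set. */
    static void honoursInterrupt() {
        while (!Thread.currentThread().isInterrupted()) {
            // spin
        }
    }

    /** Starts the task, interrupts it, and reports whether the thread ended within graceMillis. */
    static boolean stopsOnInterrupt(Runnable task, long graceMillis) {
        Thread worker = new Thread(task);
        worker.setDaemon(true);
        worker.start();
        try {
            Thread.sleep(50);          // give the worker time to enter its loop
            worker.interrupt();        // only sets the flag; nothing is forced
            worker.join(graceMillis);  // wait a bounded time for a voluntary exit
        } catch (InterruptedException e) {
            worker.interrupt();                 // still pass the request on
            Thread.currentThread().interrupt(); // restore the caller's flag
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("cooperative loop stopped: "
                + stopsOnInterrupt(InterruptDemo::honoursInterrupt, 1000));
        System.out.println("non-cooperative loop stopped: "
                + stopsOnInterrupt(InterruptDemo::ignoresInterrupt, 300));
        release.set(true); // let the stuck worker finish
    }
}
```

The second worker keeps running after interrupt(), which is the same symptom as the pre-check thread described above. Whether page.accept behaves like the first or the second loop is exactly the open question here.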

I hope you can fix this bug soon,
Kind regards, Gerd

tahir.manzoor · August 6, 2022, 6:53pm

@GRein

To ensure a timely and accurate response, please attach the following resources here for testing:

Your input PDF.
A simple Java application (source code without compilation errors) that reproduces the problem on our end.

As soon as you have these ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

GRein · August 17, 2022, 2:42pm

Hello,
Sorry, the code is complicated and deeply embedded in our application, so it would be a lot of work to extract it into a runnable application.

My suggestion: could you please find out whether page.accept(absorber) or absorber.visit(page) is interruptible in principle when I run it in my own thread?
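
To make the question concrete, here is a minimal sketch of the kind of wrapper I mean. It assumes nothing about Aspose internals: it runs any Runnable on a daemon thread with a time limit, interrupts it on timeout, and reports whether the thread actually ended. The names TimedPreCheck, Outcome, run and awaitExit are hypothetical and are not part of any library.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedPreCheck {

    enum Outcome { COMPLETED, CANCELLED, STUCK, FAILED }

    /** Runs preCheck with a time limit and reports how it ended. */
    static Outcome run(Runnable preCheck, long timeoutMillis, long graceMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "pdf-precheck");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(preCheck);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.COMPLETED;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the pre-check itself threw an exception
        } catch (TimeoutException e) {
            future.cancel(true);   // sets the worker's interrupt flag, nothing more
            executor.shutdownNow();
            // If the worker has not ended after the grace period, it ignored the interrupt.
            return awaitExit(executor, graceMillis) ? Outcome.CANCELLED : Outcome.STUCK;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // restore the caller's flag
            return Outcome.CANCELLED;
        } finally {
            executor.shutdownNow(); // no-op if already shut down
        }
    }

    /** Waits up to graceMillis for the executor's worker thread to terminate. */
    private static boolean awaitExit(ExecutorService executor, long graceMillis) {
        try {
            return executor.awaitTermination(graceMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In our application the Runnable would be something like () -> page.accept(absorber). A CANCELLED result would mean the call honours interrupts; STUCK would mean the thread is still running and consuming CPU, which is what we observe.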

Regards, Gerd

tahir.manzoor · August 17, 2022, 4:08pm

@GRein

Unfortunately, it is difficult to say what the problem is without the PDF and a code example. We need a simplified code example to reproduce this issue on our end. It would be great if you could share the requested resources here for testing. We will then investigate the issue and provide more information.