Out of memory on TextFragmentAbsorber.visit

xmedia · June 21, 2021, 11:44am

I want to iterate over the text fragments that I got by calling “TextFragmentAbsorber.visit”. It works in many cases, but some files caused an OOM error:

Caused by: java.lang.OutOfMemoryError: Java heap space
 at com.aspose.pdf.internal.l2u.l0h.lI(Unknown Source)
 at com.aspose.pdf.internal.l2u.lu.lI(Unknown Source)
 at com.aspose.pdf.internal.l2u.lu.lI(Unknown Source)
 at com.aspose.pdf.internal.l2u.lu.lI(Unknown Source)
 at com.aspose.pdf.OperatorCollection.lb(Unknown Source)
 at com.aspose.pdf.OperatorCollection.ld(Unknown Source)
 at com.aspose.pdf.OperatorCollection.size(Unknown Source)
 at com.aspose.pdf.internal.l5if.ly.ly(Unknown Source)
 at com.aspose.pdf.internal.l5if.ly.ly(Unknown Source)
 at com.aspose.pdf.internal.l5if.l0t.lI(Unknown Source)
 at com.aspose.pdf.internal.l5if.l0t.lI(Unknown Source)
 at com.aspose.pdf.internal.l5if.l0t.le(Unknown Source)
 at com.aspose.pdf.internal.l5if.l0t.<init>(Unknown Source)
 at com.aspose.pdf.internal.l5if.l0t.<init>(Unknown Source)
 at com.aspose.pdf.TextFragmentAbsorber.visit(Unknown Source)

The source:

TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("[\\S ]+",
        new TextSearchOptions(true));
textFragmentAbsorber.visit(pdf);
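
For readers unfamiliar with the pattern above, here is a minimal sketch of what "[\\S ]+" matches under the standard java.util.regex engine. This is only an illustration: the class PatternDemo and the helper findRuns are hypothetical names, and the regex engine Aspose uses internally may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternDemo {
    // Same pattern string as in the post: one or more characters that are
    // either non-whitespace (\S) or a literal space.
    static final Pattern LINE_RUN = Pattern.compile("[\\S ]+");

    // Collects every match of the pattern in the given text (hypothetical helper).
    static List<String> findRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = LINE_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Tabs and newlines are whitespace other than ' ', so they end a run.
        System.out.println(findRuns("first line\nsecond\tthird"));
        // prints [first line, second, third]
    }
}
```

In other words, the pattern selects essentially every run of visible text, so the search covers all text on every page of the document.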

Java 14 (openjdk), aspose-pdf 21.5.

test1.pdf (1.6 MB)

asad.ali · June 21, 2021, 9:01pm

@xmedia

Could you please try version 21.6 of the API? If the issue still persists, please share some more details, such as the OS name and version, and we will assist you accordingly.

xmedia · June 22, 2021, 6:24am

With version 21.6 I got the same error.
I’ve checked it on macOS 10.15.7 and Debian 10.5.

asad.ali · June 22, 2021, 1:38pm

@xmedia

We have tested the scenario using Java 14 on Windows and could not reproduce the error. However, we are preparing an environment to test the case under macOS and will get back to you shortly.

mudassir.fayyaz · June 28, 2021, 8:58pm

@xmedia

We cannot reproduce the issue on macOS Big Sur with OpenJDK 14.0.2. You may try increasing the heap size to avoid out-of-memory exceptions, because this does not seem to be an issue with the API.
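
When experimenting with heap size, it can help to confirm the limit the JVM is actually running with. The following is a minimal sketch using only the standard Runtime API; the class name HeapCheck and the helper toMiB are hypothetical and not part of Aspose.PDF.

```java
public class HeapCheck {
    // Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most memory the JVM will attempt to use for the
        // heap; it reflects -Xmx when that flag is set.
        System.out.println("max heap (MiB):  " + toMiB(rt.maxMemory()));
        // totalMemory() is what the JVM has currently reserved,
        // freeMemory() is the unused part of that reservation.
        System.out.println("committed (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free (MiB):      " + toMiB(rt.freeMemory()));
    }
}
```

Running it as, for example, java -Xmx1g HeapCheck.java (single-file mode, Java 11 or later) should report a maximum close to 1024 MiB; the exact figure can be slightly lower depending on the garbage collector.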

xmedia · June 29, 2021, 2:14pm

Well, I tried it with a 1 GB max heap and it works. But when I try the original 352-page document, I get an OOM error even with all my memory (32 GB).

Why does it need so much memory? The document is 22.4 MB.

P.S. I am on macOS Catalina 10.15.7 with OpenJDK 14.0.2.

mudassir.fayyaz · June 29, 2021, 10:54pm

@xmedia

I have been able to reproduce the issue on our end. A ticket with ID PDFJAVA-40645 has been created in our issue tracking system for further investigation. This thread has been linked to the ticket so that you will be notified once the issue is fixed.

emeghana · April 27, 2023, 12:10pm

Hi, is there any workaround for this?

asad.ali · April 27, 2023, 6:04pm

@emeghana

We are afraid that we cannot share a workaround yet because the ticket has not yet been fully investigated. We will inform you as soon as we have an update on it. Please give us some time.

We are sorry for the inconvenience.