How to read Text Stamps from Aspose.PDF without OOM

vgcnerella2019 · September 28, 2020, 10:17pm

We are using pdfContentEditor.getStamps(pageNum) to read the stamps from a page.
We are getting an OOM for some PDF files. The heap dump shows 3.8 GB of heap memory in use.

I cannot share the PDF file that caused this OOM.

Any suggestions for overcoming this OOM?

Stack trace

at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
at com.aspose.pdf.internal.ms.System.IO.l1j.lI(I)V (Unknown Source)
at com.aspose.pdf.internal.ms.System.IO.l1j.lj(I)V (Unknown Source)
at com.aspose.pdf.internal.ms.System.IO.l1j.write([BII)V (Unknown Source)
at com.aspose.pdf.internal.l2h.lu.lI(Lcom/aspose/pdf/internal/l5y/l0t;Lcom/aspose/pdf/internal/l5y/l0h;ZZ)Lcom/aspose/pdf/internal/ms/System/Collections/Generic/l0t; (Unknown Source)
at com.aspose.pdf.internal.l2h.lu.lI(Lcom/aspose/pdf/internal/l5y/l0t;Lcom/aspose/pdf/internal/l5y/l0h;)Lcom/aspose/pdf/internal/ms/System/Collections/Generic/l0t; (Unknown Source)
at com.aspose.pdf.OperatorCollection.lb()V (Unknown Source)
at com.aspose.pdf.OperatorCollection.ld()Lcom/aspose/pdf/internal/ms/System/Collections/Generic/l0t; (Unknown Source)
at com.aspose.pdf.OperatorCollection.size()I (Unknown Source)
at com.aspose.pdf.OperatorCollection$lI.hasNext()Z (Unknown Source)
at com.aspose.pdf.facades.PdfContentEditor.lI(Lcom/aspose/pdf/Matrix;Lcom/aspose/pdf/OperatorCollection;Lcom/aspose/pdf/Resources;Lcom/aspose/pdf/internal/ms/System/Collections/Generic/l0t;Ljava/lang/Object;)V (Unknown Source)
at com.aspose.pdf.facades.PdfContentEditor.lI(Lcom/aspose/pdf/OperatorCollection;Lcom/aspose/pdf/Resources;Lcom/aspose/pdf/internal/ms/System/Collections/Generic/l0t;Ljava/lang/Object;)V (Unknown Source)
at com.aspose.pdf.facades.PdfContentEditor.lI(I)[Lcom/aspose/pdf/facades/PdfContentEditor$lI; (Unknown Source)
at com.aspose.pdf.facades.PdfContentEditor.getStamps(I)[Lcom/aspose/pdf/facades/StampInfo; (Unknown Source)

asad.ali · September 29, 2020, 5:04pm

@vgcnerella2019

Would you please make sure you are using the latest version of the API, i.e. Aspose.PDF for Java 20.9? If you still face the issue, we need to address it, and for that purpose we need a sample file to replicate the issue on our side. If you cannot share it publicly, you can send it via private message: click on the username and press the blue Message button.

PS: You can also try increasing the Java heap size.
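
A minimal sketch for confirming that a larger heap setting actually reached the JVM before re-running the stamp extraction. The class name HeapCheck and its helper methods are hypothetical and exist only for illustration. The sketch uses only the standard java.lang.Runtime API and does not call Aspose.PDF at all. It assumes the heap limit is raised with the standard -Xmx launcher option, for example `java -Xmx7g HeapCheck`.

```java
public class HeapCheck {
    /** Upper bound the JVM will attempt to use for the heap (reflects -Xmx). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently in use: memory reserved by the JVM minus the free part of it. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats a byte count as whole mebibytes (hypothetical helper). */
    public static String formatMiB(long bytes) {
        return (bytes / (1024L * 1024L)) + " MiB";
    }

    public static void main(String[] args) {
        // Print both values so the effective limit can be compared with the requested one.
        System.out.println("Max heap:  " + formatMiB(maxHeapBytes()));
        System.out.println("Used heap: " + formatMiB(usedHeapBytes()));
    }
}
```

If the reported maximum is well below the requested value, the option probably did not reach the process that runs the extraction (for example, when an application server or wrapper script starts the JVM with its own settings).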

vgcnerella2019 · September 30, 2020, 12:47am

Thanks for the reply.

We have already tried increasing the heap size. We raised it from 4 GB to 7 GB, and extracting text stamps from a single file of about 18 MB exhausted that in less than 2 minutes, ending in an OOM.

Is there any alternative way to get the TextStamp from resources or contents without calling pdfContentEditor.getStamps(pageNum)?
Is it possible to specify a flag so that only Form-type stamp objects are examined, not image types?

We also wanted to delete a stamp by its ID. That gives an OOM as well, because the internal implementation is the same as getStamps.

We need the stamp ID along with the text to do some checks.

asad.ali · September 30, 2020, 7:19pm

@vgcnerella2019

You can extract text from stamps using the StampAnnotation class. However, we need to investigate the scenario further in order to address it. Regretfully, there is no alternative approach for deleting stamps or getting stamp IDs at the moment. Would you kindly provide your sample code snippet and sample PDF file so that we can assist you further?