We’ve been hit with a production outage for a week straight caused by what appears to
be unbounded resource consumption in Page.getPageRect(true) when handed a PDF with
a malformed page tree. Every time the affected document was processed, the JVM heap
ballooned to 64 GB and the Tomcat instance died with GC overhead limit exceeded.
Posting here in the hope someone can confirm whether this is a known issue and whether
24.1+ or a current release handles it safely.
Environment
- Aspose.PDF for Java: 24.1
- JDK: Oracle JDK 1.8.0_341
- Server: Apache Tomcat 9.0.107 on Windows Server 2022
- Heap: -Xms16g -Xmx64g, G1GC
A heap dump taken just before OOM showed ~560 million live instances of
com.aspose.pdf.internal.l2h.l1v / l2n.l1v retaining ~60 GB. The GC-root path led
straight to the getPageRect/getRect_Rename_Namesake frame above. The mutual recursion
l2n.l1if.lt ↔ l2n.l1if.lf in the trace strongly suggests an unterminated parent-chain
walk while resolving inherited page attributes.
Here is the Java code that triggers it:
    byte[] pdfBytes = Helpers.getFileBytes(pdfPath);
    try (InputStream pdfInputStream = new ByteArrayInputStream(pdfBytes);
         Document doc = new Document(pdfInputStream, password)) {
        for (int i = 0; i < signatures.length(); i++) {
            JSONObject signature = signatures.getJSONObject(i);
            Page page = doc.getPages().get_Item(signature.getInt("pageNumber"));
            Rectangle pageRectangle = page.getPageRect(true); // <-- OOM here
            double pageWidth = pageRectangle.getWidth();
            double pageHeight = pageRectangle.getHeight();
            // ... compute signature placement ...
        }
    }
Works on millions of PDFs. Dies on the one below.
The malformed PDF
By dumping the byte buffers out of the heap we recovered the offending file. Its page
tree is the root cause:
    2 0 obj
    <<
      /Count 2
      /Kids [4 0 R 5 0 R]
      /Type /Pages
    >>
    endobj
    4 0 obj
    <<
      /Type /Page
      /MediaBox [0 0 612 792]
      /Parent 4 0 R      ← page 4 declares ITSELF as its parent
      /Contents […]
      …
    endobj
    5 0 obj
    <<
      /Type /Page
      /MediaBox [0 0 612 792]
      /Parent 4 0 R      ← page 5’s parent points at a /Page, not the /Pages tree
      /Contents […]
      …
    endobj
Both pages have a non-conformant /Parent:
- Page 4 is its own parent — a direct self-loop.
- Page 5 points to page 4 instead of to the /Pages dictionary (object 2).
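For anyone who wants to screen files for the two defects listed above before they reach Aspose, here is a rough pre-flight sketch. It deliberately uses no Aspose API: it is a plain text scan over the raw bytes. The class and method names (PageTreePreflight, findSuspectParents) are made up for this example. It assumes the page-tree objects are stored uncompressed (not inside object streams) and it is a heuristic, not a PDF parser, so treat a clean result as "nothing obvious found" rather than proof of a valid tree.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.PDF. Heuristic text scan only.
final class PageTreePreflight {
    // Start of an indirect object, e.g. "4 0 obj".
    private static final Pattern OBJ_START =
            Pattern.compile("(?<!\\d)(\\d{1,9})\\s+\\d{1,5}\\s+obj\\b");
    // "/Type /Page" or "/Type /Pages".
    private static final Pattern TYPE = Pattern.compile("/Type\\s*/(Pages|Page)\\b");
    // "/Parent 4 0 R" -> captures the referenced object number.
    private static final Pattern PARENT =
            Pattern.compile("/Parent\\s+(\\d{1,9})\\s+\\d{1,5}\\s+R\\b");

    static List<String> findSuspectParents(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so binary data cannot break decoding.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Map<Integer, String> typeOf = new TreeMap<>();
        Map<Integer, Integer> parentOf = new TreeMap<>();

        Matcher start = OBJ_START.matcher(text);
        int from = 0;
        while (from < text.length() && start.find(from)) {
            int end = text.indexOf("endobj", start.end());
            if (end < 0) {
                break; // truncated object; stop scanning
            }
            int objNum = Integer.parseInt(start.group(1));
            String body = text.substring(start.end(), end);
            Matcher type = TYPE.matcher(body);
            if (type.find()) {
                typeOf.put(objNum, type.group(1)); // a later revision of the same object overwrites
            }
            Matcher parent = PARENT.matcher(body);
            if (parent.find()) {
                parentOf.put(objNum, Integer.parseInt(parent.group(1)));
            }
            from = end + "endobj".length();
        }

        List<String> findings = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : parentOf.entrySet()) {
            int objNum = e.getKey();
            int parentNum = e.getValue();
            if (!typeOf.containsKey(objNum)) {
                continue; // not a /Page or /Pages node (e.g. an annotation with its own /Parent)
            }
            if (parentNum == objNum) {
                findings.add("object " + objNum + ": /Parent points at itself");
            } else if ("Page".equals(typeOf.get(parentNum))) {
                findings.add("object " + objNum + ": /Parent points at object "
                        + parentNum + ", which is a /Page");
            }
        }
        return findings;
    }
}
```

A caller could run findSuspectParents(pdfBytes) on the same byte array that is later wrapped in the ByteArrayInputStream and skip or quarantine the file if the list is non-empty. Longer cycles (A → B → A through /Pages nodes) are not covered by this scan.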
Per the PDF spec, MediaBox / CropBox / Rotate are inherited from /Pages ancestors, so
getPageRect(true) has to walk the parent chain. On this document the walk is
non-terminating (or terminates only after generating enormous intermediate state,
given the hundreds of millions of l2n.l1v instances retained).
The PDF was produced by PDFium (Chrome / Edge “Save as PDF”) and then annotated/signed
in Apple Preview / iOS Markup. It opens and renders fine in Adobe Reader, Preview, and
Chrome, so it looks valid to end users, but it kills Aspose’s geometry resolver.
Questions for Aspose support
- Is Page.getPageRect(true) expected to terminate cleanly when the page tree contains
a /Parent self-loop or non-/Pages parent?
- Is this fixed in a newer release?
- Is there an officially recommended way to sanitize / repair a page tree before
passing the Document to Aspose APIs? Or a flag on LoadOptions that forces tolerant
parsing?
- Short of an upgrade, is there a public API to detect a self-referential or
non-/Pages /Parent before calling getPageRect?
We were able to stop the OOM, but I’d strongly prefer that Aspose fail fast
(e.g. throw InvalidPdfFileFormatException) on a cyclic page tree rather than try to
resolve it.
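As an illustration of the fail-fast behaviour requested here, below is a minimal sketch of an inherited-attribute lookup that remembers which nodes it has visited and throws as soon as it sees one twice. This is not how Aspose implements the walk (the internals are obfuscated and unknown to us). ParentChainWalker, CyclicPageTreeException, and the two maps standing in for a parsed page tree are all invented for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these types exist in Aspose.PDF.
final class ParentChainWalker {

    /** Thrown when the /Parent chain revisits a node. */
    static final class CyclicPageTreeException extends RuntimeException {
        CyclicPageTreeException(String message) {
            super(message);
        }
    }

    /**
     * Looks up an inheritable attribute (e.g. /CropBox or /Rotate) for a page.
     *
     * @param start    object number of the page
     * @param parentOf object number -> object number named by /Parent (absent for the root)
     * @param attrOf   object number -> attribute value, only for nodes that declare it
     * @return the nearest declared value, or null if no ancestor declares one
     */
    static <T> T resolveInherited(int start,
                                  Map<Integer, Integer> parentOf,
                                  Map<Integer, T> attrOf) {
        Set<Integer> visited = new HashSet<>();
        Integer node = start;
        while (node != null) {
            // Set.add returns false if the node was already seen: that is a cycle.
            if (!visited.add(node)) {
                throw new CyclicPageTreeException(
                        "page tree /Parent cycle detected at object " + node);
            }
            T value = attrOf.get(node);
            if (value != null) {
                return value;
            }
            node = parentOf.get(node); // null ends the walk at the root
        }
        return null;
    }

    public static void main(String[] args) {
        // Parent links as in the malformed file: 4 -> 4 and 5 -> 4.
        Map<Integer, Integer> parentOf = new HashMap<>();
        parentOf.put(4, 4);
        parentOf.put(5, 4);
        // No node declares the attribute, so the lookup has to climb.
        Map<Integer, Integer> rotate = new HashMap<>();
        try {
            resolveInherited(5, parentOf, rotate);
        } catch (CyclicPageTreeException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The visited set bounds the walk by the number of distinct nodes, so memory use stays proportional to the depth of the tree instead of growing until the heap is exhausted.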
Thanks. This brought production down repeatedly for a week before we identified
the parent cycle as the trigger, so any guidance on the correct long-term fix is
hugely appreciated.