GC overhead limit exceeded - Aspose PDF 21.2

dft5209 · March 29, 2021, 2:26pm

Hi,
I have a memory issue (GC overhead limit exceeded) with Aspose PDF when:
1- Using a big source PDF file (1,058,385,471 bytes) with 884,104 pages
2- Trying to create 884,042 new PDF files by extracting pages from the big PDF. The start page and number of pages for each file are read from an XML file.

Scenario 1: Open the source PDF file for each new PDF to create. This hurts performance.
Scenario 2: Open the source PDF file once at the beginning (at the start of the Java module). Performance is better.

The "GC overhead limit exceeded" error happens in both scenarios, with version 19.2 as well as the latest version, 21.2.

Could you please help?

Java code:
```java
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import java.util.ArrayList;
import java.util.List;

public boolean splitPdf(String sourcePDF, int startPage, int pageQuantity, String newPdfFilename) {
    // Last page to copy (inclusive)
    int endPage = startPage + pageQuantity - 1;

    Document documentSrcPdf = null;
    Document documentNewPdf = null;
    try {
        documentSrcPdf = new Document(sourcePDF);
        documentNewPdf = new Document();

        // Collect the requested pages from the source and append them to the new document
        List<Page> pages = new ArrayList<Page>();
        for (int page = startPage; page <= endPage; page++) {
            pages.add(documentSrcPdf.getPages().get_Item(page));
        }
        documentNewPdf.getPages().add(pages);
        documentNewPdf.save(newPdfFilename);

        return true;
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    } finally {
        // Release both documents whether or not the save succeeded
        if (documentSrcPdf != null) {
            documentSrcPdf.dispose();
            documentSrcPdf.close();
            documentSrcPdf = null;
        }
        if (documentNewPdf != null) {
            documentNewPdf.dispose();
            documentNewPdf.close();
            documentNewPdf = null;
        }
    }
}
```
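splitPdf is called once per output file, with startPage and pageQuantity read from an XML file whose layout isn't shown in the thread. As a sketch only, assuming a hypothetical `<split start="..." count="..." file="..."/>` element per output file, a StAX (`javax.xml.stream`) reader can hand each job to the splitter as soon as it is parsed, instead of first building a DOM tree for roughly 884,000 entries. `SplitJob` and `SplitPlanReader` are hypothetical names; `endPage()` uses the same inclusive arithmetic as splitPdf.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical type: one output file, as a 1-based inclusive page range of the source PDF. */
final class SplitJob {
    final int startPage;
    final int pageQuantity;
    final String outputFile;

    SplitJob(int startPage, int pageQuantity, String outputFile) {
        if (startPage < 1 || pageQuantity < 1) {
            throw new IllegalArgumentException("startPage and pageQuantity must be >= 1");
        }
        this.startPage = startPage;
        this.pageQuantity = pageQuantity;
        this.outputFile = outputFile;
    }

    /** Same arithmetic as splitPdf: the last page is included. */
    int endPage() {
        return startPage + pageQuantity - 1;
    }
}

final class SplitPlanReader {
    /**
     * Streams <split start=".." count=".." file=".."/> elements (hypothetical schema)
     * and passes each job to the handler as soon as it is read. Returns the job count.
     */
    static int forEachJob(Reader xml, Consumer<SplitJob> handler) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        int count = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT && "split".equals(r.getLocalName())) {
                    // A null namespace URI means the attribute is matched by local name only
                    handler.accept(new SplitJob(
                            Integer.parseInt(r.getAttributeValue(null, "start")),
                            Integer.parseInt(r.getAttributeValue(null, "count")),
                            r.getAttributeValue(null, "file")));
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<plan><split start=\"1\" count=\"3\" file=\"a.pdf\"/>"
                + "<split start=\"4\" count=\"2\" file=\"b.pdf\"/></plan>";
        forEachJob(new StringReader(xml), job ->
                System.out.println(job.outputFile + ": pages " + job.startPage + "-" + job.endPage()));
    }
}
```

In the real program the handler would call splitPdf for each job. This keeps the XML side of the memory budget small; on its own it does not change how much memory the PDF library holds.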

asad.ali · March 29, 2021, 10:53pm

@dft5209

Could you please try increasing the Java heap size? If the issue still persists, please share your environment details with us along with a sample source PDF. We will test the scenario in our environment and address it accordingly. You may upload your sample file to Google Drive or Dropbox and share the link with us.
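When raising the heap with `-Xmx`, it can help to confirm the ceiling the JVM actually received and to watch how much heap is in use while the split loop runs. A minimal sketch using only `java.lang.Runtime`; `HeapReport` is a hypothetical helper name, not part of Aspose.PDF.

```java
/** Hypothetical helper: reports the JVM heap ceiling and current usage in MiB. */
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Maximum the heap may grow to (set with -Xmx); Runtime returns Long.MAX_VALUE if unlimited. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently occupied: memory allocated to the JVM minus the free part of it. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedHeapMiB() + " MiB");
    }
}
```

For example, run with `java -Xmx12g ...` and log `usedHeapMiB()` every few thousand output files. If used heap keeps climbing across iterations in Scenario 2, objects are probably being retained between splits, which a larger heap would only postpone.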

dft5209 · March 30, 2021, 3:29am

I am already using an 8 GB heap size. I will increase it a bit and try again.
Here are some of the traces/errors:
```
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.toCharArray(String.java:2899)
at com.aspose.pdf.internal.l57y.l1j.lI(Unknown Source)
at com.aspose.pdf.internal.l57y.l1j.lI(Unknown Source)
at com.aspose.pdf.internal.ms.System.l10l.lI(Unknown Source)
at com.aspose.pdf.internal.ms.System.l10l.lI(Unknown Source)
at com.aspose.pdf.internal.ms.System.l10l.lI(Unknown Source)
at com.aspose.pdf.internal.l0y.lf.lf(Unknown Source)
at com.aspose.pdf.internal.l5h.l2v.lI(Unknown Source)
at com.aspose.pdf.internal.l5h.l2v$lf.lI(Unknown Source)
at com.aspose.pdf.internal.l5h.l2v$lf.serialize(Unknown Source)
at com.aspose.pdf.internal.l9p.le.lI(Unknown Source)
at com.aspose.pdf.internal.l9p.le.serialize(Unknown Source)
at com.aspose.pdf.internal.l9t.lb.lI(Unknown Source)
at com.aspose.pdf.internal.l8n.l0p.lI(Unknown Source)
at com.aspose.pdf.internal.l8n.l0t.lI(Unknown Source)
at com.aspose.pdf.internal.l0h.l0p.lI(Unknown Source)
at com.aspose.pdf.ADocument.lf(Unknown Source)
at com.aspose.pdf.ADocument.lf(Unknown Source)
at com.aspose.pdf.ADocument.save(Unknown Source)
at com.aspose.pdf.Document.save(Unknown Source)
at com.desjardins.aspose.SplitPDF.splitPdf(SplitPDF.java:151)
at com.scd.proarchiver.parse.XMLParserReleve.parseXMLAndStore(XMLParserReleve.java:271)
at com.scd.proarchiver.main.Import.main(Import.java:185)
```

dft5209 · March 30, 2021, 3:29am

Please note that I cannot provide the source file. It contains very sensitive information.
You can test with any very large PDF file.

asad.ali · March 30, 2021, 7:15pm

@dft5209

We tested the scenario in our environment using Aspose.PDF for Java 21.3 and a 4 GB sample PDF. We encountered a StackOverFlowException when we ran your code snippet, so we have logged an issue as PDFJAVA-40318 in our issue tracking system. We will look into its details further and keep you posted on the status of the fix. Please have patience and give us some time.

We apologize for the inconvenience.

dft5209 · March 30, 2021, 7:23pm

Thanks
Appreciate your support

asad.ali · August 30, 2023, 7:31pm

@dft5209

A new OptimizedMemoryStream class (extending Stream) has been added. It supports large amounts of data (more than 5 or 10 GB).

A Document constructor and a Document.save method that take a Stream parameter have also been added.

You can now either load a PDF file into this stream and then use it to initialize a Document instance, or create and save a big PDF document.

Example of how to load a document from a file:

```java
import com.aspose.pdf.*;
import java.io.FileInputStream;

// Load the whole source PDF into an OptimizedMemoryStream in 64 KB chunks;
// try-with-resources closes the file even if reading fails
OptimizedMemoryStream stream = new OptimizedMemoryStream();
try (FileInputStream fis = new FileInputStream(dataDir + "Input.pdf")) {
    byte[] buffer = new byte[65536];
    int available = fis.read(buffer, 0, buffer.length);
    while (available > 0) {
        stream.write(buffer, 0, available);
        available = fis.read(buffer, 0, buffer.length);
    }
}

// Rewind the stream before handing it to the Document constructor
stream.setPosition(0);
Document documentSrcPdf = new Document(stream);

try {
    for (int page = 1; page <= documentSrcPdf.getPages().size(); page++) {
        // Write the current page into its own output file
        Document documentNewPdf = new Document();
        try {
            documentNewPdf.getPages().add(documentSrcPdf.getPages().get_Item(page));
            documentNewPdf.save(dataDir + "out/output_page_" + page + ".pdf");
        } finally {
            // Release each output document before moving on
            documentNewPdf.close();
        }
        break; // stops after the first page; remove to split every page
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    documentSrcPdf.close();
}

stream.close();
```
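The read/write loop at the top of the example copies the file into the stream 64 KB at a time. Below is a stdlib-only sketch of the same loop, factored into a reusable helper that always closes the input file. `StreamLoader` and `ChunkSink` are hypothetical names; `ChunkSink` stands for any target with a `write(byte[], int, int)` method, and whether `OptimizedMemoryStream::write` fits it depends on that method's exact signature, which is an assumption here.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamLoader {
    /** Hypothetical adapter for any sink that exposes write(byte[], int, int). */
    @FunctionalInterface
    interface ChunkSink {
        void write(byte[] buffer, int offset, int count) throws IOException;
    }

    private static final int CHUNK_SIZE = 65536; // same 64 KB buffer as the example above

    /** Copies the input into the sink chunk by chunk; returns the number of bytes copied. */
    static long copy(InputStream in, ChunkSink sink) throws IOException {
        byte[] buffer = new byte[CHUNK_SIZE];
        long total = 0;
        int read;
        // read() returns -1 once the end of the stream is reached
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            sink.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /** Opens the file, copies it, and closes it even if a read or write throws. */
    static long copyFile(String path, ChunkSink sink) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            return copy(in, sink);
        }
    }
}
```

A `java.io.ByteArrayOutputStream` works as a sink for small tests (`out::write`), but it is backed by a single `byte[]`, so it cannot grow past the maximum Java array length of roughly 2 GB. A multi-gigabyte source PDF needs a stream that is not limited to one array.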

Note that if the output document is likely to contain many pages, it should also be created on top of an OptimizedMemoryStream.

Example of how to copy all pages into a new big document:

```java
import com.aspose.pdf.*;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

// Load the whole source PDF into an OptimizedMemoryStream in 64 KB chunks
OptimizedMemoryStream stream = new OptimizedMemoryStream();
try (FileInputStream fis = new FileInputStream(dataDir + "Input.pdf")) {
    byte[] buffer = new byte[65536];
    int available = fis.read(buffer, 0, buffer.length);
    while (available > 0) {
        stream.write(buffer, 0, available);
        available = fis.read(buffer, 0, buffer.length);
    }
}

stream.setPosition(0);
Document documentSrcPdf = new Document(stream);
// The output document will hold every page, so it is also backed by an OptimizedMemoryStream
Document documentNewPdf = new Document(new OptimizedMemoryStream());
try {
    List<Page> pages = new ArrayList<Page>();
    for (int page = 1; page <= documentSrcPdf.getPages().size(); page++) {
        pages.add(documentSrcPdf.getPages().get_Item(page));
    }
    documentNewPdf.getPages().add(pages);
    documentNewPdf.save(dataDir + "output.pdf");
} catch (Exception e) {
    e.printStackTrace();
} finally {
    documentSrcPdf.close();
    documentNewPdf.close();
}

stream.close();
```