Heap size issue handling using Aspose

izzy · October 9, 2020, 11:53am

We run into a heap size issue when we merge PDFs into a single consolidated PDF file. The heap size limit on the server we are using is 1.5 GB.

We currently use input/output streams to perform the PDF merge. Here is the code snippet in Java/Groovy:

    PDFMergerUtility mergePdf = new PDFMergerUtility()
    filesList?.each {
        def byteArrayOutputStream
        try {
            def s3object = templateAmazonService.downloadFromS3(it?.filePath, it?.fileName, messageData?.bucketName)
            if (s3object) {
                InputStream s3Is = s3object?.objectContent;
                byteArrayOutputStream = createByteArrayInputStream(s3Is);
                s3object?.close();
                InputStream is0 = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
                if (is0)
                    mergePdf.addSource(is0)
            }
            byteArrayOutputStream?.close()
            byteArrayOutputStream = null
        }
        catch (Exception ex) {
            log.error("Error occurred while adding file to mergePDF" + ex)
        }
    }
    mergePdf.mergeDocuments();

The heap size issue occurs when we merge more than 3000 PDFs. We fetch the PDF records from S3 and use their streams to merge the PDFs. Consolidating such a large number of files (more than 3000) is what causes the issue.
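
For illustration, one way to keep the downloaded bytes off the heap is to copy each stream straight to a temporary file instead of buffering it in a ByteArrayOutputStream. The following is only a minimal sketch using the Java standard library; the class name TempFileSpooler, the method spoolToTempFile and the "merge-src-" file prefix are hypothetical and not part of the code above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: spools input streams to disk so that their full
// content never has to live in a byte[] on the heap.
final class TempFileSpooler {

    /** Copies the stream to a new temp file, closes the stream, and returns the file's path. */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("merge-src-", ".pdf");
        try (InputStream src = in) {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the S3 object streams; real code would pass s3object.objectContent.
        List<InputStream> streams = new ArrayList<>();
        streams.add(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        streams.add(new ByteArrayInputStream(new byte[] {4, 5}));

        List<Path> spooled = new ArrayList<>();
        for (InputStream s : streams) {
            spooled.add(spoolToTempFile(s));
        }

        // The paths can now be handed to the merge step; delete them afterwards.
        for (Path p : spooled) {
            System.out.println(p + " (" + Files.size(p) + " bytes)");
            Files.delete(p);
        }
    }
}
```

Assuming the PDFMergerUtility above is the Apache PDFBox 2.x class, the resulting files could be passed to addSource(File), and mergeDocuments(MemoryUsageSetting.setupTempFileOnly()) could be used so that the merge itself buffers on disk rather than in memory. Whether this is enough for 3000 files depends on the documents and would need to be measured.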


We are also considering Aspose's Document as an alternative, to see whether it manages memory well enough to resolve the heap size issue. Here is the code snippet:
    File[] files = file.listFiles()
    Document finalPDF = new Document()
    for (final File fileEntry : files) {
        Workbook workbook = new Workbook(fileEntry.getAbsolutePath())
        def fileName = fileEntry.getName()
        ByteArrayOutputStream dstStream = new ByteArrayOutputStream();
        workbook.save(dstStream, SaveFormat.PDF);
        ByteArrayInputStream srcStream = new ByteArrayInputStream(dstStream.toByteArray());
        Document tempDocument = new Document(srcStream)
        finalPDF.getPages().add(tempDocument.getPages())
    }
    // Save finalPDF, the document the pages were added to
    finalPDF.save(bOutputStream, com.aspose.pdf.SaveFormat.Pdf);
    return bOutputStream.toByteArray()



The Aspose version we are using is 19.5. We use Workbook to save each file as a PDF, add its pages to the merged document inside the loop, and save the result after the loop.

Is there another approach using Aspose that manages memory efficiently, so that no heap size issues occur?

asad.ali · October 11, 2020, 4:07pm

@izzy

We improve the memory consumption and performance of the API in every new release, so we always recommend using the latest version. Would you kindly try Aspose.PDF for Java 20.9 on your side? If you still face the issue, please share sample input files (necessary to replicate the issue) with us, along with screenshots of the high memory usage if possible. We will test the scenario in our environment and address it accordingly.