Free Support Forum - aspose.com

PDF to Image Java using Aspose.PDF - OutOfMemoryException

I have numerous PDF files that cause OOM exceptions. I am trying to generate images from the PDF using the following code:

public static void main( String[] args ) {

    String fileName = "test.pdf";
    if ( args.length > 0 ) {
        fileName = args[0];
    }
    File pdfFile = new File( fileName );
    // try-with-resources closes the input stream even if conversion fails
    try ( InputStream in = new FileInputStream( pdfFile ) ) {
        Document doc = new Document( in );
        com.aspose.pdf.facades.PdfConverter converter = new com.aspose.pdf.facades.PdfConverter();
        converter.bindPdf( doc );

        converter.setStartPage( 1 );
        //converter.setEndPage( doc.getPages().size() );
        int pageNum = 1;
        while ( converter.hasNextImage() ) {

            System.out.println( "Generating slides for page " + pageNum );
            Page page = doc.getPages().get_Item( pageNum );
            System.out.println( "Got Page element for page " + pageNum );
            String fullFileName = "full_" + pageNum + ".png";
            // close each page's output stream as soon as its image is written
            try ( OutputStream fullStream = new FileOutputStream( fullFileName ) ) {
                System.out.println( "Calling getNextImage" );
                converter.getNextImage( fullStream, ImageType.getPng() ); //jpg", ImageType.getJpeg() , 100, 150, 100);
                System.out.println( "Returned from calling getNextImage" );
            }
            pageNum++;
        }
    } catch ( IOException e ) {
        // also covers FileNotFoundException
        e.printStackTrace();
    }
}

Java version is 1.8. Environment is Linux/64 and Windows Server 2016. I have tried raising the -Xmx setting to 6 GB, which helped in some cases, but not enough.
Strangely, I have a Mac, also on Java 1.8, where this runs without any additional settings and converts the PDFs.
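
Since the same code behaves differently on the Mac and on the Linux and Windows hosts, one thing that may be worth checking is whether each JVM actually received the heap setting. The sketch below uses only the standard library; the class name HeapCheck is made up for illustration and is not part of the original code.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when it is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently occupied by objects, in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, e.g. -Xmx6g if it was applied
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments:   " + jvmArgs);
        System.out.println("Max heap (MiB):  " + maxHeapMiB());
        System.out.println("Used heap (MiB): " + usedHeapMiB());
    }
}
```

Running this with the same launch command on each machine shows whether the heap limits really match before comparing conversion results.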

@UpperVolta

Please make sure that you are using the latest version of the API. If the issue persists, please let us know whether it occurs only with certain large PDF files or with any PDF file. Please also share some sample PDF documents with us so that we can test the scenario in our environment and address it accordingly.

I am using version 21.1. If you have something more recent, I can try it.

I am having problems like this with many PDF files. All of them seem to take an extraordinary amount of memory and CPU to process. I am running a 6 GB heap, which has allowed me to process most of the files under 15 MB in size, but I’m still running out of memory with some that are around 20 MB.

I can send some samples, but they belong to our customer. Can you keep them private?

I am also open to using the API differently if you think that would help. What we are trying to do is generate images of each page in the PDF. Sometimes this is just one page. We’re doing so using a dedicated thread pool so we only have one of these per JVM running at a time, although concurrency isn’t related to our problems here.
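
For context, the one-conversion-at-a-time setup described above can be sketched roughly as follows. This is only an illustration with the standard library: SerialConversions and its convert method are hypothetical stand-ins, and the real Aspose call is replaced by a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class SerialConversions {

    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger maxConcurrent = new AtomicInteger();

    /** Placeholder for the real page-to-image conversion (hypothetical). */
    static String convert(String pdfName) throws InterruptedException {
        int now = running.incrementAndGet();
        maxConcurrent.accumulateAndGet(now, Math::max); // record peak concurrency
        Thread.sleep(10);                               // simulate work
        running.decrementAndGet();
        return "images-for-" + pdfName;
    }

    /** Runs every conversion on one worker thread, so only one is active at a time. */
    static List<String> convertAll(List<String> pdfNames) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : pdfNames) {
                futures.add(pool.submit(() -> convert(name)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // wait for each conversion in submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("conversion failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(convertAll(Arrays.asList("a.pdf", "b.pdf")));
        System.out.println("Peak concurrent conversions: " + maxConcurrent.get());
    }
}
```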

@UpperVolta

Please try the DOM approach to convert PDF to image using Aspose.PDF for Java. If you face any issue with that approach, please share a sample PDF file with us. We assure you that we use the files only for investigation purposes and erase them from our system once the investigation is complete. We also do not disclose your files to anyone.

I can try that. When declaring the rectangle I would like the entire page. How do I know the dimensions of the page?

How should I share the file with you? It is too large for email.

@UpperVolta

You do not need to specify any rectangle when converting a PDF page to an image using the DOM approach. However, if you need the page dimensions, you can use the Page.getRect() method.
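
Page dimensions are expressed in points, so for a rough idea of how large the rendered bitmap will be, the width and height can be scaled by the resolution. The sketch below assumes 72 points per inch (the PDF default user unit), 4 bytes per pixel, and that the width and height come from the rectangle returned by Page.getRect(). RasterEstimate is a hypothetical helper, and the figure covers only the output bitmap, not whatever the library allocates while rendering.

```java
class RasterEstimate {

    /** Default PDF user space: 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in points to a pixel count at the given resolution, rounding up. */
    static long pixels(double points, int dpi) {
        return (long) Math.ceil(points / POINTS_PER_INCH * dpi);
    }

    /** Estimated size in bytes of an uncompressed bitmap for one page. */
    static long estimateBytes(double widthPt, double heightPt, int dpi, int bytesPerPixel) {
        return pixels(widthPt, dpi) * pixels(heightPt, dpi) * bytesPerPixel;
    }

    public static void main(String[] args) {
        // US Letter page (612 x 792 points) at 100 dpi, assuming 4 bytes per pixel
        long bytes = estimateBytes(612, 792, 100, 4);
        System.out.println(bytes + " bytes"); // prints "3740000 bytes"
    }
}
```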

I did try that, but it had no effect on memory consumption.

@UpperVolta

Could you please share your sample input file along with the code snippet that you are using? We will test the scenario in our environment and address it accordingly.

How? The file is larger than your upload limit allows.

Here is the code:

import com.aspose.pdf.*;
import com.aspose.pdf.devices.BmpDevice;
import com.aspose.pdf.devices.Resolution;

import java.io.*;
import java.text.SimpleDateFormat;
import java.util.Date;

public class PDFConvert {

    static License license = null;

    static {
        System.setProperty( "java.awt.headless", "true" );

        // Here's how to read the license file in, according to Aspose.
        if ( license == null ) {
            license = new License();

            InputStream fstream = null;
            try {
                fstream = getClassPathResourceAsStream( "Aspose.Total.Java.lic" );
                if ( fstream == null ) {
                    log( "Unable to read license file: " + "Aspose.Total.Java.lic" );
                } else {
                    // only apply the license when the file was actually found
                    license.setLicense( fstream );
                }

            } catch ( Exception ex ) {
                System.out.println( ex );
            } finally {
                try {
                    if ( fstream != null )
                        fstream.close();
                } catch ( IOException ioe ) {
                    System.out.println( ioe );
                }
            }
        }
    }

    public static InputStream getClassPathResourceAsStream( String fileName ) {
        InputStream in = PDFConvert.class.getClassLoader().getResourceAsStream( fileName );
        if ( in == null ) {
            // Try to load it with a prepended slash
            log( "Could not find " + fileName + ", trying to find it by prepending slash." );
            in = PDFConvert.class.getClassLoader().getResourceAsStream( "/" + fileName );
        }
        return in;
    }

    public static void generateSlidesConverter( String fileName ) {
        File pdfFile = new File( fileName );
        log( "Starting generate slides via converter" );
        // try-with-resources closes the input stream even if conversion fails
        try ( InputStream in = new FileInputStream( pdfFile ) ) {
            Document doc = new Document( in );
            com.aspose.pdf.facades.PdfConverter converter = new com.aspose.pdf.facades.PdfConverter();
            converter.bindPdf( doc );

            converter.setStartPage( 1 );
            //converter.setEndPage( doc.getPages().size() );
            int pageNum = 1;
            while ( converter.hasNextImage() ) {

                log( "Generating slides for page " + pageNum );
                Page page = doc.getPages().get_Item( pageNum );
                log( "Got Page element for page " + pageNum );
                String fullFileName = "full_" + pageNum + ".png";
                // close each page's output stream as soon as its image is written
                try ( OutputStream fullStream = new FileOutputStream( fullFileName ) ) {
                    log( "Calling getNextImage" );
                    converter.getNextImage( fullStream, ImageType.getPng() ); //jpg", ImageType.getJpeg() , 100, 150, 100);
                    log( "Returned from calling getNextImage" );
                }
                pageNum++;
            }
        } catch ( IOException e ) {
            // also covers FileNotFoundException
            e.printStackTrace();
        }
        log( "Done generating slides via converter" );
    }

    public static void main( String[] args ) {

        String fileName = "test.pdf";
        if ( args.length > 0 ) {
            fileName = args[0];
        }

        generateSlidesConverter( fileName );
        generateSlidesDOM( fileName );
    }

    private static void generateSlidesDOM( String pdfFile ) {
        log( "Starting generate slides via DOM" );
        Document document = new Document( pdfFile );
        Rectangle pageRect = document.getPages().get_Item( 1 ).getRect();
        log( "Have Page Rect" );
        // Get rectangle of particular page region
        //Rectangle pageRect = new Rectangle( 20, 671, 693, 1125 );
        // set CropBox value as per rectangle of desired page region
        document.getPages().get_Item( 1 ).setCropBox( pageRect );
        // save cropped document into stream
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        document.save( outStream );
        log( "Saved doc, creating new one" );
        // open cropped PDF document from stream and convert to image
        document = new Document( new ByteArrayInputStream( outStream.toByteArray() ) );
        // Create Resolution object - I have no idea what this does
        Resolution resolution = new Resolution( 100 );
        // Create BMP device with specified attributes
        BmpDevice bmpDevice = new BmpDevice( resolution );
        // Convert a particular page and save the image to a file
        log( "Processing device" );
        bmpDevice.process( document.getPages().get_Item( 1 ), "Output.bmp" );
        log( "Saved image - done with generate slides" );
    }

    public static void log( String msg ) {
        SimpleDateFormat sdf = new SimpleDateFormat( "hh.mm.ss" );
        String ts = sdf.format( new Date() );
        System.out.println( ts + "\t" + msg );
    }
}

@Kurt_Mehlhoff

Please upload the sample file to Google Drive or Dropbox and share the link with us.