PDF to Image Java using Aspose.PDF - OutOfMemoryException

Kurt_Mehlhoff · February 22, 2021, 4:00pm

I can try that. When declaring the rectangle I would like the entire page. How do I know the dimensions of the page?

How should I share the file with you? It is too large for email.

asad.ali · February 22, 2021, 10:29pm

You do not need to specify a rectangle when converting a PDF page to an image with the DOM approach. If you do need the page dimensions, you can use the Page.getRect() method.
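To make the link between page dimensions and image size concrete, here is a minimal sketch in plain Java (no Aspose calls). It assumes that the rectangle is measured in PDF points (72 per inch by default) and that the resolution value passed to the image device is in DPI. The class name, the method names, the US Letter page size, and the 4 bytes per pixel are all hypothetical choices for illustration.

```java
public class RasterSizeEstimate {
    // PDF user space defaults to 72 points per inch.
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in PDF points to pixels at the given DPI, rounding up. */
    static int toPixels(double points, int dpi) {
        return (int) Math.ceil(points * dpi / POINTS_PER_INCH);
    }

    /** Bytes needed for the uncompressed raster, assuming a fixed number of bytes per pixel. */
    static long uncompressedBytes(int widthPx, int heightPx, int bytesPerPixel) {
        return (long) widthPx * heightPx * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Hypothetical page: US Letter, 612 x 792 points.
        double widthPt = 612;
        double heightPt = 792;
        int dpi = 100;

        int widthPx = toPixels(widthPt, dpi);
        int heightPx = toPixels(heightPt, dpi);
        long bytes = uncompressedBytes(widthPx, heightPx, 4); // assumes 4 bytes per pixel

        System.out.println(widthPx + " x " + heightPx + " px, " + bytes + " bytes uncompressed");
    }
}
```

This only estimates the raster buffer for one page. It says nothing about the memory a library needs while parsing and rendering the page content, which can be far larger for a complex document.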

UpperVolta · February 24, 2021, 7:02pm

I did try that, but it had no effect on memory consumption.

asad.ali · February 24, 2021, 9:50pm

@UpperVolta

Could you please share your sample input file along with the code snippet that you are using? We will test the scenario in our environment and address it accordingly.

Kurt_Mehlhoff · February 24, 2021, 10:19pm

How? The file size is more than the limit you allow to upload.

Here is the code:

import com.aspose.pdf.*;
import com.aspose.pdf.devices.BmpDevice;
import com.aspose.pdf.devices.Resolution;

import java.io.*;
import java.text.SimpleDateFormat;
import java.util.Date;

public class PDFConvert {

    static License license = null;

    static {
        System.setProperty( "java.awt.headless", "true" );

        // Here's how to read the license file in, according to Aspose.
        if ( license == null ) {
            license = new License();

            InputStream fstream = null;
            try {
                fstream = getClassPathResourceAsStream( "Aspose.Total.Java.lic" );
                if ( fstream == null ) {
                    log( "Unable to read license file: " + "Aspose.Total.Java.lic" );
                } else {
                    // Only apply the license when the resource was actually found
                    license.setLicense( fstream );
                }

            } catch ( Exception ex ) {
                System.out.println( ex );
            } finally {
                try {
                    if ( fstream != null )
                        fstream.close();
                } catch ( IOException ioe ) {
                    System.out.println( ioe );
                }
            }
        }
    }

    public static InputStream getClassPathResourceAsStream( String fileName ) {
        InputStream in = PDFConvert.class.getClassLoader().getResourceAsStream( fileName );
        if ( in == null ) {
            // Try to load it with a prepended slash. A leading slash is only meaningful to
            // Class.getResourceAsStream (classpath root), not to ClassLoader.getResourceAsStream.
            log( "Could not find " + fileName + ", trying to find it by prepending slash." );
            in = PDFConvert.class.getResourceAsStream( "/" + fileName );
        }
        return in;
    }

    public static void generateSlidesConverter( String fileName ) {
        File pdfFile = new File( fileName );
        log( "Starting generate slides via converter" );
        // try-with-resources closes the input stream even if conversion fails
        try ( InputStream in = new FileInputStream( pdfFile ) ) {
            Document doc = new Document( in );
            com.aspose.pdf.facades.PdfConverter converter = new com.aspose.pdf.facades.PdfConverter();
            converter.bindPdf( doc );

            converter.setStartPage( 1 );
            //converter.setEndPage( doc.getPages().size() );
            int pageNum = 1;
            while ( converter.hasNextImage() ) {

                log( "Generating slides for page " + pageNum );
                Page page = doc.getPages().get_Item( pageNum );
                log( "Got Page element for page " + pageNum );
                String fullFileName = "full_" + pageNum + ".png";
                // Close each page's output stream as soon as its image has been written
                try ( OutputStream fullStream = new FileOutputStream( fullFileName ) ) {
                    log( "Calling getNextImage" );
                    converter.getNextImage( fullStream, ImageType.getPng() ); //jpg", ImageType.getJpeg() , 100, 150, 100);
                    log( "Returned from calling getNextImage" );
                }
                pageNum++;
            }
        } catch ( IOException e ) {
            // FileNotFoundException is a subclass of IOException
            e.printStackTrace();
        }
        log( "Done generating slides via converter" );
    }

    public static void main( String[] args ) {

        String fileName = "test.pdf";
        if ( args.length > 0 ) {
            fileName = args[0];
        }

        generateSlidesConverter( fileName );
        generateSlidesDOM( fileName );
    }

    private static void generateSlidesDOM( String pdfFile ) {
        log( "Starting generate slides via DOM" );
        Document document = new Document( pdfFile );
        Rectangle pageRect = document.getPages().get_Item( 1 ).getRect();
        log( "Have Page Rect" );
        // Get rectangle of particular page region
        //Rectangle pageRect = new Rectangle( 20, 671, 693, 1125 );
        // set CropBox value as per rectangle of desired page region
        document.getPages().get_Item( 1 ).setCropBox( pageRect );
        // save cropped document into stream
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        document.save( outStream );
        log( "Saved doc, creating new one" );
        // open cropped PDF document from stream and convert to image
        document = new Document( new ByteArrayInputStream( outStream.toByteArray() ) );
        // Create Resolution object (the value is the output resolution, here 100)
        Resolution resolution = new Resolution( 100 );
        // Create BMP device with specified attributes
        BmpDevice bmpDevice = new BmpDevice( resolution );
        // Convert a particular page and save the image to a file
        log( "Processing device" );
        bmpDevice.process( document.getPages().get_Item( 1 ), "Output.bmp" );
        log( "Saved image - done with generate slides" );
    }

    public static void log( String msg ) {
        SimpleDateFormat sdf = new SimpleDateFormat( "hh.mm.ss" );
        String ts = sdf.format( new Date() );
        System.out.println( ts + "\t" + msg );
    }
}

asad.ali · February 25, 2021, 5:08am

@Kurt_Mehlhoff

Please upload the sample file to Google Drive or Dropbox and share the link with us.

Kurt_Mehlhoff · March 1, 2021, 9:01pm

Here is one of the PDF files which cause problems.
https://www.dropbox.com/s/k2c55onccmj0ctv/727.pdf?dl=0

asad.ali · March 2, 2021, 5:39pm

@UpperVolta

We were able to reproduce the issue in our environment while using Aspose.PDF for Java 21.2 and the following code snippet:

Document pdfDocument = new Document(dataDir + "727.pdf");
for (Page page : pdfDocument.getPages()) {
 com.aspose.pdf.devices.Resolution resolution = new com.aspose.pdf.devices.Resolution(100);
 // PngDevice matches the .png file name (the snippet as posted used BmpDevice here)
 com.aspose.pdf.devices.PngDevice pngDevice = new com.aspose.pdf.devices.PngDevice(resolution);
 // try-with-resources closes the stream even if process() throws
 try (java.io.OutputStream imageStream = new java.io.FileOutputStream(dataDir + "Converted_Image_" + page.getNumber() + ".png")) {
  pngDevice.process(page, imageStream);
 }
}

We have therefore logged it as PDFJAVA-40232 in our issue tracking system. We will look into the details and keep you posted on the status of the fix. Please allow us some time.

We are sorry for the inconvenience.

asad.ali · July 9, 2021, 6:50pm

@UpperVolta

We tried to reproduce the issue and got the same results in all of the environments you mentioned (macOS 11.4, Windows 10 Pro, and Linux in Docker). Every run of this code snippet used the option -Xmx4G, and every run completed successfully. Please provide additional details so that we can reproduce the problem.

The issue has been verified, and the OOM does not reproduce with either a 6 GB or a 4 GB memory limit. However, the conversion took 5 minutes with 6 GB and 19 minutes with 4 GB. The document is very complex and contains a large number of objects to process. This is expected behavior: when memory is scarce, the garbage collector spends a lot of time releasing unused instances.
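When comparing runs like these, it can help to confirm which heap limit the JVM actually applied. The following is a minimal sketch using only the standard Runtime API. The class and method names are hypothetical, and the reported maximum may be slightly below the -Xmx figure on some JVMs.

```java
public class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB (governed by -Xmx). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use, in MiB (allocated heap minus free space within it). */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    /** True if the configured maximum heap is at least the requested size. */
    static boolean hasAtLeast(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("Used heap: " + usedHeapMiB() + " MiB");
    }
}
```

Logging usedHeapMiB() before and after each page is converted would show whether memory grows steadily across pages or spikes on one particular page.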

Kurt_Mehlhoff · January 24, 2022, 11:42pm

Yes. The PDF belongs to a customer and is not for public use. Should I email it to you?

I can either post the source or email that as well.

asad.ali · January 25, 2022, 3:54pm

@UpperVolta

We believe your message concerns the other query that you posted recently, "PDF to Image Conversion results in endless memory consumption". We are sending you a private message; please share your file in reply to that message so that we can assist you.

Kurt_Mehlhoff · January 25, 2022, 4:07pm

Already done

asad.ali · January 25, 2022, 7:04pm

@UpperVolta

We were able to reproduce the issue in our environment while testing the scenario with Aspose.PDF for Java 22.1. It has therefore been logged as PDFJAVA-41255 in our issue tracking system. We will look into the details and keep you posted on the status of the fix. Please allow us some time.

We are sorry for the inconvenience.

UpperVolta · April 18, 2022, 3:18pm

Any progress on this issue?

asad.ali · April 18, 2022, 8:57pm

@UpperVolta

Regretfully, the earlier logged ticket has not yet been resolved because of other issues ahead of it in the queue. Issues are fixed on a first-come, first-served basis, and we will notify you in this forum thread as soon as we have an update on the ticket. Please allow us some time.

We apologize for the inconvenience.

UpperVolta · January 30, 2023, 10:48pm

Any progress on this issue? Is your inaction an indication that you have no plans to address the issue?

asad.ali · January 31, 2023, 12:37am

@UpperVolta

We apologize for the delay in resolving the earlier logged ticket. Please note that we do resolve every logged issue. However, the resolution time depends on several factors, such as the complexity and nature of the issue and the number of API components involved. Your concerns have been recorded, and we will consider them during ticket analysis. We will inform you as soon as we make progress on the fix. Your patience is highly appreciated.

We apologize for the inconvenience.

aspose.notifier · June 5, 2023, 9:59pm

The issues you have found earlier (filed as PDFJAVA-41255) have been fixed in Aspose.PDF for Java 23.5.

UpperVolta · June 7, 2023, 4:57pm

I downloaded aspose-pdf-23.5 and tried the same source and sample PDF that I provided to you when I opened the issue. The issue is not resolved. The program uses 5 GB of memory and 100% CPU and never completes. At least, it has not completed so far, and it has been running for two hours.

Why was this issue closed?

asad.ali · June 7, 2023, 11:49pm

@UpperVolta

We have optimized the process and now 2.5 GB (-Xmx2560M) of RAM is enough for the successful execution of the code.

In addition, you need to enable swap: MemoryExtender.setSwapEnabled(true);

MemoryExtender.setSwapEnabled(true);
Document pdfDocument = new Document(dataDir + "QwuA2vRdX9.pdf");
for (Page page : pdfDocument.getPages()) {
 int number = page.getNumber();
 com.aspose.pdf.devices.Resolution resolution = new com.aspose.pdf.devices.Resolution(100);
 com.aspose.pdf.devices.PngDevice pngDevice = new com.aspose.pdf.devices.PngDevice(resolution);
 // try-with-resources closes the stream even if process() throws
 try (java.io.OutputStream imageStream = new java.io.FileOutputStream(dataDir + "Converted_Image_" + number + ".png")) {
  pngDevice.process(page, imageStream);
 }
}