Convert PDF to PNG in Multi-threading environment using Aspose.PDF for Java - OutOfMemory

Hello,

I’m getting an OutOfMemory exception when converting a PDF to an image. In the Windows 10 Task Manager I can see memory usage rising sharply during the conversion. The code I used is below; my test code and the PDF are attached.

testCode.zip (700.9 KB)

public static void main(String[] args) throws IOException {
	// Start 20 threads, each converting the first page of the same PDF
	for (int i = 0; i < 20; i++) {
		final int index = i;
		Thread t = new Thread(() -> {
			try {
				convertOnePageToPNGImage(index);
			} catch (IOException e) {
				e.printStackTrace();
			}
		});

		t.start();
	}
}

public static void convertOnePageToPNGImage(final int index) throws IOException {
	// Open document
	Document pdfDocument = new Document("D:\\workspace\\demo\\Aspose\\Aspose.PDF-for-Java-master\\Examples\\src\\main\\resources\\DocumentConversion\\sample_high_resolution_file.PDF");
	// Create Resolution object
	Resolution resolution = new Resolution(300);
	// Create PngDevice object with particular resolution
	PngDevice pngDevice = new PngDevice(resolution);
	// Create stream object to save the output image; try-with-resources closes it even if the conversion fails
	try (java.io.OutputStream imageStream = new java.io.FileOutputStream("D:\\output\\Converted_Image" + index + ".png")) {
		// Convert a particular page and save the image to stream
		pngDevice.process(pdfDocument.getPages().get_Item(1), imageStream);
	}
}

Regards

Hello,

Please see the test log enclosed.
test_log.zip (1.6 KB)

Regards

@dat.do

We did not notice this issue while testing the scenario with Aspose.PDF for Java 20.12 in our environment. Could you please try the latest version and let us know if you still face the issue?

Converted_Image_1.png (772.7 KB)

@asad.ali
I get the same issue with Aspose.PDF 20.12. Please see the test log for more details.

For your information, we use this feature to generate a thumbnail from a selected page of the PDF. Is there another way to do that with Aspose.PDF that uses minimal memory? Note also that we don’t need a full- or high-resolution thumbnail; a low-resolution image like the one in the attachment is enough.

testLogRound2_expectedThumbnail.zip (7.0 KB)

Regards

@dat.do

Could you please try increasing the Java heap size? As an alternative, you can also use the following code snippet to generate image thumbnails for the PDF pages:

Document pdfDocument = new Document(dataDir + "sample_high_resolution_file.pdf");
com.aspose.pdf.facades.PdfConverter converter = new com.aspose.pdf.facades.PdfConverter();
converter.bindPdf(pdfDocument);

converter.setStartPage(1);
converter.setEndPage(1);

while (converter.hasNextImage()) {
    converter.getNextImage(dataDir + "thumbnail.jpg", ImageType.getJpeg(), 100, 150, 100);
}

If the issue still occurs, please share a sample console application that reproduces it. We will test the scenario again in our environment and address it accordingly.

@asad.ali

I tried this code with the heap size set to 4096 MB (4 GB), the same as in our production environment. The problem still happens. Please see the enclosed test log.

testLog_round3.zip (1.2 KB)

@dat.do

It seems that you are using the API in a multi-threaded manner. Could you please share a sample application or program (.java) file that shows the routine you are executing and reproduces the exception you are facing? We will test the scenario in our environment and share our feedback with you accordingly.

@asad.ali

I already provided it in the attachment to my first post.

Regards

@dat.do

We have tested the scenario using the code snippet that you initially provided and were able to reproduce the exception. However, we also observed that in your code multiple threads access the same PDF simultaneously, which is not a recommended approach. Please note that Aspose.PDF is thread-safe as long as each PDF is accessed by only one thread at a time.

Nevertheless, we have logged an investigation ticket as PDFJAVA-40040 in our issue tracking system to further analyze the scenario. We will further share our comments with you about the issue as soon as the ticket is resolved. Please be patient and spare us some time.

@asad.ali

Thanks for your feedback. I hope this will be resolved soon because we really need it. For your information, our system can run up to 100 threads at the same time to generate thumbnails from imported documents.

Regards

@dat.do

We will surely investigate and resolve the ticket on a first come, first served basis. As per our initial investigation, the issue seems to occur because multiple threads access the same file simultaneously. We will inform you as soon as the investigation of the ticket is done. Please give us some time.

We are sorry for the inconvenience.

@asad.ali

I don’t think that multiple threads reading the same PDF file could cause this problem. The problem I noticed is that the memory consumption of a single thread is too high. If you really want to avoid having multiple threads read the same file, you can easily do that by cloning the input PDF file into multiple files with different names.
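
The cloning idea can be sketched as follows. This is only an illustration using standard `java.nio.file` calls: the class `PerThreadCopy` and the helpers `privateCopy` and `deleteQuietly` are hypothetical names, a three-byte dummy file stands in for the real PDF, and the Aspose conversion call is left as a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class PerThreadCopy {
    // Hypothetical helper: copies the shared input to a private temp file for one worker.
    static Path privateCopy(Path source, int index) throws IOException {
        Path copy = Files.createTempFile("thumb-src-" + index + "-", ".pdf");
        Files.copy(source, copy, StandardCopyOption.REPLACE_EXISTING);
        return copy;
    }

    // Hypothetical helper: deletes a temp file, ignoring failures.
    static void deleteQuietly(Path p) {
        if (p == null) return;
        try {
            Files.deleteIfExists(p);
        } catch (IOException ignored) {
            // best-effort cleanup only
        }
    }

    public static void main(String[] args) throws Exception {
        // Dummy stand-in for the real input PDF so the sketch runs on its own.
        Path source = Files.createTempFile("sample", ".pdf");
        Files.write(source, new byte[] {1, 2, 3});

        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            final int index = i;
            workers[i] = new Thread(() -> {
                Path copy = null;
                try {
                    copy = privateCopy(source, index);
                    // The real conversion would open `copy` here instead of the shared file.
                    System.out.println("worker " + index + " uses " + copy.getFileName());
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    deleteQuietly(copy);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        Files.deleteIfExists(source);
    }
}
```

Each worker reads its own file, so no two threads share an input. This only rules out shared-file access as a cause; it does not reduce the memory a single conversion needs.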

@dat.do

We will surely consider your concerns and will investigate the earlier logged ticket from every possible perspective. We will let you know as soon as we have some results against the investigation.

@asad.ali

Previously, when we used IcePdf for conversion, we had a configuration option to cache streams in temporary files to avoid OutOfMemory. Is there a similar option in Aspose.PDF?

<!--
If the system property org.icepdf.core.streamcache.enabled=true, the file will be cached to a temp file; otherwise, the complete document stream will be stored in memory.
-->
<property name="org.icepdf.core.streamcache.enabled" value="true"/>

@dat.do

Aspose.PDF for Java does not offer any option like this. The only workaround/option was to increase the Java heap size which did not help in your case. So, we will further investigate the issue and let you know as soon as we have additional updates regarding ticket resolution.

@dat.do

We have investigated the earlier logged ticket. We recommend changing the resolution of the output image to 50, which helps to reduce memory usage.

    Resolution resolution = new Resolution(50);
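
To see why a lower resolution helps, the arithmetic for the rendered bitmap alone can be sketched as below. The page size (US Letter, 8.5 x 11 inches) and 4 bytes per pixel are assumptions, since the dimensions of the sample file are not stated in this thread, and the class `RasterSizeEstimate` is a hypothetical name. The figures cover only the uncompressed pixel buffer, not the total memory Aspose.PDF uses.

```java
class RasterSizeEstimate {
    // Uncompressed size in bytes of a page rendered at the given DPI, assuming 4 bytes per pixel.
    static long rasterBytes(double widthInches, double heightInches, int dpi) {
        long widthPx = Math.round(widthInches * dpi);
        long heightPx = Math.round(heightInches * dpi);
        return widthPx * heightPx * 4L;
    }

    public static void main(String[] args) {
        double w = 8.5;  // assumed page width in inches
        double h = 11.0; // assumed page height in inches
        long at300 = rasterBytes(w, h, 300);
        long at50 = rasterBytes(w, h, 50);
        System.out.println("300 DPI: " + at300 + " bytes"); // 33660000
        System.out.println(" 50 DPI: " + at50 + " bytes");  // 935000
        System.out.println("ratio:   " + (at300 / at50));   // 36
    }
}
```

Pixel count grows with the square of the resolution, so going from 300 to 50 shrinks the bitmap by a factor of (300/50)^2 = 36.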

Our investigation showed the following heap requirements for converting the input .pdf file to an output image (with resolution 50):

1 thread: 350-400 MB of heap (-Xmx400M);
2 threads: 700 MB of heap (-Xmx700M);
4 threads: 1500 MB of heap (-Xmx1500M);
10 threads: 3700 MB of heap (-Xmx3700M);
20 threads: 7500 MB of heap (-Xmx7500M).

The input file contains many objects, which is why the Aspose.PDF API uses about 400 MB of memory to create the output image.

Please set the heap to at least 7500 MB when running 20 parallel threads.
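
When the heap cannot grow to match the thread count, one option is to cap the number of simultaneous conversions instead. The sketch below assumes roughly 375 MB per conversion (7500 MB / 20 threads, from the figures above, valid only for this sample file at resolution 50). The class `BoundedConversionPool` and its methods are hypothetical names, and a short sleep stands in for the actual conversion call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedConversionPool {
    // Assumed per-conversion heap cost in MB, derived from the figures in this thread.
    static final long MB_PER_CONVERSION = 375;

    // How many conversions fit in the given heap; always at least one.
    static int maxWorkers(long heapMb) {
        return (int) Math.max(1, heapMb / MB_PER_CONVERSION);
    }

    // Runs taskCount placeholder tasks on a fixed pool and returns the peak number running at once.
    static int runAll(int taskCount, int workers) throws InterruptedException {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(10); // placeholder for the real conversion call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long heapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        int workers = maxWorkers(heapMb);
        int peak = runAll(20, workers);
        System.out.println("heap=" + heapMb + " MB, workers=" + workers + ", peak concurrency=" + peak);
    }
}
```

With a 4096 MB heap this estimate allows 10 concurrent conversions; the remaining requests wait in the pool's queue rather than all allocating at once.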