Exception when converting HTML to PDF in concurrent threads using Java

arjana · October 13, 2020, 11:11am

Hello,

We are trying to convert an HTML file to PDF using Java. The conversions run concurrently, each in a separate thread. If two (or more) conversions run at the same time, the following exception occurs most of the time:

```
Exception in thread "Thread-3" Exception in thread "Thread-4" java.lang.NullPointerException
at com.aspose.pdf.internal.html.Configuration.initializeServices(Unknown Source)
at com.aspose.pdf.internal.html.Configuration.<init>(Unknown Source)
at com.aspose.pdf.l6h.lI(Unknown Source)
at com.aspose.pdf.l6h.lI(Unknown Source)
at com.aspose.pdf.l6h.lI(Unknown Source)
at com.aspose.pdf.l6h.lI(Unknown Source)
at com.aspose.pdf.ADocument.lI(Unknown Source)
at com.aspose.pdf.ADocument.<init>(Unknown Source)
at com.aspose.pdf.Document.<init>(Unknown Source)
```

Here is a sample piece of code that always results in an error:

```java
package lt.test;

import com.aspose.pdf.Document;
import com.aspose.pdf.HtmlLoadOptions;

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ConcurrencyTest
{
	public static void main(String[] args) throws InterruptedException
	{
		// Start five conversions at (almost) the same time.
		for(int i = 0; i < 5; i++)
		{
			new Thread(new ConverterThread()).start();
			// Thread.sleep(1000 * 5); // uncommenting this pause makes the conversions succeed
		}
	}
}

class ConverterThread implements Runnable
{
	private static final String RESOURCE_DIR = "E:\\test\\asposesvg\\src\\main\\resources\\";

	@Override
	public void run()
	{
		System.out.println("thread " + Thread.currentThread().getName() + " started");

		HtmlLoadOptions options = new HtmlLoadOptions();
		options.setInputEncoding(StandardCharsets.UTF_8.name());

		// Each thread creates its own Document; the NullPointerException is thrown from this constructor.
		Document document = new Document(RESOURCE_DIR + "concurrency_template.html", options);
		try(FileOutputStream fos = new FileOutputStream(System.currentTimeMillis() + "_concurrency_result.pdf"))
		{
			document.save(fos);
		}
		catch(IOException e) // also covers FileNotFoundException
		{
			e.printStackTrace();
		}
	}
}
```

However, if I introduce a pause between starting the threads (e.g. by uncommenting Thread.sleep(1000 * 5);), the conversions succeed.

Is this a bug, or should the code be written differently for concurrent execution to succeed?

aspose-pdf version: 20.9
os: reproduced on MS Windows 10 Enterprise and CentOS 7
java: 14

asad.ali · October 13, 2020, 9:53pm

@arjana

The Aspose.PDF API is thread-safe, which means you can use it in a multi-threaded environment as long as each document is accessed by only one thread at a time. In other words, a document should not be accessed by another thread while it is being processed by one. To avoid simultaneous access to the same document, you can introduce Thread.sleep(1000 * 5) to make sure the previous thread has completed its conversion.

arjana · October 14, 2020, 6:25am

Well, in my example it is possible to add a delay before starting each new thread, because the threads are started in a loop. In reality, however, a new thread is started when an end user clicks a button to convert the data, so a delay makes no sense. With many users, several of them may click the button nearly simultaneously (in fact, that is exactly when the problem was noticed).

You say that each document should be accessed by only one thread. In my example, each thread creates its own Document object, and no other thread has access to it.

Is my understanding correct: each thread creates a new Document object, but that is not enough for a successful conversion, because there is an ‘initialize services’ step in which some common object(s) may be accessed or modified by different threads?
Is it possible to change the code so that parallel conversions don't fail from time to time?
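Aspose's internals are obfuscated in the stack trace, so the following is only a generic Java sketch of the kind of race the question suspects, not Aspose's code. Shared services that are built lazily without synchronization can be observed half-initialized by a second thread, which then dereferences a missing entry and throws a NullPointerException. The class and key names (`UnsafeRegistry`, `SafeRegistry`, `"dom"`) are hypothetical. The safe variant uses the initialization-on-demand holder idiom, which relies on the JVM running a class's static initializer exactly once.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT Aspose's code.
public class LazyServices {

    // Unsafe pattern: two threads can both see `services == null`,
    // and one may read the map while the other is still filling it.
    static class UnsafeRegistry {
        private static Map<String, Object> services;

        static Object get(String name) {
            if (services == null) {
                services = new HashMap<>();      // published before it is filled
                services.put("css", new Object());
                services.put("dom", new Object());
            }
            return services.get(name);           // may return null under a race
        }
    }

    // Safe pattern: initialization-on-demand holder idiom. The JVM runs
    // Holder's static initializer exactly once and makes the fully built
    // map visible to every thread that reads it afterwards.
    static class SafeRegistry {
        private static class Holder {
            static final Map<String, Object> SERVICES = build();
        }

        private static Map<String, Object> build() {
            Map<String, Object> m = new HashMap<>();
            m.put("css", new Object());
            m.put("dom", new Object());
            return Map.copyOf(m);                // immutable snapshot
        }

        static Object get(String name) {
            return Holder.SERVICES.get(name);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                if (SafeRegistry.get("dom") == null) {
                    throw new IllegalStateException("service missing");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("safe registry: all threads saw the service");
    }
}
```

If a library has a race like `UnsafeRegistry`, no amount of per-thread object creation in the caller fixes it; the caller can only avoid running the racy step concurrently.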

asad.ali · October 14, 2020, 6:39pm

@arjana

Yes, you are right. In each thread a new Document object is created, but as far as we understand, every one of these objects accesses the same file, i.e. “concurrency_template.html”.

If you are converting the same file into a PDF document in multiple threads, please try creating a temporary copy of the HTML file before the thread starts and passing that copy to the thread for PDF generation. Once the PDF is generated, you can delete the temporary copy in code. We hope this helps.
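A minimal sketch of the suggested per-thread copy, using only `java.nio.file`. The `Converter` interface and `convertFromPrivateCopy` method are hypothetical stand-ins for the real `new Document(path, options)` and `save(...)` calls; the copy is deleted in a `finally` block so it is removed even if the conversion throws.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempCopyConversion {

    /** Hypothetical stand-in for the real HTML-to-PDF call. */
    @FunctionalInterface
    interface Converter {
        void convert(Path html) throws IOException;
    }

    /**
     * Copies the shared template to a private temporary file, hands the copy
     * to the converter, and always deletes the copy afterwards.
     */
    static void convertFromPrivateCopy(Path template, Converter converter) throws IOException {
        Path copy = Files.createTempFile("concurrency_template_", ".html");
        try {
            Files.copy(template, copy, StandardCopyOption.REPLACE_EXISTING);
            // Real code would do something like: new Document(copy.toString(), options).save(...)
            converter.convert(copy);
        } finally {
            Files.deleteIfExists(copy);
        }
    }

    public static void main(String[] args) throws Exception {
        Path template = Files.createTempFile("template_", ".html");
        Files.writeString(template, "<html><body>Hello</body></html>");
        convertFromPrivateCopy(template, html ->
                System.out.println("converting " + html.getFileName()));
        Files.delete(template);
    }
}
```

Each thread would call `convertFromPrivateCopy` with the shared template path, so no two threads ever hand the same file to the converter.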

arjana · October 16, 2020, 10:36am

Thank you for your answers. In the real application, the input HTML is not read from a shared file: a new InputStream is constructed from a dynamically generated HTML string and passed to the Document.
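For reference, a stream like the one described here can be built from a generated string as follows. The UTF-8 encoding is chosen to match `options.setInputEncoding(StandardCharsets.UTF_8.name())` in the sample; the class and method names are hypothetical, and the hand-off to the Aspose `Document` is only described in a comment. Because every request builds its own stream, no input is shared between threads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HtmlStreamSource {

    /**
     * Encodes the generated HTML as UTF-8 so that the bytes agree with the
     * input encoding configured on the load options.
     */
    static InputStream toStream(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Šiandien</p></body></html>";
        try (InputStream in = toStream(html)) {
            // In the real application this stream is passed to the Document
            // together with the HtmlLoadOptions; here we only read it back.
            String roundTrip = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(roundTrip.equals(html)); // prints "true"
        }
    }
}
```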

Just to check, I created a unique copy of the HTML file for each thread in my example, but the same exception was still thrown.

In the end, we changed the code so that no new Document(…) invocations run in parallel: while one thread is executing this code, the other threads have to wait.
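A sketch of this workaround, assuming a single JVM-wide `ReentrantLock` that guards only the loading step, as described above. `loadExclusively` is a hypothetical helper, and the `Supplier` stands in for the `new Document(…)` call. Whether `save(...)` is itself safe to run in parallel is not confirmed in this thread, so widen the locked section if errors persist. An `ExecutorService` replaces the raw threads, and the demo counts how many loaders are inside the lock at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class SerializedLoading {

    // One JVM-wide lock: only one thread at a time may run the guarded loading step.
    private static final ReentrantLock LOAD_LOCK = new ReentrantLock();

    /**
     * Runs the loading step (e.g. the Document constructor) under the shared lock.
     * Callers block until the lock is free; the returned object is then used by
     * the calling thread alone.
     */
    static <T> T loadExclusively(Supplier<T> loader) {
        LOAD_LOCK.lock();
        try {
            return loader.get();
        } finally {
            LOAD_LOCK.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxInside = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(5);
        for (int i = 0; i < 20; i++) {
            int id = i;
            pool.submit(() -> {
                String doc = loadExclusively(() -> {
                    int now = inside.incrementAndGet();
                    maxInside.accumulateAndGet(now, Math::max);
                    String result = "document-" + id; // stand-in for new Document(stream, options)
                    inside.decrementAndGet();
                    return result;
                });
                // Saving / post-processing runs outside the lock, in parallel.
                System.out.println(Thread.currentThread().getName() + " loaded " + doc);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("max concurrent loads: " + maxInside.get()); // prints 1
    }
}
```

The demo reports a maximum of one concurrent load because every loader runs while holding the same lock. The lock is reentrant, so a guarded loader that itself calls `loadExclusively` does not deadlock.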

asad.ali · October 16, 2020, 6:31pm

@arjana

Thanks for your feedback.

We hope the implemented solution has resolved the issue you were facing. If any other issue occurs, please feel free to let us know.