Docx to PNG conversion performance issues

Dear Aspose Team,

One of our customers is currently experiencing major performance issues.
They use quite large documents (80 pages and more).
For a preview feature, we convert all pages to PNG.

This works fine with .NET, but with the same implementation in Java we are now seeing poor performance and long response times.

Aspose version tested: Aspose.Words for Java 15.7.0

In .NET the conversions take at most half a minute and about 10 seconds on average,
whereas the Java conversion sometimes takes over 7 minutes for the same operation.

The code we use:

Document wordDocument = new Document(stream);

for (int pageIndex = 0; pageIndex < wordDocument.getPageCount(); pageIndex++)
{

    ByteArrayOutputStream imageOutputStream = new ByteArrayOutputStream();
    ImageSaveOptions imageSaveOptions = new ImageSaveOptions(SaveFormat.PNG);
    imageSaveOptions.setPageIndex(pageIndex);
    imageSaveOptions.setPageCount(1);
    imageSaveOptions.setResolution(resolution);
    imageSaveOptions.setDmlRenderingMode(DmlRenderingMode.DRAWING_ML);

    InputStream inputStream = new ByteArrayInputStream(content);
    Document blankDoc = new Document(inputStream);
    blankDoc.save(imageOutputStream, imageSaveOptions);

    byte[] docByte = imageOutputStream.toByteArray();
    resultList.add(docByte);

    imageOutputStream.close();
    inputStream.close();
}

Please find attached an example document, which we used to do the measurements.

Any hint towards improving the performance here is much appreciated.
Perhaps we can also optimize the process itself or something in the code.

Thanks in advance!

Kind regards
Wolfgang

Hi Wolfgang,

Thanks for your inquiry. Please share the following details for investigation purposes.

  • Please create a standalone Java application (source code without compilation errors) that helps us reproduce your problem on our end and attach it here for testing.
  • Please share the image resolution which you are using in ImageSaveOptions.
  • Please share some details about the following code snippet. You are converting wordDocument to an image format, so what is blankDoc used for? Please also describe your scenario.
InputStream inputStream = new ByteArrayInputStream(content);
Document blankDoc = new Document(inputStream);
blankDoc.save(imageOutputStream, imageSaveOptions);

Hi Tahir,

Thanks for your feedback.

Please find attached the requested standalone Java application for the performance test.

In the bin folder, there is also a runnable JAR file. Be aware that a license file has to be put into the same folder.

Also in the zip folder, we included the test document again.

The path used for testing is fixed. You might want to change it before running the application.

The image resolution used is set to 96.0 dpi.
This is also the case for any test we did.

I noticed that during my local tests the whole conversion took only about one minute, so I am also trying to get more details about the server environment in use.

So far I can say that our observations of slow performance are the same on Linux (SLES 12) and Windows (Server 2012).

Regarding your question about the code snippet: I assume the document is called blankDoc because it is only used as a placeholder for the save operation and is not meant for any further processing.

Cheers and kind regards
Wolfgang

Hi Wolfgang,

Thanks for sharing the details. We have tested the scenario using the latest versions of Aspose.Words for .NET and Java on Windows 7 (64-bit) and have not been able to reproduce the issue. Aspose.Words takes around 50 seconds to render the document to PNG.

We will test the same scenario on Linux and share our findings here for your reference.

Hi Wolfgang,

Thanks for your patience. We have tested the scenario on Ubuntu and have not been able to reproduce the performance issue. Aspose.Words for Java takes around 80 seconds for your code example. We suggest using the following simplified code example.

Document doc = new Document(MyDir + "bigDoc.docx");
long start = System.currentTimeMillis();
ByteArrayOutputStream imageOutputStream = new ByteArrayOutputStream();
ImageSaveOptions imageSaveOptions = new ImageSaveOptions(SaveFormat.PNG);
imageSaveOptions.setDmlRenderingMode(DmlRenderingMode.DRAWING_ML);
imageSaveOptions.setPageCount(1);
imageSaveOptions.setResolution(96);
for (int pageIndex = 0; pageIndex < doc.getPageCount(); pageIndex++)
{
    imageSaveOptions.setPageIndex(pageIndex);
    doc.save(imageOutputStream, imageSaveOptions);
}
System.out.println("Execution time: " + (System.currentTimeMillis() - start) + " ms");

It is quite difficult to answer such questions because CPU performance and memory usage both depend on the complexity and size of the documents you are loading or generating.

In terms of memory, Aspose.Words does not have any limitations. If you’re loading huge Word documents into Aspose.Words’ DOM, more memory would be required. This is because during processing, the document needs to be held wholly in memory.

While rendering a document to fixed-page formats (e.g. PDF, JPEG, XPS), Aspose.Words needs to build two models in memory: one for the document and one for the rendered document.

The process of building the layout model is not linear; it may take a minute to render one page and only a few seconds to render 100 pages. Aspose.Words also has to create the APS (Aspose Page Specification) model in memory, which can take additional time for some documents. Rest assured, we are always working on improving performance, but rendering will always be slower than simply saving to flow formats (e.g. DOC/DOCX).

Hi Tahir,
thanks for the feedback.
I created similar tests in .NET and Java and executed each test several times on exactly the same system with little other load on it.
The observations are:
- Java is about 10 times slower than .NET (6 seconds vs. 58 - 62 seconds).
- The CPU load during the Java test is continuously high at 50 - 100%.
Tested on a 4-core Intel i5 CPU with 12 GB RAM.
This factor is not really negligible. In fact we have major issues on our customers’ systems and no further possibilities to help them.
Also since we are dealing with a distributed environment, some of the Web services are already running into timeouts and we also have to increase these on each system.
I found an article about Java performance issues during image processing:
http://www.jhlabs.com/ip/managed_images.html
Could it be that one or more of the methods mentioned there are used inside the Aspose logic?
Would it also be possible to work around that on your side?
Please find attached the .NET part as well.
Thanks and kind regards
Wolfgang
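
For anyone who wants to repeat this kind of comparison, the repeated measurements described in the post above could be taken with a small harness along the following lines. This is only a sketch and is not taken from the attached test project: the names `TimingSketch`, `measureMillis` and `median` are made up for illustration, the warm-up and run counts are arbitrary, and the workload in `main` is a placeholder that would be replaced by the actual conversion call.

```java
import java.util.Arrays;

/** Hypothetical timing helper; not part of Aspose.Words or of the attached test project. */
final class TimingSketch {

    /**
     * Runs the task a few times untimed (so the JVM can JIT-compile hot paths),
     * then times it 'runs' times and returns the durations in ms, sorted ascending.
     */
    static long[] measureMillis(Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(millis);
        return millis;
    }

    /** Middle element of a sorted array (the upper middle one for even lengths). */
    static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; swap in the document conversion to be measured.
        Runnable placeholder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        long[] sorted = measureMillis(placeholder, 3, 10);
        System.out.println("min=" + sorted[0] + " ms, median=" + median(sorted)
                + " ms, max=" + sorted[sorted.length - 1] + " ms");
    }
}
```

Reporting the minimum, median and maximum rather than a single run makes it easier to tell a consistent slowdown from a one-off outlier.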

Hi Wolfgang,

Thanks for sharing the details. We have tested the scenario and managed to reproduce the issue on our side. We have logged this problem in our issue tracking system as WORDSJAVA-1239. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

Hi,

thank you so much for the answer.

Looking forward to getting this solved.

Cheers
Wolfgang

Hi again and happy new year!

Are there any updates on this specific issue regarding the conversion performance?

Thanks and kind regards
Wolfgang

Hi Wolfgang,

This issue (WORDSJAVA-1239) has now been resolved and its fix will be available in the next version of Aspose.Words, v16.1.0. We will update you via this forum thread once the new version of Aspose.Words is published. Thanks for your patience.

Please note that in your code example, you are reading the document from the stream for each page inside the loop:

Document wordDocument = new Document(stream);
for (int pageIndex = 0; pageIndex < wordDocument.getPageCount(); pageIndex++) {
    …
    InputStream inputStream = new ByteArrayInputStream(content);
    Document blankDoc = new Document(inputStream); // This line is not needed
    blankDoc.save(imageOutputStream, imageSaveOptions); // We can use wordDocument.save(imageOutputStream, imageSaveOptions) here
}

Not reading the document inside the loop will reduce the execution time. Hope this helps you.
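
The load-once pattern suggested above can be sketched roughly as follows. To keep the example runnable without the Aspose library, the document is hidden behind a made-up interface: `LoadedDocument`, `PagePreviewSketch`, `renderAllPages` and `fakeDocument` are hypothetical names, not Aspose.Words types. In a real implementation, `renderPage` would presumably wrap the `setPageIndex` and `save` calls already quoted in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a document that has already been loaded once. */
interface LoadedDocument {
    int pageCount();

    /** Writes the rendered image of a single page to the given stream. */
    void renderPage(int pageIndex, OutputStream out) throws IOException;
}

final class PagePreviewSketch {

    /** Renders every page of an already loaded document into its own byte array. */
    static List<byte[]> renderAllPages(LoadedDocument doc) {
        int pageCount = doc.pageCount(); // read once, before the loop
        List<byte[]> pages = new ArrayList<>(pageCount);
        for (int pageIndex = 0; pageIndex < pageCount; pageIndex++) {
            // A fresh buffer per page keeps the images separate.
            try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                // With Aspose.Words this would be the calls from the thread, e.g.
                // imageSaveOptions.setPageIndex(pageIndex);
                // wordDocument.save(out, imageSaveOptions);
                doc.renderPage(pageIndex, out);
                pages.add(out.toByteArray());
            } catch (IOException e) {
                throw new UncheckedIOException("Rendering page " + pageIndex + " failed", e);
            }
        }
        return pages;
    }

    /** Fake document that writes a short marker per page, only for demonstration. */
    static LoadedDocument fakeDocument(int numberOfPages) {
        return new LoadedDocument() {
            @Override
            public int pageCount() {
                return numberOfPages;
            }

            @Override
            public void renderPage(int pageIndex, OutputStream out) throws IOException {
                out.write(("page-" + pageIndex).getBytes(StandardCharsets.UTF_8));
            }
        };
    }

    public static void main(String[] args) {
        List<byte[]> result = renderAllPages(fakeDocument(3));
        System.out.println(result.size() + " pages rendered");
    }
}
```

The point of the structure is that nothing inside the loop parses the source document again; only the page index and the output buffer change per iteration.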

Hi Tahir,

that is good news!!

Thank you guys so much for the great work!

Regarding your remark: we have already made some improvements to our code, including reading the document outside of the loop, but this still did not bring the performance we are used to from .NET.

Kind regards
Wolfgang

Hi Wolfgang,

Thanks for your feedback.

Without reading the document inside the loop, your code takes around 24-25 seconds. After this fix, the execution time will be roughly 14-15 seconds. We will inform you via this forum thread once the new version of Aspose.Words is published. Please let us know if you have any more queries.

Hi again,

the only question remaining is - when can we expect the new version of Aspose.Words?

Thanks and cheers
Wolfgang

Hi Wolfgang,

Thanks for your inquiry. Hopefully, the next version of Aspose.Words will be available at the start of next month (February 2016). We will inform you via this forum thread once the new version of Aspose.Words is published. Thanks for your patience.

The issues you have found earlier (filed as WORDSJAVA-1239) have been fixed in this .NET update and this Java update.


Hi again,

thank you so much for this fix!!

We are now getting times of around 15 seconds, which is a big performance improvement compared to before.

Kind regards
Wolfgang

Hi Wolfgang,

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.