Docx to pdf conversion low performance

SMGTEAM · May 4, 2016, 1:23am

Hi,

I am testing Aspose.Words (16.3.0, evaluation version) to convert DOCX to PDF. The conversion works fine, but I have a performance issue. What could be the reason?

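-------------------- Code ------------------------------

// Document is com.aspose.words.Document.
// filePath is the input DOCX path; filePath_des is the output PDF path.
long startTime = System.nanoTime();
Document doc = new Document(filePath);
doc.save(filePath_des);
// Elapsed time in milliseconds (load + save).
System.err.println("Total time : " + Math.round((System.nanoTime() - startTime) / 1e6));

-----------------------------------------------------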

PC config: i7 CPU, 8 GB RAM, SSD, Windows 7 (64-bit)

File: 18 KB, 2-page DOCX file (attached)

Results (all times in ms):

1 thread

5020

5 concurrent threads

6440, 6461, 6469, 6472, 6482

10 concurrent threads

7613, 7704, 7721, 7782, 7800, 7828, 7866, 7914, 7920, 7926

15 concurrent threads

8892, 9018, 9074, 9113, 9159, 9182, 9293, 9379, 9408, 9452, 9480, 9488, 9489

20 concurrent threads

9011, 9073, 9094, 9146, 9181, 9241, 9356, 9389, 9390, 9408, 9448, 9522, 9527, 9531, 9533, 9537, 9538, 9539

Thanks & regards.

awais.hafeez · May 5, 2016, 1:38am

Hi,

Please create a standalone simple Java application (source code without compilation errors) that helps us reproduce your problem on our end and attach it here for testing. As soon as you get this simple application ready, we’ll start further investigation into your issue and provide you more information.

Best regards,

SMGTEAM · May 5, 2016, 2:32am

Hi Awais,

I have added a test project to the question.

Thanks.

awais.hafeez · May 6, 2016, 4:13am

Hi,

Thanks for your inquiry. We tested the scenario and managed to reproduce the same problem on our end. We have logged this problem in our issue tracking system as WORDSJAVA-1369. Our product team will look further into the details, and we will keep you updated on the status of the fix. We apologize for the inconvenience.

Best regards,

nixspirit · July 11, 2016, 5:39am

Hi

Thanks for your interest in our library.

Could you please be more specific on the performance issue you mentioned above?

As you are probably aware, microbenchmarking can be very tricky and does not always represent real-world usage patterns.

For instance, your test shows the following numbers on my machine

(i7-4710HQ 2.50 GHz, 16 GB RAM, Win10 (64-bit), 4 cores, SSD, JDK1.8_51, Aspose.Words 16.3)

1 thread

Thread: 12, time: 1352

concurrent 5 thread

Thread: 12, time: 1662

Thread: 14, time: 1663

Thread: 15, time: 1663

Thread: 16, time: 1666

Thread: 13, time: 1694

concurrent 10 thread

Thread: 16, time: 1898

Thread: 12, time: 1902

Thread: 20, time: 1904

Thread: 21, time: 1905

Thread: 18, time: 1907

Thread: 15, time: 1908

Thread: 14, time: 1909

Thread: 17, time: 1909

Thread: 19, time: 1913

Thread: 13, time: 1916

concurrent 20 thread

Thread: 13, time: 2164

Thread: 24, time: 2166

Thread: 31, time: 2167

Thread: 26, time: 2168

Thread: 20, time: 2177

Thread: 29, time: 2202

Thread: 28, time: 2206

Thread: 15, time: 2208

Thread: 18, time: 2211

Thread: 12, time: 2218

Thread: 19, time: 2219

Thread: 23, time: 2219

Thread: 27, time: 2219

Thread: 14, time: 2223

Thread: 22, time: 2223

Thread: 25, time: 2223

Thread: 17, time: 2224

Thread: 21, time: 2224

Thread: 16, time: 2226

Thread: 30, time: 2228

If you have some time please have a look at these good articles

http://www.ibm.com/developerworks/library/j-jtp12214/

http://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html

http://nadeausoftware.com/articles/2008/03/java_tip_how_get_cpu_and_user_time_benchmarking

and the video

https://vimeo.com/78900556

I also recommend trying JMH.

It takes care of many things (e.g. warm up) and it will help you to test throughput and responsiveness with almost no additional effort.

JMH (10 threads) average time is ~140 ms / op

JMH (1 thread) average time is ~48 ms / op
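
To illustrate why warm-up matters, here is a minimal plain-Java sketch (not JMH, and not the test project from this thread). It times the very first call of a work unit and compares it with the median of calls made after a number of discarded warm-up iterations. The `workUnit` method is a hypothetical stand-in for the real conversion, and the iteration counts are arbitrary assumptions.

```java
import java.util.Arrays;

class WarmupTiming {
    // Hypothetical stand-in for the real work (for example a DOCX to PDF conversion).
    static long workUnit() {
        long sum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            sum += Integer.bitCount(i);
        }
        return sum;
    }

    // Returns the median of the samples (the upper median for an even count).
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Runs `warmups` discarded iterations, then returns `runs` timed samples in nanoseconds.
    static long[] measure(int warmups, int runs) {
        long sink = 0;
        for (int i = 0; i < warmups; i++) {
            sink += workUnit();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += workUnit();
            samples[i] = System.nanoTime() - start;
        }
        // Use the accumulated result so the JIT cannot drop the work as dead code.
        if (sink == 42) {
            System.out.println(sink);
        }
        return samples;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        workUnit();
        long cold = System.nanoTime() - start;
        long[] warm = measure(20, 20);
        System.out.println("first call: " + cold / 1_000 + " us");
        System.out.println("median after warm-up: " + median(warm) / 1_000 + " us");
    }
}
```

A single timed run, as in the original test, measures mostly the first-call cost (class loading, JIT compilation, one-time initialization), which is likely why it differs so much from the JMH averages above.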

Please let us know if we can help with anything else.

Thanks.

awais.hafeez · June 29, 2017, 12:03pm

@SMGTEAM,

Regarding WORDSJAVA-1369, our product team has completed its work on your issue and has concluded that the performance behavior you are observing is not a bug in Aspose.Words for Java. So, we have closed this issue as ‘Not a Bug’. Please see the following details:

There are two basic ways to measure the parallel performance of a given application, depending on whether the application is CPU-bound or memory-bound. These are referred to as strong and weak scaling, respectively.

Strong Scaling - In this case the problem size stays fixed but the number of processing elements is increased. The strong scaling test tells you how the parallel overhead scales with different numbers of processors.

Weak Scaling - In this case the problem size (workload) assigned to each processing element stays constant and additional elements are used to solve a larger total problem. A weak scaling test fixes the amount of work per processor and compares the execution time over number of processors. The weak scaling test tells you something weaker – whether the parallel overhead varies faster or slower than the amount of work.

The demo application you shared processes a single document per thread. All the documents are the same. This is also good for the test, because all threads hit the same locks. Threads can read a document without blocking.
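
A test of that shape, one identical work unit per thread with all threads released together, can be sketched as follows. This is not the shared demo application; `workUnit` is a hypothetical CPU-only stand-in for the conversion, and the class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WeakScalingHarness {
    // Hypothetical stand-in for converting one document.
    static long workUnit() {
        long sum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            sum += Integer.bitCount(i);
        }
        return sum;
    }

    // Runs one work unit on each of n threads and returns the wall-clock
    // time in milliseconds until all of them have finished.
    static double timeBatchMillis(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        CountDownLatch ready = new CountDownLatch(n);
        CountDownLatch go = new CountDownLatch(1);
        try {
            Callable<Long> task = () -> {
                ready.countDown(); // signal that this thread is running
                go.await();        // wait for the common start signal
                return workUnit();
            };
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                results.add(pool.submit(task));
            }
            ready.await();
            long start = System.nanoTime();
            go.countDown();
            for (Future<Long> f : results) {
                f.get();
            }
            return (System.nanoTime() - start) / 1e6;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        timeBatchMillis(1); // discarded warm-up run
        for (int n : new int[] {1, 2, 4, 8}) {
            System.out.printf("%d threads: %.1f ms%n", n, timeBatchMillis(n));
        }
    }
}
```

The two latches keep thread start-up cost out of the measurement: timing begins only once every worker is parked and waiting for the start signal.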

Given the above, we can say that you are performing a weak scaling test. To calculate the weak scaling efficiency, we should use the following formula:

If the amount of time to complete a work unit with 1 processing element is t1, and the amount of time to complete N of the same work units with N processing elements is tN, the weak scaling efficiency (as a percentage of linear) is given as:

( t1 / tN ) * 100%

Since each processor has the same amount to do, in the ideal case the execution time should remain constant. (see weak_scaling.png (35.4 KB))

Threads    17.6 (time)    16.8 (time)    17.6 (efficiency)    16.8 (efficiency)
1    40.978    39.733    100%    100%
2    41.926    39.957    98%    99%
4    67.99    67.965    60%    58%
8    117.322    131.316    35%    30%
16    249.704    253.983    18%    16%
32    423.809    379.011    11%    10%
64    868.073    755.003    5%    5%
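
As a small worked example of the formula, the sketch below applies ( t1 / tN ) * 100% to the first few 17.6 timings from the table above. The class and method names are made up for illustration.

```java
class WeakScalingEfficiency {
    // Weak scaling efficiency as a percentage of linear: (t1 / tN) * 100.
    static double efficiencyPercent(double t1, double tN) {
        return t1 / tN * 100.0;
    }

    public static void main(String[] args) {
        // Thread counts and 17.6 timings taken from the first rows of the table.
        int[] threads = {1, 2, 4, 8};
        double[] times = {40.978, 41.926, 67.99, 117.322};
        double t1 = times[0];
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%d threads: %.0f%%%n",
                    threads[i], efficiencyPercent(t1, times[i]));
        }
    }
}
```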

“Good” weak scaling efficiency is a relative term. The only absolute reference point is ideal scaling (100% efficiency). If we use as many threads as we have processor cores, the result doesn’t look so bad. In the other cases we would consider the efficiency poor. It is important to note that 17.6 shows a better result than 16.8.

After profiling (see profiler.png (118.8 KB)) we noticed that the same methods are being blocked all the time.

CurrentThread.getCurrentCulture()
FontSettings.getFont()
SectPr.addCultureDefaults()

According to YourKit, each thread is blocked on average 40% of the time. At this point we hoped there was a chance to increase efficiency by 20%-30%. We removed the synchronized blocks or replaced them with ConcurrentHashMap (Aspose.Words for Java then works incorrectly, but we wanted to see the maximum possible speedup); see replace_remove_all_synchronized.png (123.8 KB)

Threads    17.6 (time)    17.6 (efficiency)
1    40.978    100%
2    43.714    94%
4    58.352    70%    <- up to 10% better; not bad, but I doubt we can achieve this result once we make AW work correctly
8    119.061    34%
16    231.784    18%
32    428.358    10%
64    649.759    6%

So the picture is almost the same. Unfortunately, in a correctly working version we could not get rid of all the synchronized blocks, so its numbers would be pretty close to those of the version with all the synchronized blocks.
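
For readers unfamiliar with the kind of change tried in this experiment, here is a generic sketch of replacing a synchronized cache lookup with ConcurrentHashMap. It is not Aspose.Words code: the class, the `loadValue` method, and the cache contents are all hypothetical, and the real synchronized blocks may guard more state than a single map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class LookupCacheSketch {
    // Counts how often the expensive lookup actually runs.
    static final AtomicInteger loads = new AtomicInteger();

    // Hypothetical expensive lookup guarded by the cache.
    static String loadValue(String key) {
        loads.incrementAndGet();
        return "value:" + key;
    }

    // Before: one monitor for all callers, so every lookup queues on the same lock.
    static final Map<String, String> lockedCache = new HashMap<>();

    static synchronized String getLocked(String key) {
        String value = lockedCache.get(key);
        if (value == null) {
            value = loadValue(key);
            lockedCache.put(key, value);
        }
        return value;
    }

    // After: computeIfAbsent is atomic per key, without a single global lock.
    static final ConcurrentHashMap<String, String> concurrentCache = new ConcurrentHashMap<>();

    static String getConcurrent(String key) {
        return concurrentCache.computeIfAbsent(key, LookupCacheSketch::loadValue);
    }

    public static void main(String[] args) {
        System.out.println(getLocked("a"));
        System.out.println(getConcurrent("a"));
        System.out.println(getConcurrent("a")); // served from the cache
        System.out.println("loads: " + loads.get());
    }
}
```

A swap like this is only safe when the map is the sole state the lock protects, which is consistent with the note above that the library stopped working correctly once all synchronized blocks were removed.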

Also, please see YourKit snapshots and plots charts_1369.png (43.7 KB)

So, we conclude the following two points:

There is no performance degradation in 17.6 compared with 16.8.
The overall relationship between performance and thread count is quite standard for a complex application (one that uses disk I/O and other system resources). We think that beyond 2 threads the application simply becomes blocked on disk I/O.

Best regards,
Awais Hafeez