Docx to pdf conversion low performance

Hi,
I am testing Aspose.Words (16.3.0, evaluation version) to convert DOCX to PDF. The conversion works fine, but I have a performance issue: a single conversion takes about 5 seconds, and the time grows as I add concurrent threads. What could be the reason?

-------------------- Code ------------------------------
long startTime = System.nanoTime();
Document doc = new Document(filePath);
doc.save(filePath_des);
System.err.println("Total time : "+Math.round((System.nanoTime()-startTime)/1e6));
-----------------------------------------------------


PC conf: i7 CPU - 8 GB ram - SSD disk- win7 (64 bit)

file: 18kb 2 pages msdocx file (file attached)

Results (all times in ms):

1 thread
5020

5 concurrent threads
6440
6461
6469
6472
6482

10 concurrent threads
7613
7704
7721
7782
7800
7828
7866
7914
7920
7926

15 concurrent threads
8892
9018
9074
9113
9159
9182
9293
9293
9379
9408
9452
9480
9488
9488
9489

20 concurrent threads

9011
9073
9094
9146
9181
9241
9356
9389
9390
9408
9448
9522
9527
9531
9531
9533
9537
9537
9538
9539

Thanks & regards.

Hi,


Please create a standalone simple Java application (source code without compilation errors) that helps us reproduce your problem on our end and attach it here for testing. As soon as you get this simple application ready, we’ll start further investigation into your issue and provide you more information.

Best regards,

Hi Awais,

Test project added to question

Thanks.
Hi,

Thanks for your inquiry. We tested the scenario and reproduced the problem on our end. We have logged it in our issue tracking system as WORDSJAVA-1369. Our product team will look into the details, and we will keep you updated on the status of the fix. We apologize for the inconvenience.

Best regards,
Hi

Thanks for your interest in our library.

Could you please be more specific on the performance issue you mentioned above?

As you are probably aware, microbenchmarking can be very tricky and does not always represent real-world usage patterns.
For instance, your test shows the following numbers (in ms) on my machine
(i7-4710HQ 2.50Ghz 16GB RAM Win10 (64-bit) 4 Cores, SSD, JDK1.8_51, Aspose.Words 16.3)

1 thread
Thread: 12, time: 1352

5 concurrent threads
Thread: 12, time: 1662
Thread: 14, time: 1663
Thread: 15, time: 1663
Thread: 16, time: 1666
Thread: 13, time: 1694

10 concurrent threads
Thread: 16, time: 1898
Thread: 12, time: 1902
Thread: 20, time: 1904
Thread: 21, time: 1905
Thread: 18, time: 1907
Thread: 15, time: 1908
Thread: 14, time: 1909
Thread: 17, time: 1909
Thread: 19, time: 1913
Thread: 13, time: 1916

20 concurrent threads
Thread: 13, time: 2164
Thread: 24, time: 2166
Thread: 31, time: 2167
Thread: 26, time: 2168
Thread: 20, time: 2177
Thread: 29, time: 2202
Thread: 28, time: 2206
Thread: 15, time: 2208
Thread: 18, time: 2211
Thread: 12, time: 2218
Thread: 19, time: 2219
Thread: 23, time: 2219
Thread: 27, time: 2219
Thread: 14, time: 2223
Thread: 22, time: 2223
Thread: 25, time: 2223
Thread: 17, time: 2224
Thread: 21, time: 2224
Thread: 16, time: 2226
Thread: 30, time: 2228


If you have some time please have a look at these good articles


and the video

I also recommend trying JMH.
It takes care of many things (e.g. warm-up) and helps you measure throughput and responsiveness with almost no additional effort.

JMH (10 threads) average time is ~140 ms / op
JMH (1 thread) average time is ~48 ms / op
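
To illustrate why warm-up matters, here is a minimal sketch in plain Java (it is not JMH and not a replacement for it). The `workUnit` method is a hypothetical CPU-only stand-in for the conversion, and the class name, warm-up count and run count are arbitrary assumptions. It only shows the pattern of discarding the first calls before timing.

```java
import java.util.function.LongSupplier;

public class WarmupTiming {
    // Hypothetical stand-in for the real work (e.g. one document conversion).
    static long workUnit() {
        long acc = 0;
        for (int i = 0; i < 2_000_000; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    /**
     * Runs the task `warmups` times without measuring, then returns the
     * mean time in milliseconds over `runs` measured calls.
     */
    static double averageMillis(LongSupplier task, int warmups, int runs) {
        long sink = 0; // accumulate results so the JIT cannot drop the work
        for (int i = 0; i < warmups; i++) {
            sink += task.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += task.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.print(""); // keeps `sink` observably used
        }
        return elapsed / 1e6 / runs;
    }

    public static void main(String[] args) {
        // A single cold call, as in the original test.
        long t0 = System.nanoTime();
        workUnit();
        double cold = (System.nanoTime() - t0) / 1e6;

        // The same work after 20 unmeasured warm-up calls.
        double warm = averageMillis(WarmupTiming::workUnit, 20, 20);
        System.out.printf("first call: %.2f ms, warmed-up average: %.2f ms%n", cold, warm);
    }
}
```

A one-shot measurement like the first call above includes class loading and JIT compilation, which is one likely reason a single cold run looks much slower than a per-operation average.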

Please let us know if we can help with anything else.

Thanks.

@SMGTEAM,

Regarding WORDSJAVA-1369, our product team has completed its investigation and concluded that the performance behavior you are observing is not a bug in Aspose.Words for Java. We have therefore closed the issue as ‘Not a Bug’. Please see the following details:

There are two basic ways to measure the parallel performance of a given application, depending on whether or not one is CPU-bound or memory-bound. These are referred to as strong and weak scaling respectively.

Strong Scaling - In this case the problem size stays fixed but the number of processing elements is increased. The strong scaling test tells you how the parallel overhead scales with the number of processors.

Weak Scaling - In this case the problem size (workload) assigned to each processing element stays constant and additional elements are used to solve a larger total problem. A weak scaling test fixes the amount of work per processor and compares the execution time over number of processors. The weak scaling test tells you something weaker – whether the parallel overhead varies faster or slower than the amount of work.

The demo application you shared processes a single document per thread, and all the documents are the same. This is good for testing, because every thread hits the same locks. Threads can read a document without blocking.
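
The measurement pattern described above (N threads, each processing one identical work unit) can be sketched roughly as follows. This is not the attached test project: `workUnit` is a hypothetical CPU-only placeholder for "load and save one document", and the class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WeakScalingHarness {
    static volatile long sink; // keeps the work from being optimized away

    // Hypothetical placeholder for converting one document.
    static long workUnit() {
        long acc = 0;
        for (int i = 0; i < 5_000_000; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    /**
     * Releases `threads` workers at the same moment, each doing exactly one
     * work unit, and returns the elapsed time of each worker in milliseconds.
     */
    static List<Long> runConcurrently(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch startGate = new CountDownLatch(1);
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            futures.add(pool.submit(() -> {
                startGate.await(); // wait until every worker is queued
                long start = System.nanoTime();
                sink = workUnit();
                return Math.round((System.nanoTime() - start) / 1e6);
            }));
        }
        startGate.countDown();
        List<Long> times = new ArrayList<>();
        try {
            for (Future<Long> f : futures) {
                times.add(f.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return times;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 5, 10, 20}) {
            System.out.println(n + " thread(s): " + runConcurrently(n) + " ms");
        }
    }
}
```

Because the work per thread is fixed and the thread count grows, the per-thread times stay flat only while there are free cores and no shared locks.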

Given the above, we can say that you are performing weak scaling testing. To calculate the weak scaling efficiency, we use the following formula:

If the amount of time to complete a work unit with 1 processing element is t1, and the amount of time to complete N of the same work units with N processing elements is tN, the weak scaling efficiency (as a percentage of linear) is given as:

( t1 / tN ) * 100%

Since each processor has the same amount to do, in the ideal case the execution time should remain constant. (see weak_scaling.png (35.4 KB))
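
As a small worked example of the formula, the sketch below computes the efficiency for the first few 17.6 timings reported in this post. The class and method names are hypothetical; only the formula ( t1 / tN ) * 100% comes from the text.

```java
public class WeakScalingEfficiency {
    /** Weak scaling efficiency as a percentage of linear: (t1 / tN) * 100. */
    static double efficiencyPercent(double t1, double tN) {
        return t1 / tN * 100.0;
    }

    public static void main(String[] args) {
        // Thread counts and the matching 17.6 timings quoted in this thread.
        int[] threads = {1, 2, 4, 8};
        double[] times = {40.978, 41.926, 67.99, 117.322};

        double t1 = times[0]; // time for one work unit on one thread
        for (int i = 0; i < threads.length; i++) {
            long pct = Math.round(efficiencyPercent(t1, times[i]));
            System.out.println(threads[i] + " thread(s): " + pct + "%");
        }
    }
}
```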

Threads    Time (17.6)    Time (16.8)    Efficiency (17.6)    Efficiency (16.8)
1          40.978         39.733         100%                 100%
2          41.926         39.957         98%                  99%
4          67.99          67.965         60%                  58%
8          117.322        131.316        35%                  30%
16         249.704        253.983        18%                  16%
32         423.809        379.011        11%                  10%
64         868.073        755.003        5%                   5%

“Good” weak scaling efficiency is a relative term. The only absolute reference point is ideal scaling (100% efficiency). If we use as many threads as we have processor cores, the result does not look so bad. In the other cases we would consider the efficiency poor. It is important to note that 17.6 shows a better result than 16.8.

After profiling (see profiler.png (118.8 KB)) we noticed that the same methods are being blocked all the time.

CurrentThread.getCurrentCulture()
FontSettings.getFont()
SectPr.addCultureDefaults()

According to YourKit, a thread is blocked 40% of the time on average. At this point we hoped there was a chance to increase efficiency by 20%-30%. As an experiment, we removed the synchronized blocks or replaced them with ConcurrentHashMap (with this change Aspose.Words for Java works incorrectly, but we wanted to see the maximum possible speedup); see replace_remove_all_synchronized.png (123.8 KB).

Threads    Time (17.6)    Efficiency (17.6)
1          40.978         100%
2          43.714         94%
4          58.352         70%    <- up to 10% better, which is not bad, but I doubt we can achieve this result once AW is made to work correctly
8          119.061        34%
16         231.784        18%
32         428.358        10%
64         649.759        6%

So the picture is almost the same. Unfortunately, in a working build we could not get rid of all the synchronized blocks, so the real numbers would be very close to those of the version that keeps all the synchronized blocks.
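
For readers unfamiliar with the kind of change tried in this experiment, here is a generic sketch of replacing a synchronized lookup cache with a ConcurrentHashMap. It is not Aspose.Words code: the cache classes, the loader and the "font name" keys are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheLocking {
    /** Every caller takes the same monitor, so concurrent readers block each other. */
    static class SynchronizedCache<K, V> {
        private final Map<K, V> map = new HashMap<>();
        private final Function<K, V> loader;

        SynchronizedCache(Function<K, V> loader) {
            this.loader = loader;
        }

        synchronized V get(K key) {
            V value = map.get(key);
            if (value == null) {
                value = loader.apply(key);
                map.put(key, value);
            }
            return value;
        }
    }

    /** Lookups of already cached keys do not contend on a single shared lock. */
    static class ConcurrentCache<K, V> {
        private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        ConcurrentCache(Function<K, V> loader) {
            this.loader = loader;
        }

        V get(K key) {
            // Atomically computes the value the first time a key is requested.
            return map.computeIfAbsent(key, loader);
        }
    }

    public static void main(String[] args) {
        // Hypothetical loader: pretend the "font data" is just the name length.
        Function<String, Integer> loader = String::length;
        SynchronizedCache<String, Integer> locked = new SynchronizedCache<>(loader);
        ConcurrentCache<String, Integer> concurrent = new ConcurrentCache<>(loader);
        System.out.println(locked.get("Arial") + " " + concurrent.get("Arial"));
    }
}
```

Both variants return the same values; the difference is only in how much threads wait on each other. A swap like this is safe only when the guarded state is a simple map, which is consistent with the note above that the experimental build did not work correctly.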

Also, please see YourKit snapshots and plots charts_1369.png (43.7 KB)

So, we conclude the following two points:

  1. There is no performance degradation in 17.6 compared with 16.8.
  2. The overall picture of the performance-to-thread-count ratio is quite standard for a complex application (one that uses disk I/O and other system resources). We think that beyond 2 threads the application simply becomes blocked on disk I/O.

Best regards,
Awais Hafeez