Regarding WORDSJAVA-1369, our product team has completed its analysis and concluded that the performance behavior you are observing is not a bug in Aspose.Words for Java. We have therefore closed this issue as ‘Not a Bug’. Please see the following details:
There are two basic ways to measure the parallel performance of a given application, depending on whether it is CPU-bound or memory-bound. These are referred to as strong and weak scaling, respectively.
Strong Scaling - In this case the problem size stays fixed but the number of processing elements is increased. A strong scaling test tells you how the parallel overhead scales with the number of processors.
Weak Scaling - In this case the problem size (workload) assigned to each processing element stays constant and additional elements are used to solve a larger total problem. A weak scaling test fixes the amount of work per processor and compares the execution time across different numbers of processors. It tells you something weaker: whether the parallel overhead grows faster or slower than the amount of work.
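To make the weak scaling setup concrete, here is a minimal Java sketch of such a measurement. It is not the demo application from this thread: the class name WeakScalingHarness, the method timeRun and the placeholder work unit are assumptions made for illustration. Each thread runs the same work unit once, and the wall-clock time for N threads is compared with the time for one thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical harness, for illustration only.
final class WeakScalingHarness {

    /** Runs workUnit once on each of `threads` threads and returns the wall-clock time in seconds. */
    static double timeRun(int threads, Runnable workUnit) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        try {
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> {
                    try {
                        start.await();   // line all workers up before timing starts
                        workUnit.run();  // same amount of work on every thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            long begin = System.nanoTime();
            start.countDown();           // release all workers at once
            done.await();                // wait until every worker has finished
            return (System.nanoTime() - begin) / 1e9;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholder CPU-bound work unit; a real test would process one document here.
        Runnable workUnit = () -> {
            long sum = 0;
            for (long i = 0; i < 200_000_000L; i++) sum += i * i;
            if (sum == 42) System.out.println(sum); // discourages the JIT from dropping the loop
        };
        double t1 = timeRun(1, workUnit);
        for (int n : new int[] {1, 2, 4, 8}) {
            double tN = timeRun(n, workUnit);
            System.out.printf("%d threads: %.3f s, efficiency %.0f%%%n", n, tN, t1 / tN * 100);
        }
    }
}
```

In the ideal case the reported time stays flat as the thread count grows. Any growth comes from contention on shared resources such as locks, memory bandwidth or disk I/O.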
The demo application you shared processes a single document per thread, and all the documents are the same. This is also good for testing, because all threads hit the same locks. Threads can read a document without blocking.
Given the above, we can say that you are performing weak scaling testing. Weak scaling efficiency is calculated with the following formula:
If the amount of time to complete a work unit with 1 processing element is t1, and the amount of time to complete N of the same work units with N processing elements is tN, the weak scaling efficiency (as a percentage of linear) is given as:
( t1 / tN ) * 100%
Since each processor has the same amount to do, in the ideal case the execution time should remain constant. (see weak_scaling.png (35.4 KB))
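As a small worked example of the formula above, the following Java sketch computes the efficiency for one pair of timings. The class and method names are assumptions made for illustration. The two input values are taken from the 1-thread and 4-thread rows of the 17.6 measurements listed in this post.

```java
// Hypothetical helper, for illustration only.
final class WeakScalingEfficiency {

    /** Weak scaling efficiency as a percentage of linear: (t1 / tN) * 100. */
    static double efficiencyPercent(double t1, double tN) {
        if (t1 <= 0 || tN <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return t1 / tN * 100.0;
    }

    public static void main(String[] args) {
        double t1 = 40.978; // 1 thread, version 17.6
        double t4 = 67.99;  // 4 threads, version 17.6
        System.out.printf("%.0f%%%n", efficiencyPercent(t1, t4)); // prints 60%
    }
}
```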
Threads   Time (17.6)   Time (16.8)   Efficiency (17.6)   Efficiency (16.8)
1         40.978        39.733        100%                100%
2         41.926        39.957        98%                 99%
4         67.99         67.965        60%                 58%
8         117.322       131.316       35%                 30%
16        249.704       253.983       18%                 16%
32        423.809       379.011       11%                 10%
64        868.073       755.003       5%                  5%
“Good” weak scaling efficiency is a relative term. The only absolute reference point is ideal scaling (100% efficiency). If we use as many threads as there are processor cores, the result does not look so bad. In other cases we would consider the efficiency poor. It is important to note that 17.6 shows equal or better efficiency than 16.8 at most thread counts.
After profiling (see profiler.png (118.8 KB)), we noticed that the same methods are being blocked all the time.
According to YourKit, a thread is blocked about 40% of the time on average. At this point we hoped there was a chance to increase efficiency by 20%-30%. As an experiment, we removed the synchronized blocks or replaced them with ConcurrentHashMap (with this change Aspose.Words for Java no longer works correctly, but we wanted to see the maximum possible speedup); see replace_remove_all_synchronized.png (123.8 KB).
Threads   Time (17.6)   Efficiency (17.6)
1         40.978        100%
2         43.714        94%
4         58.352        70%    <- up to 10% better, which is not bad, but I doubt we can achieve this result once AW works correctly
8         119.061       34%
16        231.784       18%
32        428.358       10%
64        649.759       6%
So the picture is almost the same. Unfortunately, we could not get rid of all the synchronized blocks in a correctly working version, so its numbers would be pretty close to those of the version with all the synchronized blocks in place.
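For readers unfamiliar with the kind of change tried in this experiment, here is a minimal Java sketch of replacing a synchronized block around a shared map with ConcurrentHashMap. It does not show Aspose.Words internals. The class names and the cache use case are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical example; not taken from Aspose.Words for Java.
final class SharedCacheSketch {

    /** Before: every lookup takes the same monitor, so threads queue up even on cache hits. */
    static final class SynchronizedCache<K, V> {
        private final Map<K, V> map = new HashMap<>();
        private final Function<K, V> loader;

        SynchronizedCache(Function<K, V> loader) {
            this.loader = loader;
        }

        V get(K key) {
            synchronized (map) {
                return map.computeIfAbsent(key, loader);
            }
        }
    }

    /** After: ConcurrentHashMap has no single lock shared by all keys. */
    static final class ConcurrentCache<K, V> {
        private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        ConcurrentCache(Function<K, V> loader) {
            this.loader = loader;
        }

        V get(K key) {
            // The loader runs at most once per key; other keys are not blocked by a global monitor.
            return map.computeIfAbsent(key, loader);
        }
    }

    public static void main(String[] args) {
        SynchronizedCache<String, Integer> before = new SynchronizedCache<>(String::length);
        ConcurrentCache<String, Integer> after = new ConcurrentCache<>(String::length);
        System.out.println(before.get("document") + " " + after.get("document"));
    }
}
```

Both variants return the same values. The difference only shows up under contention, and such a swap is safe only when the protected state is the map itself, which is one reason why not every synchronized block can be removed this way.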
Also, please see the YourKit snapshots and plots in charts_1369.png (43.7 KB).
So, we conclude the following two points:
- There is no performance degradation in 17.6 compared with 16.8.
- The overall picture of performance versus number of threads is quite typical for a complex application (one that uses disk I/O and other system resources). We think that beyond 2 threads the application simply becomes blocked on disk I/O.