Unexpected Scalability results in Java Fork-Join (Java 8)


steadria
Dear all,

Recently, I was running some scalability experiments using Java
Fork-Join. I used the non-default ForkJoinPool constructor
`ForkJoinPool(int parallelism)`, passing the desired parallelism level
(# workers = P) as an argument.
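
In outline, the setup looks like the sketch below (not the attached
MinimalExample.java itself; the root task here is just an empty
placeholder and the timing code is illustrative):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    public class PoolSetupSketch {

        // Placeholder root task; the actual benchmark logic lives in MinimalExample.java.
        static class RootTask extends RecursiveAction {
            @Override
            protected void compute() {
                // fork/join work goes here
            }
        }

        public static void main(String[] args) {
            int parallelism = args.length > 0 ? Integer.parseInt(args[0]) : 4; // P
            ForkJoinPool pool = new ForkJoinPool(parallelism); // non-default constructor

            long start = System.currentTimeMillis();
            pool.invoke(new RootTask()); // blocks the caller until the root task completes
            long elapsed = System.currentTimeMillis() - start;

            System.out.println("T" + parallelism + " = " + elapsed + " ms");
            pool.shutdown();
        }
    }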

Specifically, using the attached piece of code, I got these results on a
processor with 4 physical and 8 logical cores, using Java 8 (JRE
1.8.0_45):

T1: 11730
T2: 2381 (speedup: 4.93)
T4: 2463 (speedup: 4.76)
T8: 2418 (speedup: 4.85)

With Java 7 (JRE 1.7.0), on the other hand, I get:

T1: 11938
T2: 11843 (speedup: 1.01)
T4: 5133 (speedup: 2.33)
T8: 2607 (speedup: 4.58)

(where TP is the execution time in ms, using parallelism level P)

While both results surprise me, the latter I can understand: the join
causes one worker (the one executing the loop) to block, because it
fails to recognize that, while waiting, it could process other pending
dummy tasks from its local queue. The former, however, has me puzzled.
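
To make the pattern concrete, here is a sketch of the kind of task
structure I mean (not the attached MinimalExample.java itself; the task
count and the dummy workload are made-up placeholders): one task forks
many dummy tasks in a loop and then joins them in the order they were
forked.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.RecursiveAction;

    // One task forks many dummy tasks in a loop, then joins them in submission order.
    class LoopTask extends RecursiveAction {
        static final int N_TASKS = 100_000; // illustrative count

        @Override
        protected void compute() {
            List<DummyTask> tasks = new ArrayList<>(N_TASKS);
            for (int i = 0; i < N_TASKS; i++) {
                DummyTask t = new DummyTask();
                t.fork();       // pushed onto this worker's local deque
                tasks.add(t);
            }
            // Joining oldest-first: the task being joined is the one most likely to
            // have been stolen already, so the joining worker may end up waiting.
            for (DummyTask t : tasks) {
                t.join();
            }
        }
    }

    // A dummy task doing a small amount of CPU work (purely illustrative).
    class DummyTask extends RecursiveAction {
        @Override
        protected void compute() {
            long acc = 0;
            for (int i = 0; i < 10_000; i++) {
                acc += i;
            }
            if (acc == 42) System.out.println(acc); // defeat dead-code elimination
        }
    }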

Running further experiments on a 64-core SMP machine (JDK 1.8.0_45),
using the JMH benchmarking tool (1 fork, 50 measurement iterations + 50
warmup iterations), I got the results below; a sketch of the JMH setup
follows the results.

T1: 23.831

   23.831 ±(99.9%) 0.116 s/op [Average]
   (min, avg, max) = (23.449, 23.831, 24.522), stdev = 0.234
   CI (99.9%): [23.715, 23.947] (assumes normal distribution)


T2: 2.927 (speedup: 8.14)

   2.927 ±(99.9%) 0.091 s/op [Average]
   (min, avg, max) = (2.655, 2.927, 3.405), stdev = 0.184
   CI (99.9%): [2.836, 3.018] (assumes normal distribution)

T64: 1.550 (speedup: 15.37)

   1.550 ±(99.9%) 0.027 s/op [Average]
   (min, avg, max) = (1.460, 1.550, 1.786), stdev = 0.054
   CI (99.9%): [1.523, 1.577] (assumes normal distribution)
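
For reference, a JMH harness matching that setup would look roughly
like the sketch below. The fork/warmup/measurement settings and the
average-time mode correspond to the runs above, while the benchmark
body and the @Param values are placeholders (it reuses the LoopTask
sketch from earlier).

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.AverageTime)   // matches the "s/op [Average]" output above
    @OutputTimeUnit(TimeUnit.SECONDS)
    @Fork(1)
    @Warmup(iterations = 50)
    @Measurement(iterations = 50)
    @State(Scope.Benchmark)
    public class ForkJoinScalingBenchmark {

        @Param({"1", "2", "64"})       // parallelism levels P, as reported above
        int parallelism;

        ForkJoinPool pool;

        @Setup(Level.Trial)
        public void setUp() {
            pool = new ForkJoinPool(parallelism);
        }

        @TearDown(Level.Trial)
        public void tearDown() {
            pool.shutdown();
        }

        @Benchmark
        public void forkJoinLoop() {
            pool.invoke(new LoopTask()); // LoopTask as in the sketch above
        }
    }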


My current theory:

I guess one explanation would be that the worker executing the parallel
loop does not go idle in Java 8, but instead finds other work to
perform. Furthermore, I suspect there might be a 'bug' in this
mechanism, which causes more workers to be active (i.e. consuming
resources) than the desired level of parallelism (P) passed as the
constructor argument, which would explain the super-linear speedup
observed.
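
One way to check that part of the theory would be to sample the pool's
own counters while the benchmark runs; a sketch (the sampling interval
and thread name are arbitrary):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Periodically prints the pool's counters, to see whether more worker threads
    // than the configured parallelism are alive/active during the run.
    class PoolMonitor {
        static ScheduledExecutorService watch(ForkJoinPool pool) {
            ScheduledExecutorService ses =
                    Executors.newSingleThreadScheduledExecutor(r -> {
                        Thread t = new Thread(r, "fj-pool-monitor");
                        t.setDaemon(true);
                        return t;
                    });
            Runnable probe = () -> System.out.printf(
                    "parallelism=%d poolSize=%d active=%d running=%d%n",
                    pool.getParallelism(),         // configured P
                    pool.getPoolSize(),            // workers started, not yet terminated
                    pool.getActiveThreadCount(),   // workers executing or stealing tasks
                    pool.getRunningThreadCount()); // workers not blocked in join/wait
            ses.scheduleAtFixedRate(probe, 0, 500, TimeUnit.MILLISECONDS);
            return ses;
        }
    }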

I was wondering whether any of you has a better or different
explanation? Clearly, the use of the Java FJ framework in the attached
code is not 100% kosher; however, to my knowledge it doesn't violate any
of the framework's preconditions either. Note that the scalability
results are 'as expected' when the dummy tasks are joined in reverse
order.
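
For comparison, the reverse-order variant would look like this (again a
sketch, reusing DummyTask from the earlier one):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.RecursiveAction;

    // Same as LoopTask, but joins the dummy tasks in reverse (newest-first) order.
    class ReverseJoinLoopTask extends RecursiveAction {
        static final int N_TASKS = 100_000; // illustrative count

        @Override
        protected void compute() {
            List<DummyTask> tasks = new ArrayList<>(N_TASKS);
            for (int i = 0; i < N_TASKS; i++) {
                DummyTask t = new DummyTask();
                t.fork();
                tasks.add(t);
            }
            // Newest-first: each joined task is the most recently pushed entry on
            // this worker's own deque, so the worker can typically pop and run it
            // itself instead of blocking.
            for (int i = tasks.size() - 1; i >= 0; i--) {
                tasks.get(i).join();
            }
        }
    }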

I really appreciate any help you can provide,

Steven Adriaensen
PhD Student
Vrije Universiteit Brussel
Brussels, Belgium
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Attachment: MinimalExample.java (1K)
Re: Unexpected Scalability results in Java Fork-Join (Java 8)

Kirk Pepperdine
You might find that HotSpot does a better job optimizing things in 1.8. I'd be interested in running your bench and comparing it to mine.

— Kirk

