ForkJoinPool sometimes blocks an outer on a nested task (Java 9 and later)

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

ForkJoinPool sometimes blocks an outer on a nested task (Java 9 and later)

JSR166 Concurrency mailing list
Dear Experts

I'm new on this list and not a concurrency expert. I'm the lead of the Scala team at Lightbend which maintains the compiler and standard library.

We recently had a bug report of a parallel collections operation being blocked on the termination of a Future started within the operation body. I managed to reproduce the behavior in a fairly small Java-only example using ForkJoinPool, the code is here: https://gist.github.com/lrytz/a622c541d91336030a8cb788273bdc66

The code starts a task, which forks two subtasks and joins them, the subtasks recursively do the same, until some depth. Each one of these tasks is fast (2ms). Additionally, the first task creates a subtask that is slow (200ms), but does *not* join on it. On Java 8, each top-level task returns quickly. On Java 9 and later, from time to time the top-level task gets blocked on the slow subtask and only returns after 200+ ms. I tested Java 1.8.0_261-b12 for 8, and 9+181, 11.0.6+10, 14.0.1+7, 15+36-1562.

The issue is more common when the pool's parallelism is lower. With a parallelism of 4, I did a 10 seconds flight recording, and there is a pretty obvious difference: on Java 8 there were > 20 pool threads, while on Java 11 there were only 5 of them. The screenshots show the long wait periods on Java 11: https://github.com/scala/bug/issues/12106#issuecomment-701978100.

I did a short search of this list's archive and found one thread that looks somewhat related, though if I understand correctly that older discussion is about tasks that involve some sort of blocking, which is not the case in this example. http://altair.cs.oswego.edu/pipermail/concurrency-interest/2014-May/012669.html

I'd like to know whether the Java 9+ behavior is a bug or within spec - I hope it's an appropriate question for this list. https://docs.oracle.com/javase/9/docs/api/java/util/concurrent/ForkJoinPool.html says:
>  The pool attempts to maintain enough active (or available) threads by dynamically adding, suspending, or resuming internal worker threads, even if some tasks are stalled waiting to join others

Best,
Lukas

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ForkJoinPool sometimes blocks an outer on a nested task (Java 9 and later)

JSR166 Concurrency mailing list
Hello Lukas,

I haven't found the exact lines of code that causes your problem, but basically, it comes from the change to the ForkJoinPool algorithm to create less additonal threads.
While previously calling #join() would create a new side thread to help completing the task being joined, the new algorithm tries in place to consume submitted tasks (I am still looking for the commit that changed that)

The thing is that calling #execute from within a ForkJoinThread, like you do within Test$T, is detected by the code [1], causing the task with the long sleep to be recorded in the same work queue as your other Test$T tasks. And it behaves differently than an ForkJoinPool#externalPush, as it could occurs when called from your main for example.
So when calling #join(), some workers may consider that executing the long-sleep task helps in the join. As a consequence, even if not joining explicitly on it, you do.

Changing Test$T#compute() to the following code highlights that it is what happens.
```java
for (T t : subtasks) {
  final long begin = System.nanoTime();
  t.join();
  long duration = (System.nanoTime() - begin) / 1000_000;
  if (duration > 200) {
    System.out.println("Long sleep picked on a join");
  }
}
```

Not being a maintainer of the FJP, I cannot tell if this is an intended design or some edge-case.

Cheers
Olivier




‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
Le jeudi 1 octobre 2020 14:14, Lukas Rytz via Concurrency-interest <[hidden email]> a écrit :

Dear Experts

I'm new on this list and not a concurrency expert. I'm the lead of the Scala team at Lightbend which maintains the compiler and standard library.

We recently had a bug report of a parallel collections operation being blocked on the termination of a Future started within the operation body. I managed to reproduce the behavior in a fairly small Java-only example using ForkJoinPool, the code is here: https://gist.github.com/lrytz/a622c541d91336030a8cb788273bdc66

The code starts a task, which forks two subtasks and joins them, the subtasks recursively do the same, until some depth. Each one of these tasks is fast (2ms). Additionally, the first task creates a subtask that is slow (200ms), but does *not* join on it. On Java 8, each top-level task returns quickly. On Java 9 and later, from time to time the top-level task gets blocked on the slow subtask and only returns after 200+ ms. I tested Java 1.8.0_261-b12 for 8, and 9+181, 11.0.6+10, 14.0.1+7, 15+36-1562.

The issue is more common when the pool's parallelism is lower. With a parallelism of 4, I did a 10 seconds flight recording, and there is a pretty obvious difference: on Java 8 there were > 20 pool threads, while on Java 11 there were only 5 of them. The screenshots show the long wait periods on Java 11: https://github.com/scala/bug/issues/12106#issuecomment-701978100.

I did a short search of this list's archive and found one thread that looks somewhat related, though if I understand correctly that older discussion is about tasks that involve some sort of blocking, which is not the case in this example. http://altair.cs.oswego.edu/pipermail/concurrency-interest/2014-May/012669.html

I'd like to know whether the Java 9+ behavior is a bug or within spec - I hope it's an appropriate question for this list. https://docs.oracle.com/javase/9/docs/api/java/util/concurrent/ForkJoinPool.html says:
>  The pool attempts to maintain enough active (or available) threads by dynamically adding, suspending, or resuming internal worker threads, even if some tasks are stalled waiting to join others

Best,
Lukas


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest