I've noticed that some concurrent primitives employ mixed strategies before parking threads.
I am aware of the pthread_cond_signal/wait mechanism and the subtle issue of contention around futex hash buckets in the Linux kernel, but I would like to know whether there are academic papers, or even just articles, that explain the motivation for such strategies and how their effectiveness is ensured across the very different OSes and architectures supported.
In addition, I see that FJ has dropped the spin before parking threads, while SynchronousQueue still uses it, and I don't understand why...
On 2/12/20 2:12 AM, Francesco Nigro via Concurrency-interest wrote:
> Hi folks,
> I've noticed that some concurrent primitives employ mixed strategies
> before parking threads.
Yes. Every policy and mechanism for blocking threads entails tradeoffs
and potential performance bugs. The recent refreshes of AQS and FJ make
these more uniform in more cases, but there are still some others that
need renewed attention. SynchronousQueue is among them. When used in
contexts such as Executors.newCachedThreadPool, spinning to reduce the
worst latency impacts (see below) for starting new tasks is usually
empirically a good idea, but doing so for simple messaging is usually
empirically a bad idea. This will be improved, most likely by creating
new components better geared for the latter.
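To make the tradeoff concrete, here is a minimal sketch of the spin-then-park pattern being discussed. The class name (SpinThenParkGate) and the spin bound are hypothetical illustrations, not JDK code; real components such as SynchronousQueue tune these choices per platform and workload.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical one-shot gate: spin briefly in case the signal is
// imminent, then fall back to parking to avoid burning CPU.
class SpinThenParkGate {
    private final AtomicBoolean ready = new AtomicBoolean();
    private volatile Thread waiter;

    void await() {
        int spins = 1 << 8;                 // arbitrary bound; tuning is workload-dependent
        while (!ready.get()) {
            if (spins > 0) {
                --spins;
                Thread.onSpinWait();        // hint to the CPU that we are busy-waiting
            } else {
                waiter = Thread.currentThread();
                if (!ready.get())           // recheck to avoid a lost wakeup
                    LockSupport.park(this); // may return spuriously; outer loop rechecks
                waiter = null;
            }
        }
    }

    void signal() {
        ready.set(true);                    // publish first, then wake any parked waiter
        Thread w = waiter;
        if (w != null)
            LockSupport.unpark(w);
    }
}
```

The spin phase wins when the signal arrives within a few hundred nanoseconds (as when handing a task to a pool thread), and loses when waits are long (as in plain messaging), which is exactly the latency-versus-overhead tension described above.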
While I'm at it, a reminder of the options available when you cannot
immediately proceed due to the actions (or lack thereof) of other
threads. Each has overhead, throughput, latency, thresholding, code
structure, and complexity tradeoffs that have led to decades of
disagreements about the best way to write concurrent software.
* Avoid blocking by helping (as in non-blocking data structures and some
related techniques).
* Avoid blocking by arranging a completion/continuation to be triggered,
and then doing something else (as in CompletableFuture,
CountedCompleter, and other async components).
* Spin or pause waiting to see if you were momentarily unlucky, before
trying further options.
* Save context, find another task to run, perform it, and restore
context (as in user-level schedulers, including the upcoming Loom).
* Pass the problem to the JVM, which entails further choices of which
OS-level blocking primitives to use, and then the OS, which then has
harder issues to consider, such as whether to power down your core.
These are where the worst blocking latencies arise.
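As one example of the completion/continuation option above, here is a minimal sketch using CompletableFuture. The names (ContinuationSketch, fetchValue) are hypothetical; the point is that the caller registers what to do when the result arrives rather than parking a thread to wait for it.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of "arrange a continuation, then do something else".
class ContinuationSketch {
    static CompletableFuture<Integer> fetchValue() {
        // stand-in for an asynchronous computation (I/O, remote call, ...)
        return CompletableFuture.supplyAsync(() -> 21);
    }

    static int demo() {
        CompletableFuture<Integer> doubled =
            fetchValue().thenApply(v -> v * 2); // continuation runs when the value is ready
        // ... the calling thread is free to do other useful work here ...
        return doubled.join(); // join only to end the demo; real async code would chain instead
    }
}
```

No thread blocks between issuing the request and running the continuation, which is why this style avoids the OS-level parking costs entirely (at the price of restructuring the code around callbacks or chains).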