spinLoopHint() JEP draft discussion

Gil Tene
I posted a draft JEP about adding spinLoopHint() for discussion on core-libs-dev and hotspot-dev. May be of interest to this group. The main focus is supporting outside-of-the-JDK spinning needs (for which there are multiple eager users), but it could/may be useful under the hood in j.u.c.

http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-October/035613.html

See draft JEP, tests, and links to prototype JDKs to play with here:
https://github.com/giltene/GilExamples/tree/master/SpinHintTest

— Gil.

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: spinLoopHint() JEP draft discussion

Hans Boehm
If you haven't seen it, you may also be interested in


which seems to be a very different perspective on roughly the same space.

Re: spinLoopHint() JEP draft discussion

Gil Tene
A variant of synchronic for j.u.c would certainly be cool to have. Especially if it supports a hint that makes it actually spin forever rather than block (this may be what expect_urgent means, or maybe a dedicated spin level is needed). An implementation could use spinLoopHint() under the hood, or other things where appropriate (e.g. if MWAIT was usefully available in user mode in some future, and had a way to limit the wait time).

However, an abstraction like synchronic is a bit higher level than spinLoopHint(). One of the main drivers for spinLoopHint() is direct-use cases by programs and libraries outside of the core JDK. E.g. spinning indefinitely (or for limited periods) on dedicated vcores is a common practice in high performance messaging and communications stacks, and is not unreasonable on today's many-core systems. E.g. seeing 4-8 threads "pinned" with spinning loops is commonplace in trading applications, in kernel bypass network stacks, and in low latency messaging. And the conditions for spins are often more complicated than those expressible by synchronic (e.g. watching multiple addresses in a mux'ed spin). I'm sure a higher level abstraction for a spin wait can be enriched enough to come close, but there are many current use cases that aren't covered by any currently proposed abstraction.

So, I like the idea of an abstraction that would allow uncomplicated spin-wait use, but I also think that direct access to spinLoopHint() is very much needed. They don't contradict each other.

— Gil.
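
For illustration, a "mux'ed" spin of the kind mentioned above might look roughly like the sketch below. It assumes JDK 9 or later, where a hint of this kind is available as Thread.onSpinWait(); the draft discussed in this thread calls it spinLoopHint(). The class and method names (MuxSpin, awaitEither) are hypothetical and not taken from the draft JEP.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class MuxSpin {
    /**
     * Spins until either flag is set and returns the index (0 or 1)
     * of the first flag observed as set. Watching two locations in one
     * loop is the "mux'ed spin" case.
     */
    static int awaitEither(AtomicBoolean a, AtomicBoolean b) {
        while (true) {
            if (a.get()) return 0;
            if (b.get()) return 1;
            // Hint only: tells the runtime this is a spin-wait loop.
            // It never blocks and takes no argument.
            Thread.onSpinWait();
        }
    }
}
```

The loop itself decides what to watch and when to stop; the hint only marks the loop as a spin-wait.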

Re: spinLoopHint() JEP draft discussion

Nathan Reynolds-2
I am not fully up to speed on this topic.  However, why not call Thread.yield()?  If there are no other threads waiting to get on the processor, then Thread.yield() does nothing.  The current thread keeps executing.  If there are threads waiting to get on the processor, then the current thread goes to the end of the run queue and another thread gets on the processor (i.e. a context switch).  The thread will run again after the other threads ahead of it either block, call yield() or use up their time slice.  The only time Thread.yield() will do anything is if *all* of the processors are busy (i.e. 100% CPU utilization for the machine).  You could run 1000s of threads in tight Thread.yield() loops and all of the threads will take a turn to go around the loop one time and then go to the end of the run queue.

I've tested this on Windows and Linux (Intel 64-bit processors).

Some people are very afraid of context switches.  They think that context switches are expensive.  This was true of very old Linux kernels.  Nowadays, it costs 100s of nanoseconds to do a context switch.  Of course, the cache may need to be reloaded with the data relevant for the running thread.
-Nathan

Re: spinLoopHint() JEP draft discussion

Gil Tene
When comparing spinLoopHint() to Thread.yield(), we're talking about different orders of magnitude, and different motivations.

On the motivation side: A major reason for using spinLoopHint() is to improve the reaction time of a spinning thread (from the time the event it is spinning for actually occurs until it actually reacts to it). Power saving is another benefit. Thread.yield() doesn't help with either.

On the orders of magnitude side: Thread.yield() involves making a system call. This makes it literally 10x+ longer to react than spinning without it, and certainly pulls in the opposite direction of spinLoopHint().
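
To make the two loops being compared concrete, here is a minimal sketch. It assumes JDK 9 or later and uses Thread.onSpinWait() in place of the draft's spinLoopHint(); the class and method names are hypothetical. It does not measure anything, it only shows where each call sits in the loop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class WaitLoops {
    /** Polls with Thread.yield(): each iteration may hand the CPU to the OS scheduler. */
    static long awaitWithYield(AtomicBoolean flag) {
        long iterations = 0;
        while (!flag.get()) {
            Thread.yield();
            iterations++;
        }
        return iterations;
    }

    /** Polls with the spin hint: the thread stays on its CPU and keeps re-reading the flag. */
    static long awaitWithSpinHint(AtomicBoolean flag) {
        long iterations = 0;
        while (!flag.get()) {
            Thread.onSpinWait();
            iterations++;
        }
        return iterations;
    }
}
```

Both methods return the number of failed polls, which can serve as a rough input for one's own latency experiments.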

Re: spinLoopHint() JEP draft discussion

Hans Boehm
My question about spinLoopHint() would be whether it can be defined in a way that makes it useful across architectures.  I vaguely remember seeing claims that even the x86 instructions are not implemented consistently enough to be easily usable in portable code.  I have no idea (though I probably should) about ARM equivalents or the like.

It also seems to me that unbounded spin loops are almost always a bad idea.  (If you've been spinning for 10 seconds, you should be sleeping instead.  You might even be inadvertently scheduled against the thread you're waiting for.  Since you're waiting anyway, you might as well keep track of how long you've been spinning.)  But the idea here would be that this is the low-level primitive you use if you haven't been spinning for very long?  The alternative is to pass in some indication of how long you've been spinning, and have this yield, or sleep, after a sufficiently long time.

Hans
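
The alternative described above (tell the primitive how long you have been spinning and let it escalate) could be sketched as follows. This is only an illustration under stated assumptions: JDK 9 or later for Thread.onSpinWait(), and the thresholds SPIN_LIMIT, YIELD_LIMIT and PARK_NANOS are made-up values, not tuned or taken from any proposal.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

final class BoundedSpin {
    static final int SPIN_LIMIT = 1_000;     // hypothetical: attempts served by the spin hint
    static final int YIELD_LIMIT = 2_000;    // hypothetical: attempts served by yield
    static final long PARK_NANOS = 50_000L;  // hypothetical: sleep length after that

    /** One backoff step, chosen by how many attempts have already failed. */
    static void backoff(int attempts) {
        if (attempts < SPIN_LIMIT) {
            Thread.onSpinWait();
        } else if (attempts < YIELD_LIMIT) {
            Thread.yield();
        } else {
            LockSupport.parkNanos(PARK_NANOS);
        }
    }

    /** Waits until the condition holds; returns the number of failed attempts. */
    static int await(BooleanSupplier condition) {
        int attempts = 0;
        while (!condition.getAsBoolean()) {
            backoff(attempts);
            if (attempts < Integer.MAX_VALUE) attempts++;
        }
        return attempts;
    }
}
```

In real use the condition should read a volatile or atomic variable so that a write from another thread is guaranteed to become visible to the loop.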

Re: spinLoopHint() JEP draft discussion

Andrew Haley
On 10/08/2015 06:50 PM, Hans Boehm wrote:

> My question about spinLoopHint() would be whether it can be defined
> in a way that it makes it useful across architectures.  I vaguely
> remember seeing claims that even the x86 instructions are not
> implemented consistently enough to be easily usable in portable
> code.  I have no idea (though I probably should) about ARM
> equivalents or the like.

There are ARM equivalents defined in the architecture, but I don't
know if they're much more than NOPs.

> It also seems to me that unbounded spin loops are almost always a
> bad idea.  (If you've been spinning for 10 seconds, you should be
> sleeping instead.  You might even be inadvertently scheduled against
> the thread you're waiting for.  Since you're waiting anyway, you
> might as well keep track of how long you've been spinning.)  But the
> idea here would be that this is the low-level primitive you use if
> you haven't been spinning for very long?

Right.  I don't speak for Gil, but I don't think anyone is proposing
to do any more than adding this hint to the spin loops that people use
already.

Andrew.

Re: spinLoopHint() JEP draft discussion

oleksandr otenko
In reply to this post by Gil Tene
Variable X transitions from value A to value B over time t.

What is the expected reaction time of a spinning thread?

The answer is - it really depends on your cost model.

If you are waiting for X to become B, you may be waiting for up to t units of time. What difference would it make in your cost model, if instead it waited for N% of t more? When N% of t becomes larger than time to switch context, you yield. But this is a selfish model (my wait is more important than letting the others use the CPU).

Alex
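
The rule of thumb above (yield once N% of t exceeds the cost of a context switch) can be written down as a small predicate. The sketch below is one reading of that rule; the names are hypothetical, and the 2,000 ns context-switch cost used in the usage note is an illustrative figure rather than a measurement.

```java
final class YieldPolicy {
    /**
     * Returns true when the extra wait the caller tolerates (N% of t)
     * is larger than the cost of a context switch, i.e. when yielding
     * costs less than the slack already accepted.
     */
    static boolean shouldYield(long expectedWaitNanos, int tolerancePercent, long contextSwitchNanos) {
        long toleratedExtraNanos = expectedWaitNanos * tolerancePercent / 100;
        return toleratedExtraNanos > contextSwitchNanos;
    }
}
```

With t = 10 ms and N = 1, the tolerated slack is 100,000 ns, so shouldYield(10_000_000L, 1, 2_000L) is true; with t = 10 us the slack is 100 ns and shouldYield(10_000L, 1, 2_000L) is false.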


Re: spinLoopHint() JEP draft discussion

Andrew Haley
In reply to this post by Nathan Reynolds-2
On 06/10/15 21:15, Nathan Reynolds wrote:

> Some people are very afraid of context switches.  They think that
> context switches are expensive.  This was true of very old Linux
> kernels.  Now a days, it costs 100s of nanoseconds to do a context
> switch.

In practice people don't use threads as much as they could because of
the cost of such switches.

Say you've constructed a block of data and you want to encrypt it
before saving it somewhere.  What most people do today is call
encrypt() synchronously.  But chances are you have cores on the same
machine which are stopped, so you could hand that task to another
core.  But to do that you have to signal to the stopped core, and the
latency between a FUTEX_WAKE and a stopped thread starting is at least
a couple of microseconds.  You can encrypt at about 1ns/byte, so
that's a couple of kbytes of encryption just to wake the thread.  And
of course there's the cache overhead too.

In practice, all this latency means that it's not worth waking another
core unless your block of data is pretty large.  So how do you solve
this problem?  You spin.  And then the time to start a waiting thread
is not a couple of microseconds but tens of nanoseconds, the time it
takes to encrypt tens of bytes.

Andrew.
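
A hand-off to a spinning worker, as described above, might be sketched like this. It assumes JDK 9 or later for Thread.onSpinWait() and a single consumer thread; SpinningWorker and its methods are hypothetical names, and the encryption task itself is left to the caller.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SpinningWorker implements Runnable {
    private static final Runnable STOP = () -> { };
    private final AtomicReference<Runnable> slot = new AtomicReference<>();

    /** Hands a task to the worker; spins while the previous task is still in the slot. */
    void submit(Runnable task) {
        while (!slot.compareAndSet(null, task)) {
            Thread.onSpinWait();
        }
    }

    /** Asks the worker loop to exit after the tasks submitted so far. */
    void shutdown() {
        submit(STOP);
    }

    /** Worker loop: spins on the slot instead of blocking, so no wake-up call is needed. */
    @Override
    public void run() {
        while (true) {
            Runnable task = slot.get();
            if (task == null) {
                Thread.onSpinWait();
                continue;
            }
            slot.set(null); // single consumer: only this loop ever clears the slot
            if (task == STOP) return;
            task.run();
        }
    }
}
```

Typically run() would be started on a dedicated thread, for example new Thread(worker).start(), with producers calling submit() from other threads. The worker occupies a core the whole time, which is the trade-off discussed in this thread.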

Re: spinLoopHint() JEP draft discussion

thurstonn
In reply to this post by Gil Tene
How exactly does this work?
My understanding (very, very limited) was that MWAIT works with a memory address, pseudo:
"continue execution upon a write to memory location X" ,
but the proposed spinLoopHint() doesn't take any argument.

Is the idea that the JIT would somehow figure out the memory address in question?

e.g., I looked at your SpinHintTests, how would the runtime "know" that #spinData was the memory address to monitor?

Re: spinLoopHint() JEP draft discussion

Andrew Haley
On 11/10/15 17:42, thurstonn wrote:
> How exactly does this work?
> My understanding (very, very limited), was that MWAIT works with a memory
> address, pseudo:
> "continue execution upon a write to memory location X" ,
> but the proposed spinLoopHint() doesn't take any argument.

spinLoopHint() is just a PAUSE instruction.  It's not an MWAIT.

Andrew.

Re: spinLoopHint() JEP draft discussion

oleksandr otenko
In reply to this post by oleksandr otenko

On 10/10/2015 16:54, Gil Tene wrote:

> On Oct 9, 2015, at 3:33 PM, Oleksandr Otenko <[hidden email]> wrote:
>
>> Variable X transitions from value A to value B over time t.
>
> For applications that care about latency, the question would be phrased as "Variable X transitions from value A to value B *at* time t."

ok, I'll rephrase my statement: "Variable X transitions from value A to value B over/during time dt." :-)

It doesn't matter what the absolute value of t is. But if you observe that the value is not B, you are going to wait up to dt units of time, owing to the nature of the transition from A to B. Then in this world there is also some overhead from observing it is now B. Saying "make that overhead as small as possible" is not accurate. Saying "make that overhead less than 100 nanoseconds" is too strict: why would you care whether it is 100 nanoseconds, if dt is 10 milliseconds?

Granted, there will be cases where you'd justify the "100 nanosecond" overhead even if dt is "10 ms", hence my remark that it really depends on what the cost function is, but the main consumer of concurrency primitives will want to relax the overhead to be some function of dt - since the average wait time is already a function of dt.



>> What is the expected reaction time of a spinning thread?
>>
>> The answer is - it really depends on your cost model.
>>
>> If you are waiting for X to become B, you may be waiting for up to t units of time. What difference would it make in your cost model, if instead it waited for N% of t more? When N% of t becomes larger than time to switch context, you yield.
>
> Imagine a person working at the supermarket checkout counter that says "I've been standing here for the past 30 minutes and no customer has come to my line, so what difference would it make if I step away for 5 minutes for a smoke?".

If we continue this analogy long enough, they do leave the checkout (I doubt it is for a smoke - more like to stack shelves), and the peers press the button to summon them back when getting congested.

You may be able to optimize the levels of adrenaline in the customer's bloodstream, if they see the cashier race to the checkout (instead of leisurely walk).

How long you've been waiting for X to become B has nothing to do with the reaction time you are allowed to have when it does.

A less important point - let's define reaction time.

At the moment I am looking at it like so: we can't measure the time between events in two different threads, so we have a third timeline observing events in both threads. But there is no "reaction time" on it:

|   |   |
X   A   |
|   |   XA+
|   |   |
|   B   |
|   |   B+
|   |   |
B-  |   |
|   |   |
Y   |   |
|   |   Y+
|   |   |

Suppose for simplicity that X "started the wait to make progress" and A "started transition" are observed simultaneously at the point XA+. Suppose B "finished transition" is observed at B+. Suppose Y "responded to transition to B" is observed at point Y+. Suppose Y+ also tells the observer the time dy between B- "noticed B" and Y. Here "concurrency overhead" is perceived as (Y+)-dy-(B+). It is independent of XA+ and Y+, but whether it makes sense to reduce it really depends on the magnitude of Y-X, an estimate of which is (Y+)-(XA+), and on the cost function or SLA.

You might say that "concurrency overhead" is the "reaction time", but it really is two or more "reaction times", even if you make the thread transitioning A to B the observer thread, instead of having a separate observer thread.

A more important point - reducing the overheads makes sense when they constitute an important part of the overall time.

Maybe you are promoting the "the wait time is so expensive" case. Adding support for that is a good thing. But most cases would want some back off according to cumulative wait time.


Alex


But this is a selfish model (my wait is more important than letting the others use the CPU).

Selfishness is in the eye of the beholder. From a reaction time point of view, yielding would be the selfish thing (like going out for a smoke in the middle of your shift). Applications that are measured by their reaction time behavior (which is true for most applications) can usually justify the computer resources they own/rent/use. They are not there to "share with others", they are there to do a job and do it well.

And while spinning can certainly be used for doing stupid and wasteful things for no good reason, the same can be said about linked lists. Applications that spin cpus instead of blocking often have great reasons for doing so.


Alex


On 07/10/2015 02:41, Gil Tene wrote:
When comparing spinLoopHint() to Thread.yield(), we're talking about different orders of magnitude, and different motivations.

On the motivation side: A major reason for using spinLoopHint() is to improve the reaction time of a spinning thread (from the time the event it is spinning for actually occurs until it actually reacts to it). Power savings are another benefit. Thread.yield() doesn't help with either.

On the orders of magnitude side: Thread.yield() involves making a system call. This makes it literally 10x+ longer to react than spinning without it, and certainly pulls in the opposite direction of spinLoopHint().
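
For readers who want to see the kind of loop being discussed, here is a minimal sketch. It assumes the hint is available as `Thread.onSpinWait()`, which is the name this proposal eventually took in Java 9 (JEP 285); the draft in this thread calls it spinLoopHint(). The class, field and method names are made up for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SpinWaitSketch {
    // Flag set by the producer; the consumer spins until it becomes true.
    static final AtomicBoolean ready = new AtomicBoolean(false);

    // Spins until the flag is set and returns the number of loop iterations.
    static long spinUntilReady() {
        long spins = 0;
        while (!ready.get()) {
            // Tell the runtime this is a busy-wait loop. No system call is
            // made; on x86 the JIT can compile this to a PAUSE instruction,
            // and on platforms without such an instruction it may do nothing.
            Thread.onSpinWait();
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> ready.set(true));
        producer.start();
        long spins = spinUntilReady();
        producer.join();
        System.out.println("observed the flag after " + spins + " spins");
    }
}
```

Replacing the hint with Thread.yield() would keep the loop correct but, as described above, would route every iteration through the scheduler.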

On Oct 6, 2015, at 1:15 PM, Nathan Reynolds <[hidden email]> wrote:

I am not fully up to speed on this topic.  However, why not call Thread.yield()?  If there are no other threads waiting to get on the processor, then Thread.yield() does nothing.  The current thread keeps executing.  If there are threads waiting to get on the processor, then the current thread goes to the end of the run queue and another thread gets on the processor (i.e. a context switch).  The thread will run again after the other threads ahead of it either block, call yield() or use up their time slice.  The only time Thread.yield() will do anything is if *all* of the processors are busy (i.e. 100% CPU utilization for the machine).  You could run 1000s of threads in tight Thread.yield() loops and all of the threads will take a turn to go around the loop one time and then go to the end of the run queue.

I've tested this on Windows and Linux (Intel 64-bit processors).

Some people are very afraid of context switches.  They think that context switches are expensive.  This was true of very old Linux kernels.  Nowadays, it costs 100s of nanoseconds to do a context switch.  Of course, the cache may need to be reloaded with the data relevant for the running thread.

-Nathan

On 10/6/2015 11:56 AM, Gil Tene wrote:
A variant of synchronic for j.u.c would certainly be cool to have. Especially if it supports a hint that makes it actually spin forever rather than block (this may be what expect_urgent means, or maybe a dedicated spin level is needed). An implementation could use spinLoopHint() under the hood, or other things where appropriate (e.g. if MWAIT was usefully available in user mode in some future, and had a way to limit the wait time).

However, an abstraction like synchronic is a bit higher level than spinLoopHint(). One of the main drivers for spinLoopHint() is direct-use cases by programs and libraries outside of the core JDK. E.g. spinning indefinitely (or for limited periods) on dedicated vcores is a common practice in high performance messaging and communications stacks, and is not unreasonable on today's many-core systems. E.g. seeing 4-8 threads "pinned" with spinning loops is commonplace in trading applications, in kernel bypass network stacks, and in low latency messaging. And the conditions for spins are often more complicated than those expressible by synchronic (e.g. watching multiple addresses in a mux'ed spin). I'm sure a higher level abstraction for a spin wait can be enriched enough to come close, but there are many current use cases that aren't covered by any currently proposed abstraction.
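
To make "watching multiple addresses in a mux'ed spin" concrete, here is a rough sketch of one loop that polls two locations. It is an assumption-laden illustration, not code from the JEP: the two `AtomicLong` sequence counters, the class name and the method name are hypothetical, and `Thread.onSpinWait()` (the Java 9 name of this hint) stands in for spinLoopHint().

```java
import java.util.concurrent.atomic.AtomicLong;

final class MuxedSpin {
    // Spins until either counter differs from the value last seen for it.
    // Returns 0 if the first counter changed, 1 if the second one did.
    static int awaitEither(AtomicLong a, long lastSeenA, AtomicLong b, long lastSeenB) {
        while (true) {
            if (a.get() != lastSeenA) return 0;
            if (b.get() != lastSeenB) return 1;
            // The hint takes no argument, so one call serves any number of
            // watched locations and any spin condition.
            Thread.onSpinWait();
        }
    }

    public static void main(String[] args) {
        AtomicLong orders = new AtomicLong();
        AtomicLong marketData = new AtomicLong();
        new Thread(marketData::incrementAndGet).start();
        int source = awaitEither(orders, 0, marketData, 0);
        System.out.println("woke up on source " + source);
    }
}
```

A primitive that waits on a single address cannot express this condition directly, which is the point being made about direct access to the hint.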

So, I like the idea of an abstraction that would allow uncomplicated spin-wait use, but I also think that direct access to spinLoopHint() is very much needed. They don't contradict each other.

— Gil.


Re: spinLoopHint() JEP draft discussion

Justin Sampson
In reply to this post by Andrew Haley
Andrew Haley wrote:

> On 11/10/15 17:42, thurstonn wrote:
>
> > How exactly does this work?
> > My understanding (very, very limited), was that MWAIT works with
> > a memory address, pseudo:
> > "continue execution upon a write to memory location X" ,
> > but the proposed spinLoopHint() doesn't take any argument.
>
> spinLoopHint() is just a PAUSE instruction.  It's not an MWAIT.

Somewhere along the way, Doug had mentioned MWAIT as a different but
related concept:  PAUSE is to yield() as MWAIT is to park().

(And yes, the specific proposal for spinLoopHint() is to use PAUSE.)

Cheers,
Justin


Re: spinLoopHint() JEP draft discussion

Andrew Haley
On 12/10/15 21:38, Justin Sampson wrote:

> Andrew Haley wrote:
>
>> On 11/10/15 17:42, thurstonn wrote:
>>
>>> How exactly does this work?
>>> My understanding (very, very limited), was that MWAIT works with
>>> a memory address, pseudo:
>>> "continue execution upon a write to memory location X" ,
>>> but the proposed spinLoopHint() doesn't take any argument.
>>
>> spinLoopHint() is just a PAUSE instruction.  It's not an MWAIT.
>
> Somewhere along the way, Doug had mentioned MWAIT as a different but
> related concept:  PAUSE is to yield() as MWAIT is to park().

That was me, really: I'm looking for a nice way to handle WFE on
AArch64 and mentioned it on the HotSpot list.

Hans's reference to Synchronic objects is interesting but I can't
quite see how to make it fit Java.  I'm wondering if a flyweight
version of park() with a timeout might do the job, but it's not
perfect because you can't communicate any information through a
synchronization value.  Still, it would be faster than what we
have at the moment.

Andrew.

Re: spinLoopHint() JEP draft discussion

Hans Boehm
In reply to this post by Hans Boehm
It seems to me that the trick here is to be explicit as to what is intended.  Presumably this is intended to discourage speculative execution across a spinLoopHint().  It is not intended to, for example, put the processor into some sort of sleep state for a while, though that might also make sense under slightly different circumstances.

I would emphasize that this is expected not to increase latency.  It might happen to reduce power consumption, but a power-reducing, latency-increasing implementation is not expected.

On Sat, Oct 10, 2015 at 8:41 AM, Gil Tene <[hidden email]> wrote:

On Oct 8, 2015, at 10:50 AM, Hans Boehm <[hidden email]> wrote:

My question about spinLoopHint() would be whether it can be defined in a way that it makes it useful across architectures.  I vaguely remember seeing claims that even the x86 instructions are not implemented consistently enough to be easily usable in portable code.

The PAUSE instruction on x86 has been around and used consistently since Pentium 4s. And pretty much anything spinning (including the JVM's own C++ spinning code) uses it across all x86 architectures. (It encodes in a way that makes it a NOP for pre-Pentium 4 x86, so it's harmless at worst).

  I have no idea (though I probably should) about ARM equivalents or the like.

It does not seem to be common practice to use a pure spin loop hinting instruction on ARM in spin loops. On ARMv8 (64 bit) spinning uses WFE/SEVL instructions, which do more than hint. They actually watch a specific memory location for change. See discussion in several e-mails on the thread with the same subject on OpenJDK core-libs-dev archives about that.

It also seems to me that unbounded spin loops are almost always a bad idea.

The hidden OS guy in me always feels that way. But in today's many-core world it is hard to argue with the many practical uses of dedicated and unbounded user-mode spinning. From kernel bypass networking stacks to messaging stacks to trading applications, it is VERY common to find a server continually spinning on a handful of cores these days. And it provides metric benefits to the applications that do so. These include many applications written in (and doing their spinning logic in) Java.

(If you've been spinning for 10 seconds, you should be sleeping instead.

Not if what you care about is the reaction time to the next message. Many applications care about latency (sometimes down to the sub-usec levels) even when messages only come in at 100/sec. And unbounded spinning improves latency across the board (not just the long tails, but even the median) for such use cases.

You might even be inadvertently scheduled against the thread you're waiting for.

That's what is always dangerous about user-mode spinning (even the bounded kind). But there are many practical ways to prevent this from happening (or prevent it "enough") on modern many-core machines. Just keeping your active thread counts well below your vcore count is a pretty simple way to start for this, and with a modern 2 socket x86 server having anywhere from 24 to 72 vcores these days, that's a pretty practical thing to do. The true latency sensitive folks out there will do a lot to control which cores they spin on, and who might interfere with those cores (e.g. see this detailed Strange Loop presentation by Mark Price from LMAX: https://www.youtube.com/watch?v=-6nrhSdu--s (discussion of core-affinity controls starts around 16:00 in the video). LMAX do a lot of spinning in Java…).
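
As a small illustration of keeping spinner counts below the vcore count, the sketch below derives a budget from what the JVM reports. The class name, the method name and the reserve of 2 hardware threads are assumptions chosen for the example. The standard Java library has no thread-affinity API, so actually pinning threads to cores has to be done with OS tools or a third-party library and is not shown.

```java
final class SpinnerBudget {
    // Number of spinning threads to allow, leaving `reserved` hardware
    // threads free for the OS, GC and other non-spinning work.
    static int maxSpinners(int availableCores, int reserved) {
        return Math.max(0, availableCores - reserved);
    }

    public static void main(String[] args) {
        // What the JVM sees; this may be lower than the machine's vcore
        // count if the process is restricted to a subset of cores.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("vcores=" + cores + ", spinner budget=" + maxSpinners(cores, 2));
    }
}
```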

  Since you're waiting anyway, you might as well keep track of how long you've been spinning.)  But the idea here would be that this is the low-level primitive you use if you haven't been spinning for very long?

spinLoopHint() is useful both for short spinning (spinning for a while before giving up and blocking) and for indefinite spinning, and both cases will benefit from it.

  The alternative is to pass in some indication of how long you've been spinning, and have this yield, or sleep, after a sufficiently long time.

I don't see much urgency for adding convenience wrappers, as this logic is doable without adding any Java SE APIs. In fact, it is common to see this in code that performs some sort of indefinite spinning logic.
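
One common shape for such user-level logic is a wait that escalates from spinning to yielding to short sleeps. The sketch below is hypothetical: the class name, the iteration limits and the park duration are illustrative choices, not anything proposed in the JEP, and the spin phase uses `Thread.onSpinWait()` (the Java 9 name of this hint) where the thread says spinLoopHint().

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

final class PhasedWait {
    final int spinLimit;   // idle iterations spent purely spinning
    final int yieldLimit;  // further idle iterations that call Thread.yield()
    final long parkNanos;  // sleep length once both limits are used up

    PhasedWait(int spinLimit, int yieldLimit, long parkNanos) {
        this.spinLimit = spinLimit;
        this.yieldLimit = yieldLimit;
        this.parkNanos = parkNanos;
    }

    // Phase of the n-th idle iteration: 0 = spin, 1 = yield, 2 = park.
    int phase(long n) {
        if (n < spinLimit) return 0;
        if (n < (long) spinLimit + yieldLimit) return 1;
        return 2;
    }

    // Waits until the condition holds; returns the number of idle iterations.
    long await(BooleanSupplier condition) {
        long n = 0;
        while (!condition.getAsBoolean()) {
            switch (phase(n++)) {
                case 0:  Thread.onSpinWait(); break;
                case 1:  Thread.yield(); break;
                default: LockSupport.parkNanos(parkNanos);
            }
        }
        return n;
    }
}
```

A caller that wants indefinite spinning can pass Integer.MAX_VALUE as the spin limit, in which case only the hint is used for the first two billion or so idle iterations.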

spinLoopHint() is needed because it provides a currently missing feature. Without it there is (currently) no way for Java spinning logic to make use of important hardware capabilities that improve execution metrics (latency, power consumption, and overall program throughput). Those capabilities are in near-universal use outside of Java for good reason, and Java just lacks a way to indicate the need or intent in a practical way (a JNI call or a yield() is not practical due to the dramatic relative cost difference)...


Hans
