Low-latency pause in JDK

JSR166 Concurrency mailing list
Hey,

Is there any jdk-builtin Java8+ method which tries to be clever about low-nanos/micros parking?

I'm currently considering LockSupport.parkNanos but want to avoid having the Thread parked when parking + wake-up latency is more likely to be much greater than the requested time.

I.e. some combination of onSpinWait + some non-cache-polluting computation + yielding + actual parking. I'd like to avoid having to custom-roll it, hence the question for prior art ;)
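
A minimal sketch of the kind of spin, then yield, then park combination asked about here. It is not prior art from the JDK: the class name PhasedWait and all thresholds are hypothetical and purely illustrative, and Thread.onSpinWait() needs Java 9 or later (on Java 8 that call would have to be dropped or replaced).

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: waits for a condition in three phases (spin, yield, park). */
final class PhasedWait {
    // Illustrative values only; sensible numbers depend on hardware, OS and load.
    static final int SPIN_TRIES = 100;
    static final int YIELD_TRIES = 10;
    static final long PARK_NANOS = 50_000L;

    /** Returns true if the condition held before the timeout elapsed, false otherwise. */
    static boolean await(BooleanSupplier condition, long timeoutNanos) {
        final long start = System.nanoTime();
        int attempts = 0;
        while (!condition.getAsBoolean()) {
            long remaining = timeoutNanos - (System.nanoTime() - start);
            if (remaining <= 0) {
                return false;
            }
            if (attempts < SPIN_TRIES) {
                Thread.onSpinWait();   // CPU hint only, the thread stays on its core
            } else if (attempts < SPIN_TRIES + YIELD_TRIES) {
                Thread.yield();        // offers the core to other runnable threads
            } else {
                // May return early (spurious wakeup, pending permit, interrupt);
                // the loop re-checks the condition and the deadline either way.
                LockSupport.parkNanos(Math.min(PARK_NANOS, remaining));
            }
            attempts++;
        }
        return true;
    }
}
```

A caller would use it as PhasedWait.await(flag::get, timeoutNanos). The fixed phase lengths are the weak point: they ignore how expensive parking actually is on the machine at hand, which is what the rest of the thread discusses.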

--
Cheers,

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
I haven't seen anything like that, except in some early implementations of the fork/join pool, which were later removed. If you need something that is more aware of OS behaviour, e.g. timerslack_ns, you probably have to implement it yourself, AFAIK :( (see https://github.com/JCTools/JCTools/pull/248#pullrequestreview-248613337)

On Fri, 25 Oct 2019 at 19:14, Viktor Klang via Concurrency-interest <[hidden email]> wrote:
Hey,

Is there any jdk-builtin Java8+ method which tries to be clever about low-nanos/micros parking?

I'm currently considering LockSupport.parkNanos but want to avoid having the Thread parked when parking + wake-up latency is more likely to be much greater than the requested time.

I.e. some combination of onSpinWait + some non-cache-polluting computation + yielding + actual parking. I'd like to avoid having to custom-roll it, hence the question for prior art ;)


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
Not to mention that you need to factor counted loops and safepoint polls into the equation.

On Fri, 25 Oct 2019 at 19:26, Francesco Nigro <[hidden email]> wrote:
I haven't seen anything like that, except in some early implementations of the fork/join pool, which were later removed. If you need something that is more aware of OS behaviour, e.g. timerslack_ns, you probably have to implement it yourself, AFAIK :( (see https://github.com/JCTools/JCTools/pull/248#pullrequestreview-248613337)


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
Sounds like a good reason for a JDK-official method for it…

On Fri, Oct 25, 2019 at 10:28 AM Francesco Nigro <[hidden email]> wrote:
Not to mention that you need to factor counted loops and safepoint polls into the equation.

--
Cheers,


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
+100 totally agree!

On Fri, 25 Oct 2019 at 20:30, Viktor Klang <[hidden email]> wrote:
Sounds like a good reason for a JDK-official method for it…


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
On 10/25/19 11:11 AM, Viktor Klang via Concurrency-interest wrote:

>
> Is there any jdk-builtin Java8+ method which tries to be clever
> about low-nanos/micros parking?
>
> I'm currently considering LockSupport.parkNanos but want to avoid
> having the Thread parked when parking + wake-up latency is more
> likely to be much greater than the requested time.
>
> I.e. some combination of onSpinWait + some non-cache-polluting
> computation + yielding + actual parking. I'd like to avoid having to
> custom-roll it, hence the question for prior art ;)

As I understand it, the common wisdom is to wait for about half the
round-trip time for a system call and then park. It doesn't sound
terribly hard to write something to do that.
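
One way the rule of thumb above might be sketched, under loud assumptions: the cost of a minimal LockSupport.parkNanos(1) call is used as a crude stand-in for "the round-trip time for a system call", and since this example has no partner thread calling unpark, it uses a bounded park and re-checks instead of parking indefinitely. The class and method names are hypothetical, and Thread.onSpinWait() requires Java 9+.

```java
import java.util.Arrays;
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: spin for about half the park round trip, then park. */
final class SpinThenPark {
    /** Median cost in nanoseconds of a minimal timed park (at least 1). */
    static long estimateParkRoundTripNanos(int samples) {
        long[] costs = new long[Math.max(1, samples)];
        for (int i = 0; i < costs.length; i++) {
            long t0 = System.nanoTime();
            LockSupport.parkNanos(1L);   // can return early: permit, interrupt, spurious wakeup
            costs[i] = System.nanoTime() - t0;
        }
        Arrays.sort(costs);
        // The median is less sensitive than min or mean to early returns and to preemption.
        return Math.max(1L, costs[costs.length / 2]);
    }

    /** Spins for roughly half the measured round trip, then falls back to bounded parks. */
    static void awaitCondition(BooleanSupplier done, long roundTripNanos) {
        final long spinBudget = roundTripNanos / 2;
        final long start = System.nanoTime();
        while (!done.getAsBoolean()) {
            if (System.nanoTime() - start < spinBudget) {
                Thread.onSpinWait();
            } else {
                // A real lock would park until the releasing thread unparks it.
                LockSupport.parkNanos(Math.max(1L, roundTripNanos));
            }
        }
    }
}
```

The estimate would normally be taken once at startup and reused; whether a single global number is good enough is exactly the kind of environment-dependent question raised elsewhere in this thread.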

Please forgive me for digressing, but:

Arm has a mechanism to do this, WFE. When a core fails to obtain a
lock it executes a WFE instruction which waits on the cache line
containing the lock. When that cache line is written to by the core
releasing the lock it awakens the waiting core.

I'd like to find some way to expose this in a high-level language but
it's not at all easy to do.

I believe that Intel has MWAIT which is similar, but it's a privileged
instruction so no use to us.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
I was hoping that, for parking times shorter than the granularity offered by the OS, it would make sense to use an _mm_pause spin loop (possibly not a broken one), using rdts to measure the elapsed time (plus an lfence if needed, though with the pause it may not be necessary), and with no safepoint polls in the spin-wait loop.
I know that on oversubscribed systems (more active threads than cores) this isn't a great solution, but it would be nice to have some way to perform a low-latency sleep.
Implementing it myself, while keeping a safepoint poll from being injected into an uncounted loop, is not that trivial...
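
For comparison, a rough pure-Java approximation of such a low-latency sleep. Plain Java cannot issue a timestamp-counter read or a fence directly, so this sketch assumes System.nanoTime() as the clock and Thread.onSpinWait() (Java 9+) as the pause hint, and it does nothing about safepoint polls: where those end up is decided by the JIT. ShortPause and SPIN_THRESHOLD_NANOS are hypothetical names with an arbitrary illustrative cutoff.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch: park for the bulk of a pause, spin for the final stretch. */
final class ShortPause {
    // Assumed cutoff below which parking is not worth it; needs tuning per platform.
    static final long SPIN_THRESHOLD_NANOS = 20_000L;

    /** Pauses for at least the requested time and returns the elapsed nanoseconds. */
    static long pauseNanos(long nanos) {
        final long start = System.nanoTime();
        long elapsed = 0L;
        while (elapsed < nanos) {
            long remaining = nanos - elapsed;
            if (remaining > SPIN_THRESHOLD_NANOS) {
                // Park for everything except the final stretch; may wake up early or late.
                LockSupport.parkNanos(remaining - SPIN_THRESHOLD_NANOS);
            } else {
                Thread.onSpinWait();   // busy-wait through the tail of the pause
            }
            elapsed = System.nanoTime() - start;
        }
        return elapsed;
    }
}
```

On an oversubscribed machine the spinning tail can still be preempted, so the returned value is worth checking rather than assuming the pause was exact.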

On Sat, 26 Oct 2019 at 11:23, Andrew Haley via Concurrency-interest <[hidden email]> wrote:
On 10/25/19 11:11 AM, Viktor Klang via Concurrency-interest wrote:
>
> Is there any jdk-builtin Java8+ method which tries to be clever
> about low-nanos/micros parking?
>
> I'm currently considering LockSupport.parkNanos but want to avoid
> having the Thread parked when parking + wake-up latency is more
> likely to be much greater than the requested time.
>
> I.e. some combination of onSpinWait + some non-cache-polluting
> computation + yielding + actual parking. I'd like to avoid having to
> custom-roll it, hence the question for prior art ;)

As I understand it, the common wisdom is to wait for about half the
round-trip time for a system call and then park. It doesn't sound
terribly hard to write something to do that.

Please forgive me for digressing, but:

Arm has a mechanism to do this, WFE. When a core fails to obtain a
lock it executes a WFE instruction which waits on the cache line
containing the lock. When that cache line is written to by the core
releasing the lock it awakens the waiting core.

I'd like to find some way to expose this in a high-level language but
it's not at all easy to do.

I believe that Intel has MWAIT which is similar, but it's a privileged
instruction so no use to us.


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
I mean RDTSC, missed the final C :P

On Sat, 26 Oct 2019 at 12:07, Francesco Nigro <[hidden email]> wrote:
I was hoping that, for parking times shorter than the granularity offered by the OS, it would make sense to use an _mm_pause spin loop (possibly not a broken one), using rdts to measure the elapsed time (plus an lfence if needed, though with the pause it may not be necessary), and with no safepoint polls in the spin-wait loop.
I know that on oversubscribed systems (more active threads than cores) this isn't a great solution, but it would be nice to have some way to perform a low-latency sleep.
Implementing it myself, while keeping a safepoint poll from being injected into an uncounted loop, is not that trivial...


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
Given that the strategy will depend on the runtime environment, a JDK "intrinsic" would make most sense to me—hence wanting something in the JDK :)

On Sat, Oct 26, 2019 at 10:23 AM Francesco Nigro <[hidden email]> wrote:
I mean RDTSC, missed the final C :P

--
Cheers,


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
x86 MWAIT is not available but we could simulate this with XACQUIRE and
PAUSE.  The thread uses XACQUIRE on the cache line and then executes
PAUSE.  PAUSE takes 100s of cycles.  Hopefully, XACQUIRE will wake up
the thread from PAUSE.

The downside of pausing the thread's execution of instructions is that
the thread cannot respond to stop the world events.  This will increase
the time it takes to stop the world.  If we are talking about 1000s of
cycles, this might not make much of a difference.  On the other hand,
with GC pause times lower than 1 ms, 1000s of cycles might be a
significant portion of time.

On x86, it takes about 3,000 cycles to enter and return from a
Thread.yield() call on Windows (on a mid-range laptop processor from 8
years ago).  Any low-latency pause loop has to take into account that if
it waits 3,000 cycles, then it would have been better to enter the
kernel in the first place.  Blocking in the kernel will reduce power
consumption as well as allow other threads to do useful work.  Thus,
each call site needs to keep statistics on how long the thread waits. 
If the call site is waiting too long too often, then the threads should
immediately block in the kernel instead of spinning.  This is not easy
to get right.
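
A small sketch of what such per-call-site statistics could look like. Everything here is an assumption made for illustration: the class name AdaptiveWaitSite, the saturating score, the 1,000 ns spin budget and the bounded park standing in for a real blocking wait. It needs Java 9+ for Thread.onSpinWait().

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

/** Hypothetical per-call-site state: one instance per place in the code that waits. */
final class AdaptiveWaitSite {
    static final long SPIN_BUDGET_NANOS = 1_000L;   // assumed "about one kernel round trip"
    static final long PARK_NANOS = 50_000L;
    static final int MAX_SCORE = 8;
    static final int SPIN_LIMIT = 4;

    // Plain field, racy on purpose: it is only a heuristic, lost updates are harmless.
    private int missScore;

    boolean spinningEnabled() {
        return missScore <= SPIN_LIMIT;
    }

    void await(BooleanSupplier ready) {
        if (spinningEnabled()) {
            final long start = System.nanoTime();
            while (System.nanoTime() - start < SPIN_BUDGET_NANOS) {
                if (ready.getAsBoolean()) {
                    missScore = Math.max(0, missScore - 1);   // spinning paid off
                    return;
                }
                Thread.onSpinWait();
            }
            missScore = Math.min(MAX_SCORE, missScore + 2);   // spun the whole budget in vain
        } else {
            missScore--;   // spin skipped; decay so that spinning is retried later
        }
        while (!ready.getAsBoolean()) {
            LockSupport.parkNanos(PARK_NANOS);   // stand-in for blocking in the kernel
        }
    }
}
```

With these numbers, three wasted spins in a row switch a site to parking immediately, and it then probes spinning again on roughly every third call. Whether that ratio is anywhere near right is the part that is hard to get right.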

Perhaps, a better solution is to provide low-level mechanisms in the JDK
and let people experiment with how long to spin or wait.

-Nathan

On 10/26/2019 3:21 AM, Andrew Haley via Concurrency-interest wrote:

> On 10/25/19 11:11 AM, Viktor Klang via Concurrency-interest wrote:
>> Is there any jdk-builtin Java8+ method which tries to be clever
>> about low-nanos/micros parking?
>>
>> I'm currently considering LockSupport.parkNanos but want to avoid
>> having the Thread parked when parking + wake-up latency is more
>> likely to be much greater than the requested time.
>>
>> I.e. some combination of onSpinWait + some non-cache-polluting
>> computation + yielding + actual parking. I'd like to avoid having to
>> custom-roll it, hence the question for prior art ;)
> As I understand it, the common wisdom is to wait for about half the
> round-trip time for a system call and then park. It doesn't sound
> terribly hard to write something to do that.
>
> Please forgive me for digressing, but:
>
> Arm has a mechanism to do this, WFE. When a core fails to obtain a
> lock it executes a WFE instruction which waits on the cache line
> containing the lock. When that cache line is written to by the core
> releasing the lock it awakens the waiting core.
>
> I'd like to find some way to expose this in a high-level language but
> it's not at all easy to do.
>
> I believe that Intel has MWAIT which is similar, but it's a privileged
> instruction so no use to us.
>

Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
On 10/26/19 7:37 AM, Viktor Klang via Concurrency-interest wrote:
> Given that the strategy will depend on the runtime environment, a JDK
> "intrinsic" would make most sense to me—hence wanting something in the
> JDK :)

Considering that a general solution would provide the best course of
action whenever entities (for example you, or some thread) momentarily
cannot get something they momentarily want (a lock, a new JDK method),
this is a hard request to fulfill! The best we've been able to do is
make internal choices (in locks, queues, etc) that are OK with respect
to more concrete contexts. In addition to issues that other people have
mentioned, solutions also interact with choice of garbage collector.

I agree that it would be nice to expose some of the lower-level
features available on some platforms that might sometimes provide better
performance, but these also tend to be hard to encapsulate in APIs in
ways that do more good than harm. (For example, I have later removed
nearly every occurrence of Thread.yield() in any code I've written after
finding something better or more general.)

And as always (especially in real life), whenever you encounter a
problem involving blocking, the main question to ask is whether there is
something you can do other than block -- helping, choosing alternate
actions, speculating, etc.
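
As one illustration of "helping" instead of blocking, here is a minimal sketch in which a thread that needs a result runs other pending tasks while it waits. The class name, the shared queue of Runnables and the use of CompletableFuture are assumptions made for this example; it is not how any particular JDK class implements helping, and it needs Java 9+ for Thread.onSpinWait().

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: run queued work instead of blocking while a result is pending. */
final class HelpWhileWaiting {
    static <T> T awaitHelping(CompletableFuture<T> result, Queue<Runnable> pending) {
        while (!result.isDone()) {
            Runnable task = pending.poll();
            if (task != null) {
                task.run();            // do useful work; it may even complete `result`
            } else {
                Thread.onSpinWait();   // nothing to help with right now
            }
        }
        return result.join();          // already done, so this does not block
    }
}
```

The queue should be a concurrent one (for example ConcurrentLinkedQueue) if several threads help at once. A production version would also need a fallback to parking when the queue stays empty for long, which leads back to the original question.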

-Doug



Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
> For example, I have later removed
> nearly every occurrence of Thread.yield() in any code I've written after
> finding something better or more general

OT: Just curious, where? :)

On Sat, 26 Oct 2019 at 20:08, Doug Lea via Concurrency-interest <[hidden email]> wrote:
On 10/26/19 7:37 AM, Viktor Klang via Concurrency-interest wrote:
> Given that the strategy will depend on the runtime environment, a JDK
> "intrinsic" would make most sense to me—hence wanting something in the
> JDK :)

Considering that a general solution would provide the best course of
action whenever entities (for example you, or some thread) momentarily
cannot get something they momentarily want (a lock, a new JDK method),
this is a hard request to fulfill! The best we've been able to do is
make internal choices (in locks, queues, etc) that are OK with respect
to more concrete contexts. In addition to issues that other people have
mentioned, solutions also interact with choice of garbage collector.

I agree that it would be nice to expose some of the lower-level
features available on some platforms that might sometimes provide better
performance, but these also tend to be hard to encapsulate in APIs in
ways that do more good than harm. (For example, I have later removed
nearly every occurrence of Thread.yield() in any code I've written after
finding something better or more general.)

And as always (especially in real life), whenever you encounter a
problem involving blocking, the main question to ask is whether there is
something you can do other than block -- helping, choosing alternate
actions, speculating, etc.

-Doug



Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
On 10/26/19 2:42 PM, Francesco Nigro via Concurrency-interest wrote:
>> For example, I have later removed
>> nearly every occurrence of Thread.yield() in any code I've written after
>> finding something better or more general
>
> OT: Just curious, where? :)

Notice for example that as of the rewrites last summer, there are none
in all of java.util.concurrent.locks.*

Much further off-topic, I was reminded of this when seeing "Ants Are
Practically Immune to Traffic Jams" courtesy of Hacker News a few days
ago (https://news.ycombinator.com/item?id=21352345)
https://www.sciencealert.com/ant-roads-are-practically-immune-to-traffic-jams-even-when-it-gets-crowded


Perhaps we can devise some pheromone-based synchronization primitives...

-Doug


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
On 26/10/2019 11:21, Andrew Haley via Concurrency-interest wrote:

> On 10/25/19 11:11 AM, Viktor Klang via Concurrency-interest wrote:
>>
>> Is there any jdk-builtin Java8+ method which tries to be clever
>> about low-nanos/micros parking?
>>
>> I'm currently considering LockSupport.parkNanos but want to avoid
>> having the Thread parked when parking + wake-up latency is more
>> likely to be much greater than the requested time.
>>
>> I.e. some combination of onSpinWait + some non-cache-polluting
>> computation + yielding + actual parking. I'd like to avoid having to
>> custom-roll it, hence the question for prior art ;)
>
> As I understand it, the common wisdom is to wait for about half the
> round-trip time for a system call and then park. It doesn't sound
> terribly hard to write something to do that.
>
> Please forgive me for digressing, but:
>
> Arm has a mechanism to do this, WFE. When a core fails to obtain a
> lock it executes a WFE instruction which waits on the cache line
> containing the lock. When that cache line is written to by the core
> releasing the lock it awakens the waiting core.
>
> I'd like to find some way to expose this in a high-level language but
> it's not at all easy to do.
>
> I believe that Intel has MWAIT which is similar, but it's a privileged
> instruction so no use to us.
>

Intel is introducing UMWAIT with Tremont

https://www.felixcloutier.com/x86/umwait
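
Until something like WFE or UMWAIT is reachable from Java, the "spin for about half the round-trip time of a system call, then park" rule quoted above can be hand-rolled. What follows is only a minimal sketch under stated assumptions, not anything that ships in the JDK: the class name SpinThenPark, the method await, and the 50-microsecond round-trip constant are all made up for illustration, and the constant in particular is a placeholder that would need to be measured on the target system. Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

final class SpinThenPark {
    // Assumption: rough cost of one park + wake-up round trip. This number is
    // a placeholder, not a measurement; calibrate it on the target machine.
    static final long ASSUMED_PARK_ROUND_TRIP_NANOS = 50_000L;

    /**
     * Waits until {@code done} reports true or {@code timeoutNanos} elapse.
     * Returns the last observed value of {@code done}.
     */
    static boolean await(BooleanSupplier done, long timeoutNanos) {
        final long start = System.nanoTime();
        // Spin for about half the assumed round-trip cost, capped by the timeout.
        final long spinNanos = Math.min(timeoutNanos, ASSUMED_PARK_ROUND_TRIP_NANOS / 2);

        // Phase 1: busy-wait, re-checking the condition on every iteration.
        while (System.nanoTime() - start < spinNanos) {
            if (done.getAsBoolean()) {
                return true;
            }
            Thread.onSpinWait(); // Java 9+; a hint only, may be a no-op
        }

        // Phase 2: park for whatever is left of the timeout.
        long remaining;
        while ((remaining = timeoutNanos - (System.nanoTime() - start)) > 0) {
            if (done.getAsBoolean()) {
                return true;
            }
            LockSupport.parkNanos(remaining); // may return early or spuriously
            if (Thread.currentThread().isInterrupted()) {
                break; // leave the interrupt status set for the caller
            }
        }
        return done.getAsBoolean();
    }
}
```

A caller would pass something like flag::get for an AtomicBoolean, and the thread that sets the flag would then call LockSupport.unpark(waiter) so that a parked waiter does not sleep through the rest of its timeout. Requests shorter than the spin budget never park at all, which is the case the original question was trying to avoid.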

Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
On 10/26/19 10:56 PM, Aaron Grunthal via Concurrency-interest wrote:
> On 26/10/2019 11:21, Andrew Haley via Concurrency-interest wrote:
>>
>> I'd like to find some way to expose [WFE] in a high-level language but
>> it's not at all easy to do.
>>
>> I believe that Intel has MWAIT which is similar, but it's a privileged
>> instruction so no use to us.
>
> Intel is introducing UMWAIT with Tremont

Excellent! Competition is good. And perhaps once we have hardware we can find
a nice way to handle this.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
On 10/26/19 2:52 PM, Nathan and Ila Reynolds via Concurrency-interest wrote:

> The downside of pausing the thread's execution of instructions is
> that the thread cannot respond to stop-the-world events.  This will
> increase the time it takes to stop the world.

Good point.

I don't think that's necessarily true, though. When a stop-the-world
event is needed a common technique is to read-protect a page. At the
hardware level this is [usually?] done by sending an inter-processor
interrupt broadcast to invalidate the TLBs of all processors. This
will kick every processor out of MWAIT/WFE.

On Arm any event which clears the "global monitor" (i.e. the state
machine used by load locked / store conditional) will kick the
processor out of WFE, and both TLB invalidate and exception return
clear the global monitor. I very much suspect Intel will do the same,
but that's not guaranteed.

Also, the longest WFE pause time I've ever measured on AArch64
systems is orders of magnitude less than our time to safepoint in
HotSpot. There's no guarantee of that either.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


Re: Low-latency pause in JDK

JSR166 Concurrency mailing list
For blocking file I/O, the native method will check the stop-the-world
flag after returning from the OS API call.  GC can then assume these
threads are effectively stopped.

If a stop-the-world event does not immediately wake a thread out of the
"pause" instruction, we could add logic similar to what file I/O uses:
check the flag after the pause returns.  However, this will make pause
performance suffer.

-Nathan

On 10/30/2019 4:04 AM, Andrew Haley wrote:

> On 10/26/19 2:52 PM, Nathan and Ila Reynolds via Concurrency-interest wrote:
>
>> The downside of pausing the thread's execution of instructions is
>> that the thread cannot respond to stop-the-world events.  This will
>> increase the time it takes to stop the world.
>
> Good point.
>
> I don't think that's necessarily true, though. When a stop-the-world
> event is needed a common technique is to read-protect a page. At the
> hardware level this is [usually?] done by sending an inter-processor
> interrupt broadcast to invalidate the TLBs of all processors. This
> will kick every processor out of MWAIT/WFE.
>
> On Arm any event which clears the "global monitor" (i.e. the state
> machine used by load locked / store conditional) will kick the
> processor out of WFE, and both TLB invalidate and exception return
> clear the global monitor. I very much suspect Intel will do the same,
> but that's not guaranteed.
>
> Also, the longest WFE pause time I've ever measured on AArch64
> systems is orders of magnitude less than our time to safepoint in
> HotSpot. There's no guarantee of that either.
>