Min and Max for Atomics

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Min and Max for Atomics

Nathan and Ila Reynolds
Have the following methods been considered to be added to their
respective classes?  Perhaps, there are other Atomic classes where these
should be added.  Perhaps, VarHandles should have these added (assume I
know nothing about VarHandles).

The advantage of these methods is that they avoid cache invalidation and
a fence if there is no update.  This is kind of related to the previous
discussion about getAndUpdate() and updateAndGet().

I am proposing these methods since I have written this code a few
times.  Until these methods are implemented in the JDK, perhaps I should
write an atomic utility class.

AtomicInteger

public int max(int value)
{
    int expect;

    while (true)
    {
       expect = get();

       if (expect >= value)
          return expect;

       if (compareAndSet(expect, value))
          return expect;
    }
}

public int min(int value);
{
    int expect;

    while (true)
    {
       expect = get();

       if (expect <= value)
          return expect;

       if (compareAndSet(expect, value))
          return expect;
    }
}

AtomicLong
public long max(long value);
public long min(long value);


-- -Nathan
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Min and Max for Atomics

Nathan and Ila Reynolds
Yes, I get the same behavior, but I will have to pay for a cache
invalidation, CAS and a memory fence with each call.  For example, if I
am tracking a high-water mark then at the beginning the updates should
be very often and then taper off to nothing.  Thus, over time the cost
is reduced to a load from cache or RAM.

-Nathan

On 8/10/2017 10:15 AM, Jonas Konrad wrote:

> Note that you can achieve the same behaviour using updateAndGet(i ->
> Math.max(i, newVal)), at the cost of one CAS even on keeping the
> current value.
>
> Looking at it, the shortcut prev == next could be taken in
> updateAndGet too. Maybe it can't because you need the HB semantics of
> compareAndSet?
>
> - Jonas Konrad
>
>
> On 08/10/2017 01:51 PM, Nathan and Ila Reynolds wrote:
>> Have the following methods been considered to be added to their
>> respective classes?  Perhaps, there are other Atomic classes where
>> these should be added.  Perhaps, VarHandles should have these added
>> (assume I know nothing about VarHandles).
>>
>> The advantage of these methods is that they avoid cache invalidation
>> and a fence if there is no update.  This is kind of related to the
>> previous discussion about getAndUpdate() and updateAndGet().
>>
>> I am proposing these methods since I have written this code a few
>> times.  Until these methods are implemented in the JDK, perhaps I
>> should write an atomic utility class.
>>
>> AtomicInteger
>>
>> public int max(int value)
>> {
>>     int expect;
>>
>>     while (true)
>>     {
>>        expect = get();
>>
>>        if (expect >= value)
>>           return expect;
>>
>>        if (compareAndSet(expect, value))
>>           return expect;
>>     }
>> }
>>
>> public int min(int value);
>> {
>>     int expect;
>>
>>     while (true)
>>     {
>>        expect = get();
>>
>>        if (expect <= value)
>>           return expect;
>>
>>        if (compareAndSet(expect, value))
>>           return expect;
>>     }
>> }
>>
>> AtomicLong
>> public long max(long value);
>> public long min(long value);
>>
>>
>> -- -Nathan
>> _______________________________________________
>> Concurrency-interest mailing list
>> [hidden email]
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

--
-Nathan

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Min and Max for Atomics

Andrew Haley
On 10/08/17 18:35, Nathan and Ila Reynolds wrote:
> Yes, I get the same behavior, but I will have to pay for a cache
> invalidation, CAS and a memory fence with each call.  For example, if I
> am tracking a high-water mark then at the beginning the updates should
> be very often and then taper off to nothing.  Thus, over time the cost
> is reduced to a load from cache or RAM.

What is this cache invalidation of which you speak?  After the
AtomicReference.updateAndGet() discussion last time around, it was
clear enough that no such thing was necessary.  And besides that,
the message is clear: use a VarHandle for such things.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Min and Max for Atomics

Brian S O'Neill-3
In reply to this post by Nathan and Ila Reynolds
Why should these functions be built into the VarHandle class? Are there
any machine-specific instructions that can be targeted? If the
implementation of these methods would always match your suggestion, then
I don't see any reason why the VarHandle class should have them.
Complicating further would be all the various permutations. Should this
use acquire/release semantics? Weak? Volatile? Opaque?

On 2017-08-10 04:51 AM, Nathan and Ila Reynolds wrote:

> Have the following methods been considered to be added to their
> respective classes?  Perhaps, there are other Atomic classes where these
> should be added.  Perhaps, VarHandles should have these added (assume I
> know nothing about VarHandles).
>
> The advantage of these methods is that they avoid cache invalidation and
> a fence if there is no update.  This is kind of related to the previous
> discussion about getAndUpdate() and updateAndGet().
>
> I am proposing these methods since I have written this code a few
> times.  Until these methods are implemented in the JDK, perhaps I should
> write an atomic utility class.
>
> AtomicInteger
>
> public int max(int value)
> {
>     int expect;
>
>     while (true)
>     {
>        expect = get();
>
>        if (expect >= value)
>           return expect;
>
>        if (compareAndSet(expect, value))
>           return expect;
>     }
> }
>
> public int min(int value);
> {
>     int expect;
>
>     while (true)
>     {
>        expect = get();
>
>        if (expect <= value)
>           return expect;
>
>        if (compareAndSet(expect, value))
>           return expect;
>     }
> }
>
> AtomicLong
> public long max(long value);
> public long min(long value);
>
>
> -- -Nathan
> _______________________________________________
> Concurrency-interest mailing list
> [hidden email]
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Min and Max for Atomics

Nathan and Ila Reynolds
In reply to this post by Andrew Haley
See the https://en.wikipedia.org/wiki/MESI_protocol

Every time a core writes to a cache line (normal or atomic), it has to
own the cache line in the exclusive state.  If the cache line is not
already in the exclusive state, then the core has to send an
invalidation message to all other caches in all other cores in the
entire system.  This could mean going to another chip or even 7 other
chips.  This invalidation message removes the cache line from all other
cores.  The invalidating core can stall for a long time if the cache
line is heavily contended.  This looks like 100% CPU usage but very
sluggish progress.

I am very aware of this problem because I had to figure out that this
was what was happening and then optimize some C++ code.  The interesting
part of this story is that as Intel produced 4 newer chips over a period
of 4 years, I had to revisit the code and improve the optimization.  I
tried using a ThreadLocal variable but as the thread migrated to
different cores, then the core had to migrate the cache line via cache
invalidation.  I finally came up with an optimization which no longer
suffers from this problem.  The final solution was to assign a cache
line (i.e. variable) to 1 core.  In other words, it was a CoreLocal
variable, if you will.  This reduced the cache invalidations and I have
not had to revisit this code for 6 years.

So, updateAndGet() suffers from cache invalidation even if a write is
not necessary.  It also suffers from CAS latency and a memory fence.  I
realize that in some cases, this is exactly what one would want to pay
for.  In my case, the updates are not frequent enough to warrant the cost.

-Nathan

On 8/10/2017 11:56 AM, Andrew Haley wrote:

> On 10/08/17 18:35, Nathan and Ila Reynolds wrote:
>> Yes, I get the same behavior, but I will have to pay for a cache
>> invalidation, CAS and a memory fence with each call.  For example, if I
>> am tracking a high-water mark then at the beginning the updates should
>> be very often and then taper off to nothing.  Thus, over time the cost
>> is reduced to a load from cache or RAM.
> What is this cache invalidation of which you speak?  After the
> AtomicReference.updateAndGet() discussion last time around, it was
> clear enough that no such thing was necessary.  And besides that,
> the message is clear: use a VarHandle for such things.
>

--
-Nathan

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Min and Max for Atomics

Nathan and Ila Reynolds
In reply to this post by Brian S O'Neill-3
Hmm... you raise a good point for VarHandle.  I speak from a position of
knowing very little about VarHandle.  Perhaps, this logic is better in a
utility class which accepts a VarHandle and the value.

-Nathan

On 8/10/2017 12:18 PM, Brian S O'Neill wrote:

> Why should these functions be built into the VarHandle class? Are
> there any machine-specific instructions that can be targeted? If the
> implementation of these methods would always match your suggestion,
> then I don't see any reason why the VarHandle class should have them.
> Complicating further would be all the various permutations. Should
> this use acquire/release semantics? Weak? Volatile? Opaque?
>
> On 2017-08-10 04:51 AM, Nathan and Ila Reynolds wrote:
>> Have the following methods been considered to be added to their
>> respective classes?  Perhaps, there are other Atomic classes where
>> these should be added.  Perhaps, VarHandles should have these added
>> (assume I know nothing about VarHandles).
>>
>> The advantage of these methods is that they avoid cache invalidation
>> and a fence if there is no update.  This is kind of related to the
>> previous discussion about getAndUpdate() and updateAndGet().
>>
>> I am proposing these methods since I have written this code a few
>> times.  Until these methods are implemented in the JDK, perhaps I
>> should write an atomic utility class.
>>
>> AtomicInteger
>>
>> public int max(int value)
>> {
>>     int expect;
>>
>>     while (true)
>>     {
>>        expect = get();
>>
>>        if (expect >= value)
>>           return expect;
>>
>>        if (compareAndSet(expect, value))
>>           return expect;
>>     }
>> }
>>
>> public int min(int value);
>> {
>>     int expect;
>>
>>     while (true)
>>     {
>>        expect = get();
>>
>>        if (expect <= value)
>>           return expect;
>>
>>        if (compareAndSet(expect, value))
>>           return expect;
>>     }
>> }
>>
>> AtomicLong
>> public long max(long value);
>> public long min(long value);
>>
>>
>> -- -Nathan
>> _______________________________________________
>> Concurrency-interest mailing list
>> [hidden email]
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
> _______________________________________________
> Concurrency-interest mailing list
> [hidden email]
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

--
-Nathan

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Min and Max for Atomics

Andrew Haley
In reply to this post by Nathan and Ila Reynolds
On 10/08/17 19:29, Nathan and Ila Reynolds wrote:
> Every time a core writes to a cache line (normal or atomic), it has to
> own the cache line in the exclusive state.  If the cache line is not
> already in the exclusive state, then the core has to send an
> invalidation message to all other caches in all other cores in the
> entire system.  This could mean going to another chip or even 7 other
> chips.  This invalidation message removes the cache line from all other
> cores.  The invalidating core can stall for a long time if the cache
> line is heavily contended.  This looks like 100% CPU usage but very
> sluggish progress.

Oh, I see: you're complaining about the cache line being written and
ping-ponging between all the cores, generating a lot of bus traffic.
Fair enough.  However, it seems to me that your need is fairly specialized
and you've solved it easily enough with a trivial method.

> So, updateAndGet() suffers from cache invalidation even if a write is
> not necessary.  It also suffers from CAS latency and a memory fence.

In theory VarHandles will get you better behaviour, but in practice
HotSpot doesn't yet have the accelerators needed to make this stuff
work without the fences.  It's on my list of things to do.  Mind you,
if you're using x86, the ability to do without the synchronization
won't help much because x86 is TSO anyway.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Min and Max for Atomics

Nathan and Ila Reynolds
 > However, it seems to me that your need is fairly specialized and
you've solved it easily enough with a trivial method.

Yes, I raise this request because I have written this kind of logic
several times.  Perhaps, I should just put it in a utility class.

-Nathan

On 8/10/2017 12:54 PM, Andrew Haley wrote:

> On 10/08/17 19:29, Nathan and Ila Reynolds wrote:
>> Every time a core writes to a cache line (normal or atomic), it has to
>> own the cache line in the exclusive state.  If the cache line is not
>> already in the exclusive state, then the core has to send an
>> invalidation message to all other caches in all other cores in the
>> entire system.  This could mean going to another chip or even 7 other
>> chips.  This invalidation message removes the cache line from all other
>> cores.  The invalidating core can stall for a long time if the cache
>> line is heavily contended.  This looks like 100% CPU usage but very
>> sluggish progress.
> Oh, I see: you're complaining about the cache line being written and
> ping-ponging between all the cores, generating a lot of bus traffic.
> Fair enough.  However, it seems to me that your need is fairly specialized
> and you've solved it easily enough with a trivial method.
>
>> So, updateAndGet() suffers from cache invalidation even if a write is
>> not necessary.  It also suffers from CAS latency and a memory fence.
> In theory VarHandles will get you better behaviour, but in practice
> HotSpot doesn't yet have the accelerators needed to make this stuff
> work without the fences.  It's on my list of things to do.  Mind you,
> if you're using x86, the ability to do without the synchronization
> won't help much because x86 is TSO anyway.
>

--
-Nathan

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest