JDK 9's compareAndSet vs compareAndExchange

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

JDK 9's compareAndSet vs compareAndExchange

Dávid Karnok
JDK 9's VarHandle API (and AtomicXXX classes) specify the new  compareAndExchange() method. Traditionally, I wrote CAS loops like this:

public static long getAddAndCap(AtomicLong requested, long n) {
    for (;;) {
        long current = requested.get();

        if (current == Long.MAX_VALUE) {
           return Long.MAX_VALUE;
        }
        long next = current + n;
        if (next < 0L) {
            next = Long.MAX_VALUE;
        }
        if (requested.compareAndSet(current, next)) {
            return current;
        }
    }
}

Now I can write this:

public static long getAddAndCap(AtomicLong requested, long n) {
    long current = requested.get();
    for (;;) {

        if (current == Long.MAX_VALUE) {
           return Long.MAX_VALUE;
        }
        long next = current + n;
        if (next < 0L) {
            next = Long.MAX_VALUE;
        }
        long actual = requested.compareAndExchange(current, next);
        if (actual == current) {
           return current;
        }
        current = actual;
    }
}

I'm not sure I could JMH benchmark these under JDK 9 now so my question is whether the latter pattern has lower overhead (on x86) since it reuses the value returned by the underlying LOCK CMPXCHG instead of re-reading the target field again (or the JIT was always smart enough to detect the unnecessary re-read with compareAndSet; or the hardware is smart enough by doing speculative reads in this case to hide its latency ?).

--
Best regards,
David Karnok

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: JDK 9's compareAndSet vs compareAndExchange

Vitaly Davidovich


On Thu, Sep 22, 2016 at 9:35 AM, Dávid Karnok <[hidden email]> wrote:
JDK 9's VarHandle API (and AtomicXXX classes) specify the new  compareAndExchange() method. Traditionally, I wrote CAS loops like this:

public static long getAddAndCap(AtomicLong requested, long n) {
    for (;;) {
        long current = requested.get();

        if (current == Long.MAX_VALUE) {
           return Long.MAX_VALUE;
        }
        long next = current + n;
        if (next < 0L) {
            next = Long.MAX_VALUE;
        }
        if (requested.compareAndSet(current, next)) {
            return current;
        }
    }
}

Now I can write this:

public static long getAddAndCap(AtomicLong requested, long n) {
    long current = requested.get();
    for (;;) {

        if (current == Long.MAX_VALUE) {
           return Long.MAX_VALUE;
        }
        long next = current + n;
        if (next < 0L) {
            next = Long.MAX_VALUE;
        }
        long actual = requested.compareAndExchange(current, next);
        if (actual == current) {
           return current;
        }
        current = actual;
    }
}

I'm not sure I could JMH benchmark these under JDK 9 now so my question is whether the latter pattern has lower overhead (on x86) since it reuses the value returned by the underlying LOCK CMPXCHG instead of re-reading the target field again (or the JIT was always smart enough to detect the unnecessary re-read with compareAndSet; or the hardware is smart enough by doing speculative reads in this case to hide its latency ?).
I don't see how the JIT could've avoided the "re-read" since the semantics are different; you're basically asking whether it pattern matched a CAS+get(), assumed the intention was CAE, and fused that as CAE internally.  I'm pretty sure the answer is no.

It's very likely the subsequent get() is a L1 hitting load (unless there's very serious contention and the cacheline is invalidated again), so it shouldn't be too bad.  However, additional atomic ops per loop iteration will serve as barriers for JIT optimizations, which may play some role in effective performance.  Although in your particular example above, if CAS/CAE fails multiple times, your performance will likely be dominated by the associated coherence traffic.

--
Best regards,
David Karnok

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: JDK 9's compareAndSet vs compareAndExchange

Aleksey Shipilev-3
In reply to this post by Dávid Karnok
Hi,

Somewhat counter-intuitively, reusing compareAndExchange return value as
the basis for retry is slower than re-reading actual value for
compareAndSet. Because under contention, the overall progress depends on
the width on the "collision window" for your RMW operation:
 https://bugs.openjdk.java.net/browse/JDK-8141640

compareAndExchange is useful when you *have* to have the "witness value"
against which you failed. Choosing compareAndExchange over compareAndSet
within the loop is futile. Choosing compareAndExchange *instead* of
compareAndSet + re-reads is sane, e.g. in
https://github.com/boundary/high-scale-lib/blob/master/src/main/java/org/cliffc/high_scale_lib/NonBlockingHashMap.java#L583

Thanks,
-Aleksey

On 09/22/2016 03:35 PM, Dávid Karnok wrote:

> JDK 9's VarHandle API (and AtomicXXX classes) specify the new
>  compareAndExchange() method. Traditionally, I wrote CAS loops like this:
>
> public static long getAddAndCap(AtomicLong requested, long n) {
>     for (;;) {
>         long current = requested.get();
>
>         if (current == Long.MAX_VALUE) {
>            return Long.MAX_VALUE;
>         }
>         long next = current + n;
>         if (next < 0L) {
>             next = Long.MAX_VALUE;
>         }
>         if (requested.compareAndSet(current, next)) {
>             return current;
>         }
>     }
> }
>
> Now I can write this:
>
> public static long getAddAndCap(AtomicLong requested, long n) {
>     long current = requested.get();
>     for (;;) {
>
>         if (current == Long.MAX_VALUE) {
>            return Long.MAX_VALUE;
>         }
>         long next = current + n;
>         if (next < 0L) {
>             next = Long.MAX_VALUE;
>         }
>         long actual = requested.compareAndExchange(current, next);
>         if (actual == current) {
>            return current;
>         }
>         current = actual;
>     }
> }
>
> I'm not sure I could JMH benchmark these under JDK 9 now so my question
> is whether the latter pattern has lower overhead (on x86) since it
> reuses the value returned by the underlying LOCK CMPXCHG instead of
> re-reading the target field again (or the JIT was always smart enough to
> detect the unnecessary re-read with compareAndSet; or the hardware is
> smart enough by doing speculative reads in this case to hide its latency ?).
>
> --
> Best regards,
> David Karnok
>
>
> _______________________________________________
> Concurrency-interest mailing list
> [hidden email]
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: JDK 9's compareAndSet vs compareAndExchange

Vitaly Davidovich


On Thu, Sep 22, 2016 at 10:20 AM, Aleksey Shipilev <[hidden email]> wrote:
Hi,

Somewhat counter-intuitively, reusing compareAndExchange return value as
the basis for retry is slower than re-reading actual value for
compareAndSet. Because under contention, the overall progress depends on
the width on the "collision window" for your RMW operation:
 https://bugs.openjdk.java.net/browse/JDK-8141640
Interesting result.  What CPU was this test done on, if you recall? The # of cycles in the "collision window" in that benchmark is fairly small (just a bunch of cheap ALU instructions and register-to-register movs that ought to be renamed).  Did you look at any PMU counters that would back up the theory?

Thanks


compareAndExchange is useful when you *have* to have the "witness value"
against which you failed. Choosing compareAndExchange over compareAndSet
within the loop is futile. Choosing compareAndExchange *instead* of
compareAndSet + re-reads is sane, e.g. in
https://github.com/boundary/high-scale-lib/blob/master/src/main/java/org/cliffc/high_scale_lib/NonBlockingHashMap.java#L583

Thanks,
-Aleksey

On 09/22/2016 03:35 PM, Dávid Karnok wrote:
> JDK 9's VarHandle API (and AtomicXXX classes) specify the new
>  compareAndExchange() method. Traditionally, I wrote CAS loops like this:
>
> public static long getAddAndCap(AtomicLong requested, long n) {
>     for (;;) {
>         long current = requested.get();
>
>         if (current == Long.MAX_VALUE) {
>            return Long.MAX_VALUE;
>         }
>         long next = current + n;
>         if (next < 0L) {
>             next = Long.MAX_VALUE;
>         }
>         if (requested.compareAndSet(current, next)) {
>             return current;
>         }
>     }
> }
>
> Now I can write this:
>
> public static long getAddAndCap(AtomicLong requested, long n) {
>     long current = requested.get();
>     for (;;) {
>
>         if (current == Long.MAX_VALUE) {
>            return Long.MAX_VALUE;
>         }
>         long next = current + n;
>         if (next < 0L) {
>             next = Long.MAX_VALUE;
>         }
>         long actual = requested.compareAndExchange(current, next);
>         if (actual == current) {
>            return current;
>         }
>         current = actual;
>     }
> }
>
> I'm not sure I could JMH benchmark these under JDK 9 now so my question
> is whether the latter pattern has lower overhead (on x86) since it
> reuses the value returned by the underlying LOCK CMPXCHG instead of
> re-reading the target field again (or the JIT was always smart enough to
> detect the unnecessary re-read with compareAndSet; or the hardware is
> smart enough by doing speculative reads in this case to hide its latency ?).
>
> --
> Best regards,
> David Karnok
>
>
> _______________________________________________
> Concurrency-interest mailing list
> [hidden email]
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: JDK 9's compareAndSet vs compareAndExchange

Dávid Karnok
Thanks for the information.

FYI, I do have places where compareAndExchange is useful and not part of a loop:

public static boolean setOnce(VarHandle h, Object instance, Subscription s) {
    Subscription o = (Subscription)h.compareAndExchange(instance, null, s);
    if (o != null) {
        s.cancel();
        if (o != CANCELLED) {
            CatchAll.onError(new IllegalStateException("Subscription already set!"));
        }
        return false;
    }
    return true;
}



2016-09-22 17:06 GMT+02:00 Vitaly Davidovich <[hidden email]>:


On Thu, Sep 22, 2016 at 10:20 AM, Aleksey Shipilev <[hidden email]> wrote:
Hi,

Somewhat counter-intuitively, reusing compareAndExchange return value as
the basis for retry is slower than re-reading actual value for
compareAndSet. Because under contention, the overall progress depends on
the width on the "collision window" for your RMW operation:
 https://bugs.openjdk.java.net/browse/JDK-8141640
Interesting result.  What CPU was this test done on, if you recall? The # of cycles in the "collision window" in that benchmark is fairly small (just a bunch of cheap ALU instructions and register-to-register movs that ought to be renamed).  Did you look at any PMU counters that would back up the theory?

Thanks


compareAndExchange is useful when you *have* to have the "witness value"
against which you failed. Choosing compareAndExchange over compareAndSet
within the loop is futile. Choosing compareAndExchange *instead* of
compareAndSet + re-reads is sane, e.g. in
https://github.com/boundary/high-scale-lib/blob/master/src/main/java/org/cliffc/high_scale_lib/NonBlockingHashMap.java#L583

Thanks,
-Aleksey

On 09/22/2016 03:35 PM, Dávid Karnok wrote:
> JDK 9's VarHandle API (and AtomicXXX classes) specify the new
>  compareAndExchange() method. Traditionally, I wrote CAS loops like this:
>
> public static long getAddAndCap(AtomicLong requested, long n) {
>     for (;;) {
>         long current = requested.get();
>
>         if (current == Long.MAX_VALUE) {
>            return Long.MAX_VALUE;
>         }
>         long next = current + n;
>         if (next < 0L) {
>             next = Long.MAX_VALUE;
>         }
>         if (requested.compareAndSet(current, next)) {
>             return current;
>         }
>     }
> }
>
> Now I can write this:
>
> public static long getAddAndCap(AtomicLong requested, long n) {
>     long current = requested.get();
>     for (;;) {
>
>         if (current == Long.MAX_VALUE) {
>            return Long.MAX_VALUE;
>         }
>         long next = current + n;
>         if (next < 0L) {
>             next = Long.MAX_VALUE;
>         }
>         long actual = requested.compareAndExchange(current, next);
>         if (actual == current) {
>            return current;
>         }
>         current = actual;
>     }
> }
>
> I'm not sure I could JMH benchmark these under JDK 9 now so my question
> is whether the latter pattern has lower overhead (on x86) since it
> reuses the value returned by the underlying LOCK CMPXCHG instead of
> re-reading the target field again (or the JIT was always smart enough to
> detect the unnecessary re-read with compareAndSet; or the hardware is
> smart enough by doing speculative reads in this case to hide its latency ?).
>
> --
> Best regards,
> David Karnok
>
>
> _______________________________________________
> Concurrency-interest mailing list
> [hidden email]
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest





--
Best regards,
David Karnok

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: JDK 9's compareAndSet vs compareAndExchange

Aleksey Shipilev-3
On 09/23/2016 09:11 AM, Dávid Karnok wrote:

> Thanks for the information.
>
> FYI, I do have places where compareAndExchange is useful and not part of
> a loop:
>
> public static boolean setOnce(VarHandle h, Object instance, Subscription
> s) {
>     Subscription o = (Subscription)h.compareAndExchange(instance, null, s);
>     if (o != null) {
>         s.cancel();
>         if (o != CANCELLED) {
>             CatchAll.onError(new IllegalStateException("Subscription
> already set!"));
>         }
>         return false;
>     }
>     return true;
> }
This seems like a fair use.

Not sure this is significantly better than compareAndSet + re-read on
failure, because the state changes seem only monotonic (e.g. from null
to one canonical instance), and re-read would not produce false results.

The caveat for VarHandles though is, $h is better to come from the
(static final) constant, otherwise you will have lots of unfolded checks
in VH mechanics, which may affect performance even more.

Thanks,
-Aleksey


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: JDK 9's compareAndSet vs compareAndExchange

Dávid Karnok
This seems like a fair use.

Not sure this is significantly better than compareAndSet + re-read on
failure, because the state changes seem only monotonic (e.g. from null
to one canonical instance), and re-read would not produce false results.


The code supposed to detect multiple calls to a setOnce. If the witness is not the CANCELLED instance, that is considered a bug (detected by looping a bit in unit tests - not perfect but helped many times).
 
The caveat for VarHandles though is, $h is better to come from the
(static final) constant, otherwise you will have lots of unfolded checks
in VH mechanics, which may affect performance even more.


I was considering asking this separately. Former code used either AtomicReferenceFieldUpdater + instance or AtomicReference as input parameter. I was under the assumption that the method would get inlined and the $h is $this.field and $instance becomes $this.

public static <T> boolean setOnce(AtomicReferenceFieldUpdater<T, Flow.Subscription> h, T instance, Flow.Subscription s) {
    if (h.compareAndSet(instance, null, s)) {
        s.cancel();
        if (h.get(instance) != CANCELLED) {
            CatchAll.onError(new IllegalStateException("Subscription already set!"));
        }
        return false;
    }
    return true;
}

final class SomeOperator implements Flow.Subscriber<Object> {
    volatile Flow.Subscription s;
    static final AtomicReferenceFieldUpdater<SomeOperator, Flow.Subscription> S
        = AtomicReferenceFieldUpdater.newUpdater(SomeOperator.class, Flow.Subscription.class, "s");

   public void onSubscribe(Flow.Subscription s) {
       if (SubscriptionHelper.setOnce(S, this, s)) {
           s.request(Long.MAX_VALUE);
       }
   }

   // ...
}



 
Thanks,
-Aleksey




--
Best regards,
David Karnok

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: JDK 9's compareAndSet vs compareAndExchange

Aleksey Shipilev-3


On 09/23/2016 11:36 AM, Dávid Karnok wrote:
>     The caveat for VarHandles though is, $h is better to come from the
>     (static final) constant, otherwise you will have lots of unfolded checks
>     in VH mechanics, which may affect performance even more.
>
>
> I was considering asking this separately. Former code used either
> AtomicReferenceFieldUpdater + instance or AtomicReference as input
> parameter. I was under the assumption that the method would get inlined
> and the $h is $this.field and $instance becomes $this.

Yes, that should happen. But on off-chance inlining breaks, it is a good
idea to have the accessor methods closer to A*FU/VH fields, so that you
can reference them directly. This is a common practice for using A*FU
(and by extension, VHs). Passing things around may run into surprises.

Thanks,
-Aleksey


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: JDK 9's compareAndSet vs compareAndExchange

Dávid Karnok
Yes, that should happen. But on off-chance inlining breaks, it is a good
idea to have the accessor methods closer to A*FU/VH fields, so that you
can reference them directly. This is a common practice for using A*FU
(and by extension, VHs). Passing things around may run into surprises.


Thanks. (Too bad the utility method call appears 200+ times in my codebase along with other similar helper methods).


--
Best regards,
David Karnok

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest