Does factoring out VarHandle-based manipulations cause performance penalties?

Dávid Karnok
In my codebase, targeting Java 9, I often have to perform the same set of atomic operations on fields of various classes, for example, a deferred cancellation of Flow.Subscriptions:

Flow.Subscription upstream;
static final VarHandle UPSTREAM;

@Override
public void cancel() {
    Flow.Subscription a = (Flow.Subscription)UPSTREAM.getAcquire(this);
    if (a != CancelledSubscription.INSTANCE) {
        a = (Flow.Subscription)UPSTREAM.getAndSet(this, CancelledSubscription.INSTANCE);
        if (a != null && a != CancelledSubscription.INSTANCE) {
            a.cancel();
        }
    }
}

Refactored into:

final class SubscriptionHelper {

    public static void cancel(Object target, VarHandle handle) {
        Flow.Subscription a = (Flow.Subscription)handle.getAcquire(target);
        if (a != CancelledSubscription.INSTANCE) {
            a = (Flow.Subscription)handle.getAndSet(target, CancelledSubscription.INSTANCE);
            if (a != null && a != CancelledSubscription.INSTANCE) {
                a.cancel();
            }
        }
    }
}

@Override
public void cancel() {
    SubscriptionHelper.cancel(this, UPSTREAM);
}
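Here is a self-contained sketch of the refactoring for anyone who wants to try it. The thread does not show `CancelledSubscription` or the enclosing class, so the `CancelledSubscription` enum, `ExampleSubscriber`, and `DeferredCancelDemo` below are hypothetical stand-ins. Only the `upstream` field, the static final `VarHandle`, and the body of `SubscriptionHelper.cancel` mirror the post.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.Flow;

// Hypothetical sentinel: the thread only references CancelledSubscription.INSTANCE.
enum CancelledSubscription implements Flow.Subscription {
    INSTANCE;

    @Override
    public void request(long n) {
        // no-op: there is nothing upstream any more
    }

    @Override
    public void cancel() {
        // no-op: already cancelled
    }
}

final class SubscriptionHelper {
    private SubscriptionHelper() { }

    // Swaps the field behind 'handle' to the sentinel and cancels the previous
    // subscription, so the real upstream is cancelled at most once.
    static void cancel(Object target, VarHandle handle) {
        Flow.Subscription a = (Flow.Subscription) handle.getAcquire(target);
        if (a != CancelledSubscription.INSTANCE) {
            a = (Flow.Subscription) handle.getAndSet(target, CancelledSubscription.INSTANCE);
            if (a != null && a != CancelledSubscription.INSTANCE) {
                a.cancel();
            }
        }
    }
}

// Hypothetical owner class: the field and the static final VarHandle follow the post.
final class ExampleSubscriber {
    Flow.Subscription upstream;

    static final VarHandle UPSTREAM;

    static {
        try {
            UPSTREAM = MethodHandles.lookup()
                    .findVarHandle(ExampleSubscriber.class, "upstream", Flow.Subscription.class);
        } catch (ReflectiveOperationException ex) {
            throw new ExceptionInInitializerError(ex);
        }
    }

    public void cancel() {
        // UPSTREAM is read from a static final field, so it is a constant at this call site.
        SubscriptionHelper.cancel(this, UPSTREAM);
    }
}

final class DeferredCancelDemo {
    public static void main(String[] args) {
        ExampleSubscriber s = new ExampleSubscriber();
        s.upstream = new Flow.Subscription() {
            @Override
            public void request(long n) { }

            @Override
            public void cancel() {
                System.out.println("upstream cancelled");
            }
        };
        s.cancel();
        s.cancel(); // the second call finds the sentinel and does nothing
    }
}
```

Running `DeferredCancelDemo` prints `upstream cancelled` once. Calling `cancel()` before any upstream has been set simply stores the sentinel, so a subscription that arrives later can detect the cancellation.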


I'd expect the JIT to inline SubscriptionHelper.cancel into all of its use sites. However, the helper no longer works on "this" but on an arbitrary target Object, so I'm concerned that the optimizations may not happen.

I haven't noticed any performance penalties so far, but I remember Aleksey Shipilev mentioning somewhere, some time ago, a warning about such out-of-context VarHandle uses.

--
Best regards,
David Karnok

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Does factoring out VarHandle-based manipulations cause performance penalties?

Aleksey Shipilev-3
On 08/09/2017 10:17 AM, Dávid Karnok wrote:

> I'd expect the JIT to inline SubscriptionHelper.cancel into all of its use sites. However, the
> helper no longer works on "this" but on an arbitrary target Object, so I'm concerned that the
> optimizations may not happen.
>
> I haven't noticed any performance penalties so far, but I remember Aleksey Shipilev mentioning
> somewhere, some time ago, a warning about such out-of-context VarHandle uses.
As with Unsafe, as with Atomic*FieldUpdaters, and as with *Handles in general, the compiler's
ability to optimize depends on constant propagation. Putting the VarHandle into a static final
field helps a lot. It works by the same mechanism that makes keeping the field OFFSET for Unsafe
accesses in a static final field good for performance.

In your case above, making the VarHandle a method parameter is a performance-risky move, but the risk
is mitigated because the use site loads it from the static final field anyway. Thus, if the method is
inlined, you get the same benefits. I think the concern about "Object" versus "this" does not apply
there, because inlining propagates type information too.
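Here is a small sketch of the constant-propagation point; the class and method names are hypothetical. In `incrementFast` the handle comes from a static final field, which HotSpot treats as a constant. Once `increment` is inlined there, the access can be compiled much like a direct atomic add on the field. Passing a handle loaded from an ordinary instance field, as with `HandleBox`, gives the same result functionally, but the JIT generally cannot treat that handle as a constant. Whether a particular call actually gets optimized depends on the JVM version and its inlining decisions.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class HitCounter {
    int count;

    // Static final: a constant the JIT can fold into the code that uses it.
    static final VarHandle COUNT;

    static {
        try {
            COUNT = MethodHandles.lookup().findVarHandle(HitCounter.class, "count", int.class);
        } catch (ReflectiveOperationException ex) {
            throw new ExceptionInInitializerError(ex);
        }
    }

    // Shared helper taking the handle as a parameter. It is only as fast as the
    // handle is constant after inlining into the caller.
    static int increment(Object target, VarHandle handle) {
        return (int) handle.getAndAdd(target, 1) + 1;
    }

    // The handle is loaded from a static final field at this use site.
    int incrementFast() {
        return increment(this, COUNT);
    }
}

final class HandleBox {
    // Non-final instance field: a handle read from here is not a JIT constant,
    // even though it refers to the same VarHandle object.
    VarHandle handle = HitCounter.COUNT;
}
```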

Thanks,
-Aleksey




Re: Does factoring out VarHandle-based manipulations cause performance penalties?

Aleksey Shipilev-3
On 08/09/2017 10:25 AM, Aleksey Shipilev wrote:

> As with Unsafe, as with Atomic*FieldUpdaters, and as with *Handles in general, the compiler's
> ability to optimize depends on constant propagation. Putting the VarHandle into a static final
> field helps a lot. It works by the same mechanism that makes keeping the field OFFSET for Unsafe
> accesses in a static final field good for performance.
>
> In your case above, making the VarHandle a method parameter is a performance-risky move, but the risk
> is mitigated because the use site loads it from the static final field anyway. Thus, if the method is
> inlined, you get the same benefits. I think the concern about "Object" versus "this" does not apply
> there, because inlining propagates type information too.
I should have mentioned that, at least in HotSpot, there is a real problem with type *profile*
pollution, because the type profile is context-agnostic and bound to the concrete bytecode index.
So if SubscriptionHelper.cancel gets called with different "targets", *and* the optimization depends
on the profile, inlining would not help to untangle that knot. I'm pretty sure static type
propagation works fine there, but do test.
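This sketch reproduces the situation Aleksey describes, using hypothetical names. Two unrelated classes reuse one helper, and each passes its own static final VarHandle. Functionally this is fine. However, the helper's bytecode is shared, so any type profile HotSpot collects inside `markDone` is tied to that bytecode and sees both `TaskA` and `TaskB` targets. Static type information should still flow in after inlining. An optimization that relies on the profile may not, and only a benchmark that warms up all the real call sites will tell.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class SharedHelper {
    private SharedHelper() { }

    // One copy of this bytecode serves every caller, so profiles recorded here
    // are not separated per caller and can mix TaskA and TaskB targets.
    static boolean markDone(Object target, VarHandle handle) {
        return handle.compareAndSet(target, false, true);
    }

    // Looks up a boolean field named "done" on the given owner class.
    static VarHandle findDone(MethodHandles.Lookup lookup, Class<?> owner) {
        try {
            return lookup.findVarHandle(owner, "done", boolean.class);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }
}

final class TaskA {
    boolean done;
    static final VarHandle DONE = SharedHelper.findDone(MethodHandles.lookup(), TaskA.class);

    boolean finish() {
        return SharedHelper.markDone(this, DONE);
    }
}

final class TaskB {
    boolean done;
    static final VarHandle DONE = SharedHelper.findDone(MethodHandles.lookup(), TaskB.class);

    boolean finish() {
        return SharedHelper.markDone(this, DONE);
    }
}
```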

Thanks,
-Aleksey



Re: Does factoring out VarHandle-based manipulations cause performance penalties?

Dávid Karnok
Thanks Aleksey!

I did a benchmark and VarHandles seem to work fine compared to AtomicReferences (assuming that I measured the right setup):

Benchmark                     Mode  Cnt          Score         Error  Units
VarHandleCostPerf.baseline1  thrpt    5  159276715,278 ± 3571493,763  ops/s
VarHandleCostPerf.bench1     thrpt    5  162339382,232 ±  763533,693  ops/s
VarHandleCostPerf.baseline2  thrpt    5   72675096,238 ± 1262485,061  ops/s
VarHandleCostPerf.bench2     thrpt    5   84963708,660 ±  596244,817  ops/s
VarHandleCostPerf.baseline3  thrpt    5   38747819,413 ± 1513177,407  ops/s
VarHandleCostPerf.bench3     thrpt    5   47328852,938 ±  157493,140  ops/s
VarHandleCostPerf.baseline4  thrpt    5   38047055,938 ±  316562,325  ops/s
VarHandleCostPerf.bench4     thrpt    5   38053864,102 ±  180163,924  ops/s
VarHandleCostPerf.baseline5  thrpt    5   30075092,319 ±  151191,006  ops/s
VarHandleCostPerf.bench5     thrpt    5   29819608,499 ± 1088452,474  ops/s
VarHandleCostPerf.baseline6  thrpt    5   24924283,770 ±  214311,889  ops/s
VarHandleCostPerf.bench6     thrpt    5   24872577,980 ±  390354,651  ops/s
VarHandleCostPerf.baseline7  thrpt    5   21210169,977 ±  282669,696  ops/s
VarHandleCostPerf.bench7     thrpt    5   21083601,549 ±  424591,111  ops/s

Code:

Run:
i7 4790, Windows 7 x64, Java 9b181, JMH 1.19

--
Best regards,
David Karnok


Re: Does factoring out VarHandle-based manipulations cause performance penalties?

Aleksey Shipilev-3
On 08/09/2017 01:20 PM, Dávid Karnok wrote:
> Thanks Aleksey!
>
> I did a benchmark and VarHandles seem to work fine compared to AtomicReferences (assuming that I
> measured the right setup):
>
> Benchmark                     Mode  Cnt          Score         Error  Units
> VarHandleCostPerf.baseline1  thrpt    5  159276715,278 ± 3571493,763  ops/s
> VarHandleCostPerf.bench1     thrpt    5  162339382,232 ±  763533,693  ops/s
> VarHandleCostPerf.baseline2  thrpt    5   72675096,238 ± 1262485,061  ops/s
> VarHandleCostPerf.bench2     thrpt    5   84963708,660 ±  596244,817  ops/s
> VarHandleCostPerf.baseline3  thrpt    5   38747819,413 ± 1513177,407  ops/s
> VarHandleCostPerf.bench3     thrpt    5   47328852,938 ±  157493,140  ops/s
> VarHandleCostPerf.baseline4  thrpt    5   38047055,938 ±  316562,325  ops/s
> VarHandleCostPerf.bench4     thrpt    5   38053864,102 ±  180163,924  ops/s
> VarHandleCostPerf.baseline5  thrpt    5   30075092,319 ±  151191,006  ops/s
> VarHandleCostPerf.bench5     thrpt    5   29819608,499 ± 1088452,474  ops/s
> VarHandleCostPerf.baseline6  thrpt    5   24924283,770 ±  214311,889  ops/s
> VarHandleCostPerf.bench6     thrpt    5   24872577,980 ±  390354,651  ops/s
> VarHandleCostPerf.baseline7  thrpt    5   21210169,977 ±  282669,696  ops/s
> VarHandleCostPerf.bench7     thrpt    5   21083601,549 ±  424591,111  ops/s

Pro-tip: measuring this in AverageTime with ns/op is much more readable.
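Here is the pro-tip applied to the numbers above. The table uses a comma as the decimal separator, so `159276715,278 ops/s` is about 159.3 million ops/s. For a single-threaded run, average time per operation is the reciprocal of throughput. One thread is JMH's default, but the thread does not say which setting was used, so treat this as an assumption. The sketch converts a few rows. In JMH itself you would annotate the benchmark with `@BenchmarkMode(Mode.AverageTime)` and `@OutputTimeUnit(TimeUnit.NANOSECONDS)` instead.

```java
final class ThroughputToLatency {
    private ThroughputToLatency() { }

    // ns/op = (1e9 ns per second) / (ops per second); assumes one benchmark thread.
    static double nsPerOp(double opsPerSecond) {
        return 1e9 / opsPerSecond;
    }

    public static void main(String[] args) {
        // Scores from the table above, with the decimal comma written as a dot.
        String[] names = {"baseline1", "bench1", "baseline2", "bench2", "baseline3", "bench3"};
        double[] opsPerSecond = {159276715.278, 162339382.232, 72675096.238,
                                 84963708.660, 38747819.413, 47328852.938};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-10s %6.2f ns/op%n", names[i], nsPerOp(opsPerSecond[i]));
        }
    }
}
```

For these rows this gives roughly 6.3 vs 6.2 ns/op for *1, 13.8 vs 11.8 ns/op for *2, and 25.8 vs 21.1 ns/op for *3.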

It seems VarHandles win in *2 and *3; I wonder why that is. My guess is that the
AtomicReference instances in your tests are actually of different classes ("{  }" yields an
anonymous subclass), which would partly explain it.
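Here is a quick way to check the guess about `{ }`. Each anonymous class expression in the source compiles to its own subclass of AtomicReference, and every instance created from that expression shares the same class. The demo class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

final class AnonymousSubclassDemo {
    private AnonymousSubclassDemo() { }

    // Each "{ }" below is a separate anonymous class expression, so javac emits
    // a separate subclass of AtomicReference for each one.
    static AtomicReference<String> first() {
        return new AtomicReference<String>() { };
    }

    static AtomicReference<String> second() {
        return new AtomicReference<String>() { };
    }

    public static void main(String[] args) {
        Class<?> a = first().getClass();
        Class<?> b = second().getClass();
        System.out.println(a.getName() + " vs " + b.getName());
        System.out.println("same class: " + (a == b));               // two expressions, two classes
        System.out.println("reused: " + (first().getClass() == a));  // one expression, one class
        System.out.println("superclass: " + a.getSuperclass().getName());
    }
}
```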

AtomicReferences also keep the value one dereference away, but your test would not show that, given
its very low cache footprint. AtomicReferenceFieldUpdater would seem to be a better baseline. Or
whatever you use right now in Rx*?
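The AtomicReferenceFieldUpdater baseline Aleksey suggests could look like the following sketch of the same deferred cancellation. As in the VarHandle version, the subscription lives in a field of the subscriber itself, not one dereference away in a separate AtomicReference object. `UpdaterSubscriber` and its `CANCELLED` sentinel are hypothetical. The updater requires the field to be volatile, so the first check here is a volatile read rather than the acquire read used with the VarHandle.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

final class UpdaterSubscriber {
    // Hypothetical sentinel standing in for CancelledSubscription.INSTANCE.
    static final Flow.Subscription CANCELLED = new Flow.Subscription() {
        @Override
        public void request(long n) { }

        @Override
        public void cancel() { }
    };

    // AtomicReferenceFieldUpdater.newUpdater rejects non-volatile fields.
    volatile Flow.Subscription upstream;

    static final AtomicReferenceFieldUpdater<UpdaterSubscriber, Flow.Subscription> UPSTREAM =
            AtomicReferenceFieldUpdater.newUpdater(
                    UpdaterSubscriber.class, Flow.Subscription.class, "upstream");

    void cancel() {
        Flow.Subscription a = upstream; // volatile read
        if (a != CANCELLED) {
            a = UPSTREAM.getAndSet(this, CANCELLED);
            if (a != null && a != CANCELLED) {
                a.cancel();
            }
        }
    }
}
```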

Thanks,
-Aleksey



Re: Does factoring out VarHandle-based manipulations cause performance penalties?

Dávid Karnok
Indeed, the { } are deliberate: they simulate different subclasses that build upon AtomicReference. RxJava v1 to v3 use the Atomic* classes (limited by Java 6 and Android), and RxJava 4 (which I'm experimenting with) will use VarHandles.
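This sketch shows the "build upon AtomicReference" style: the class extends AtomicReference, so the subscriber object itself holds the reference and no separate atomic object is allocated. It is an illustration only, not RxJava's actual code, and `AtomicRefSubscriber` and its sentinel are hypothetical. `AtomicReference.getAcquire()` exists only since Java 9; a Java 6-compatible variant would call `get()` instead.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicReference;

// The subscriber *is* the atomic container for its upstream subscription.
final class AtomicRefSubscriber extends AtomicReference<Flow.Subscription> {
    private static final long serialVersionUID = 1L; // AtomicReference is Serializable

    // Hypothetical sentinel standing in for CancelledSubscription.INSTANCE.
    static final Flow.Subscription CANCELLED = new Flow.Subscription() {
        @Override
        public void request(long n) { }

        @Override
        public void cancel() { }
    };

    void cancel() {
        Flow.Subscription a = getAcquire(); // Java 9+; use get() on Java 6
        if (a != CANCELLED) {
            a = getAndSet(CANCELLED);
            if (a != null && a != CANCELLED) {
                a.cancel();
            }
        }
    }
}
```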

--
Best regards,
David Karnok


Re: Does factoring out VarHandle-based manipulations cause performance penalties?

Paul Sandoz
In reply to this post by Aleksey Shipilev-3

> On 9 Aug 2017, at 01:30, Aleksey Shipilev <[hidden email]> wrote:
> I should have mentioned that, at least in HotSpot, there is a real problem with type *profile*
> pollution, because the type profile is context-agnostic and bound to the concrete bytecode index.
> So if SubscriptionHelper.cancel gets called with different "targets", *and* the optimization depends
> on the profile, inlining would not help to untangle that knot. I'm pretty sure static type
> propagation works fine there, but do test.
>
I am a little fuzzy on the details of type profile pollution, but… in this use case the target Object and the VarHandle are intimately related. The common method upcasts the target to Object, and the VH then downcasts it. As long as the JIT can track that, it should fold away the downcast when inlining.
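Paul's point can be checked directly. Because VarHandle access modes use invoke-style semantics, a shared helper can take the target as a plain Object, and the handle casts it back to the declared receiver type on each access. A matching target works; an unrelated one fails with ClassCastException. The names below are hypothetical, and whether the JIT folds the cast away after inlining is Paul's expectation, not something this example measures.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class TargetCastDemo {
    private TargetCastDemo() { }

    static final class Holder {
        String value = "held";
    }

    static final class Other { }

    static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(Holder.class, "value", String.class);
        } catch (ReflectiveOperationException ex) {
            throw new ExceptionInInitializerError(ex);
        }
    }

    // The target is statically typed as Object here; VALUE casts it to Holder on access.
    static String read(Object target, VarHandle handle) {
        return (String) handle.getAcquire(target);
    }

    public static void main(String[] args) {
        System.out.println(read(new Holder(), VALUE));
        try {
            read(new Other(), VALUE);
        } catch (ClassCastException expected) {
            System.out.println("wrong target rejected");
        }
    }
}
```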

Great to see VarHandles working out here. This adds weight to the decision to use MethodHandle.invoke semantics rather than the more restrictive MethodHandle.invokeExact semantics, the latter of which makes such reuse harder.
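The invoke/invokeExact distinction Paul mentions is easiest to see with a MethodHandle getter (hypothetical names below). With `invoke`, a call site typed `(Object)String` is adapted to the handle's `(Box)String` type. With `invokeExact`, the same call fails with WrongMethodTypeException, even though the argument really is a Box. VarHandle access-mode calls behave like `invoke`, which is what lets a shared helper take its target as Object.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.WrongMethodTypeException;

final class InvokeSemanticsDemo {
    private InvokeSemanticsDemo() { }

    static final class Box {
        String value = "boxed";
    }

    // Type of this handle: (Box)String
    static final MethodHandle GETTER;

    static {
        try {
            GETTER = MethodHandles.lookup().findGetter(Box.class, "value", String.class);
        } catch (ReflectiveOperationException ex) {
            throw new ExceptionInInitializerError(ex);
        }
    }

    // Call-site type (Object)String is adapted to (Box)String by casting the argument.
    static String viaInvoke(Object target) throws Throwable {
        return (String) GETTER.invoke(target);
    }

    // Call-site type (Object)String must equal (Box)String exactly, so this always
    // throws WrongMethodTypeException, whatever the runtime class of target.
    static String viaInvokeExact(Object target) throws Throwable {
        return (String) GETTER.invokeExact(target);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(viaInvoke(new Box()));
        try {
            viaInvokeExact(new Box());
        } catch (WrongMethodTypeException expected) {
            System.out.println("invokeExact rejected the Object-typed call site");
        }
    }
}
```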

Paul.
