Using relaxed reads and writes


Using relaxed reads and writes

Ben Manes
I think the following is a safe optimization for my use-case, but I'd appreciate any feedback / concerns.

Let's say that I have a ConcurrentHashMap<K, Node<V>>, where Node<V> is an entry in an LRU cache with miscellaneous metadata fields such as the queue links, the value, expiration timestamps, etc. When updating the value, the node must be synchronized on in order to block if it was created by a concurrent computation. This means the more optimal CAS loop isn't used, which may simplify the following discussion. Let's say that the Node's value and accessTime fields are volatile.

When writing the value, the node's lock must be held (or the write is implicitly safe because it happens during Node construction). In that case I think that a relaxed write (Unsafe#putOrderedObject) is safe, piggybacking on the mutex unlock for visibility (or on publication via the map insert if newly created). When the entry is retrieved on a read, the accessTime needs to be checked for expiration and updated to the current nano time, and the value returned. As these follow a ConcurrentHashMap lookup, I think that the timestamp and value reads can be relaxed (Unsafe#getLong and Unsafe#getObject), piggybacking on Map.get()'s memory barrier. It also appears that updating the accessTime can be relaxed (Unsafe#putOrderedLong), as eviction is a best-effort policy.
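Roughly, the shape I have in mind is something like the following sketch (hypothetical names, not the actual prototype code): the fields stay volatile, but they are accessed through Unsafe's relaxed operations where a stronger ordering is already provided elsewhere.

import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical node, simplified to just the value and accessTime accesses.
final class Node<V> {
    private static final Unsafe UNSAFE;
    private static final long VALUE_OFFSET;
    private static final long ACCESS_TIME_OFFSET;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            VALUE_OFFSET = UNSAFE.objectFieldOffset(Node.class.getDeclaredField("value"));
            ACCESS_TIME_OFFSET = UNSAFE.objectFieldOffset(Node.class.getDeclaredField("accessTime"));
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    private volatile V value;
    private volatile long accessTime;

    Node(V value, long now) {
        // Plain writes: the ConcurrentHashMap insert publishes the node.
        UNSAFE.putObject(this, VALUE_OFFSET, value);
        UNSAFE.putLong(this, ACCESS_TIME_OFFSET, now);
    }

    // Caller holds the node's monitor; the unlock provides visibility.
    void setValueRelaxed(V newValue) {
        UNSAFE.putOrderedObject(this, VALUE_OFFSET, newValue);
    }

    // Relaxed read, following the ConcurrentHashMap lookup that found the node.
    @SuppressWarnings("unchecked")
    V getValueRelaxed() {
        return (V) UNSAFE.getObject(this, VALUE_OFFSET);
    }

    // Relaxed timestamp read for the expiration check.
    long getAccessTimeRelaxed() {
        return UNSAFE.getLong(this, ACCESS_TIME_OFFSET);
    }

    // Best-effort update; a stale timestamp only skews the eviction policy slightly.
    void setAccessTimeRelaxed(long now) {
        UNSAFE.putOrderedLong(this, ACCESS_TIME_OFFSET, now);
    }
}

The fields remain declared volatile so that the default accesses stay safe; the relaxed variants are only used where the monitor release or the map's own publication already supplies the ordering.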

The races from seeing stale data appear to me to be benign and already inherent in concurrent data structures, and caches allow a slightly more relaxed attitude toward synchronization. Do these assumptions hold water?

Thanks,
Ben

P.S. Those interested in explicit code can see a prototype at: https://github.com/ben-manes/caffeine
 


Re: Using relaxed reads and writes

Doug Lea
On 01/22/2015 07:23 PM, Ben Manes wrote:

> I think the following is a safe optimization for my use-case, but I'd appreciate
> any feedback / concerns.
>
> Let's say that I have a ConcurrentHashMap<K, Node<V>>, where Node<V> is an entry
> in an LRU cache with miscellaneous metadata fields such as the queue links, the
> value, expiration timestamps, etc. When updating the value, the node must be
> synchronized on in order to block if it was created by a concurrent computation.
> This means the more optimal CAS loop isn't used, which may simplify the
> following discussion. Let's say that the Node's value and accessTime fields
> are volatile.
>
> When writing the value, the node's lock must be held (or the write is implicitly
> safe because it happens during Node construction). In that case I think that a
> relaxed write (Unsafe#putOrderedObject) is safe, piggybacking on the mutex
> unlock for visibility (or on publication via the map insert if newly created).

Yes, in fact hotspot often does this optimization for you,
at least on x86 and sparc, although not if biased locking
is enabled. It may not be worthwhile to hand-optimize.

-Doug




Re: Using relaxed reads and writes

Ben Manes
Thanks Doug!

That's really nice to know; it bugged me for a long time, but I wasn't sure whether it was a safe enough optimization for a Hotspot transformation. Since there are some relaxed read/write opportunities outside of a lock, I think it's worth doing by hand. The relaxed operations are done within code-generated classes, so the extra effort to hand-optimize turned out to be minor (72 node types, to minimize the per-node footprint based on the cache's configuration).



Re: Using relaxed reads and writes

Martin Buchholz
In reply to this post by Doug Lea
We've been doing these kinds of optimizations in j.u.c., especially the obvious one of using a relaxed write before publishing via CAS, but we haven't observed compelling performance improvements. Both hotspot and the hardware conspire to make two successive volatile writes to a single cache line much less than twice as expensive as one?!
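For anyone unfamiliar with the idiom, a minimal sketch (a hypothetical Treiber-style stack, not actual j.u.c. code) shows the shape of it: the node's next link is written with a plain store, and the CAS on head is the release that publishes it.

import java.util.concurrent.atomic.AtomicReference;

// Sketch of "relaxed write before publishing via CAS": next is a plain
// field; the CAS on head is the volatile write that publishes the node.
final class RelaxedStack<E> {
    private static final class Node<E> {
        final E item;
        Node<E> next;          // plain field, written before the publishing CAS
        Node(E item) { this.item = item; }
    }

    private final AtomicReference<Node<E>> head = new AtomicReference<>();

    public void push(E item) {
        Node<E> node = new Node<>(item);
        for (;;) {
            Node<E> h = head.get();
            node.next = h;                     // relaxed (plain) write
            if (head.compareAndSet(h, node))   // CAS publishes the node
                return;
        }
    }

    public E pop() {
        for (;;) {
            Node<E> h = head.get();            // volatile read
            if (h == null)
                return null;
            if (head.compareAndSet(h, h.next))
                return h.item;
        }
    }
}

A reader that loads head sees the node's fields correctly because the plain writes precede the CAS in program order and the CAS has volatile write semantics.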


Re: Using relaxed reads and writes

Ben Manes
Yes, your work there is what made me consider adopting that optimization to a greater extent in this rewrite project. Similarly, your thread on memory barriers got me to finally read up on the specifics and wrap my head around how to use those constructs as a model for reasoning about code. While I don't think I'll go down the route of using explicit Unsafe fence instructions anytime soon, it's been a valuable insight for using relaxed reads and writes more broadly.

There are a lot of tricks in j.u.c. that I happily steal; for example, JDK7 Exchanger's spinning heuristics were helpful when writing elimination/combining arenas (stacks and queues). Similarly, I have plans for adapting Striped64's technique of resizing when contention is detected in a few key areas, such as the arenas and cache buffer management. When the project is near completion, I'll probably see if you and Doug would be gracious enough to review the algorithms at a high level, in case there are any further insights that I can take advantage of.
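For reference, the resize-on-contention idea can be sketched as a hypothetical striped counter (not the Striped64 code itself): start with a single cell and widen the stripe array only when a CAS actually fails.

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the Striped64 idea: grow the stripes only when
// contention is observed (a failed CAS), up to the number of CPUs.
final class StripedCounter {
    private static final int MAX_STRIPES =
        Runtime.getRuntime().availableProcessors();

    private final AtomicReference<AtomicLong[]> cells =
        new AtomicReference<>(new AtomicLong[] { new AtomicLong() });

    public void add(long delta) {
        AtomicLong[] array = cells.get();
        AtomicLong cell = array[ThreadLocalRandom.current().nextInt(array.length)];
        long current = cell.get();
        if (!cell.compareAndSet(current, current + delta)) {
            // Contention detected: widen the stripes (best effort), then
            // apply the update unconditionally to the chosen cell.
            expand(array);
            cell.addAndGet(delta);
        }
    }

    private void expand(AtomicLong[] array) {
        if (array.length >= MAX_STRIPES)
            return;
        AtomicLong[] wider = new AtomicLong[Math.min(array.length * 2, MAX_STRIPES)];
        System.arraycopy(array, 0, wider, 0, array.length);
        for (int i = array.length; i < wider.length; i++)
            wider[i] = new AtomicLong();
        cells.compareAndSet(array, wider);  // losing this race is harmless
    }

    public long sum() {
        long sum = 0;
        for (AtomicLong cell : cells.get())
            sum += cell.get();
        return sum;
    }
}

Striped64 itself is considerably more careful (probing by a per-thread hash, padding cells against false sharing, a spin lock around the resize), but the trigger is the same: only grow when contention is observed.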

So thanks a lot for all the great ideas! =)



Re: Using relaxed reads and writes

Doug Lea
In reply to this post by Martin Buchholz
On 01/24/2015 03:48 PM, Martin Buchholz wrote:
> We've been doing these kinds of optimizations in j.u.c., especially the obvious
> one of using relaxed write before publishing via CAS, but we haven't observed
> compelling performance improvements.

Some of the mystique here is because I know most of the
cases hotspot can optimize (in part because I once helped write
some of the optimizations, for example Matcher::post_store_load_barrier).
On the other hand, these internals do change (the introduction of
biased locking removed some cases that were handled). So some cases
are manually optimized anyway just out of caution, and thus don't
have much impact on observed performance.

-Doug
