
Lightweight mutual exclusion without locking


Lightweight mutual exclusion without locking

David M. Lloyd-3
Sometimes there is a case where there is a resource which cannot safely
*or meaningfully* be accessed concurrently by multiple threads.  For
such resources, there is a clear expectation that only one thread would
ever access it, and in many cases this expectation is necessary in order
to gain maximum performance.  However, this constraint might
accidentally be violated, causing potentially weird and/or serious
problems.

In such cases, one can use synchronization or locking to ensure safe
access; however, this has been observed to cause major performance
degradation in real-world cases, and it is theoretically much
stronger/heavier than necessary, because allowing safe concurrent
access is not actually a requirement; rather, the best behavior would
be to reject any latecomers with an exception.

One good example of a resource which cannot usually be meaningfully
shared among concurrently running tasks is a stream-oriented pipe.  The
pipe itself may be 100% safe for concurrent access, but since reads and
writes of stream pipes are not generally guaranteed to be atomic, a
typical stream protocol cannot usually be guaranteed to have correct
output in the presence of concurrent writes because stream protocols are
not usually resilient against arbitrary interleaving of content.

In cases like this, a good programmer may wish to take measures to
ensure that accidental concurrent access is deterred with an exception,
so that problems can be detected more easily.  However, although it
seems to me that this should be possible, I have not yet been able to
derive a provably correct scheme that would allow for it.

The requirements are:
* Access to the object from one thread should be maximally performant,
i.e. the impact of adding the extra check should be very small
* Access to the object from a thread when another thread is accessing it
should be detectable as reliably as possible
* Access to the object from two threads at different times, where the
object's state was not safely published from one thread to another by
some other mechanism (like a concurrent queue), can be disallowed, but
does not have to be

The simplest non-lock approach is to use an atomic boolean of some sort,
and use a get/CAS loop to determine if the object has been entered,
throwing an exception if it fails.  However the performance
characteristics of this approach seem to be very poor; nearly as bad as
locks when I last tested it.
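For reference, the atomic-boolean guard being described looks roughly like this (a sketch; the class and method names are mine, not from the original post):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SingleThreadGuard {
    private final AtomicBoolean entered = new AtomicBoolean();

    public void enter() {
        // fast-path read, then CAS; failure means another thread is inside
        if (entered.get() || !entered.compareAndSet(false, true)) {
            throw new IllegalStateException("Concurrent access detected");
        }
    }

    public void exit() {
        entered.set(false);
    }

    public static void main(String[] args) {
        SingleThreadGuard g = new SingleThreadGuard();
        g.enter();
        try {
            g.enter(); // second entry before exit: rejected
        } catch (IllegalStateException expected) {
            System.out.println("latecomer rejected");
        }
        g.exit();
    }
}
```

The cost concern is that even the uncontended `compareAndSet` is a full read-modify-write with volatile semantics on every entry.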

What I imagine happening in a better solution is something like this,
given mutex field "mx": Atomically determine if "mx" is either "true" or
has a pending unflushed write; if so, fail with exception; if not, set
to "true" and indicate that the field has been written.

This requires the ability to write to a field and globally signal that
the field and/or cache line is now "dirty", while also atomically
failing if the field and/or cache line is already considered "dirty" by
some other processor (without necessarily needing to know what the
"dirty" value is, or incurring a cost for it).  Thus this solution
would rely on such objects being safely published before being usable
by a different thread: but this is an acceptable expectation, since safe
publication is already required for non-thread-safe objects anyway, and
this technique would merely be a safeguard of sorts.

Note that this hypothetical approach would completely crap out in the
presence of false sharing, so the object with this kind of guard in it
would have to be aligned to, and fill, a cache line, and you'd probably
want the mutex field to be early or first in case the object dribbles
out to a fraction of another line.
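As a sketch of what that layout discipline might look like (field names are hypothetical): the manual-padding idiom places filler fields around the guard. Note this is best-effort only, since the JVM may reorder or pack fields; the JDK-internal `@jdk.internal.vm.annotation.Contended` annotation (honored outside the JDK only with `-XX:-RestrictContended`) is the more reliable tool.

```java
// Best-effort manual padding around the guard field, assuming 64-byte
// cache lines.  Field layout is VM-specific: HotSpot may pack or
// reorder fields, so treat this as a sketch of the idea, not a
// guarantee against false sharing.
public class PaddedGuard {
    // padding ahead of the guard field
    long p00, p01, p02, p03, p04, p05, p06;
    volatile boolean mx; // the mutual-exclusion guard, early in the object
    // padding behind it, so unrelated fields don't share its line
    long p10, p11, p12, p13, p14, p15, p16;
}
```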

So, getting to my question (finally!): does VarHandle give us a CAS
variation which will *only* fail in exactly the circumstances I've
listed above (i.e. if the field is definitely not equal to the expected
value, *or* the field is subject to a pending unpublished write from
another CPU, while at the same time not causing other CPUs to reload the
value if the write succeeds)?  Is there another approach that could work
here?

--
- DML
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Lightweight mutual exclusion without locking

Aleksey Shipilev-3
Hi,

On 11/16/2016 03:33 PM, David M. Lloyd wrote:

> The requirements are:
> * Access to the object from one thread should be maximally performant,
> i.e. the impact of adding the extra check should be very small
> * Access to the object from a thread when another thread is accessing it
> should be detectable as reliably as possible
> * Access to the object from two threads at different times, where the
> object's state was not safely published from one thread to another by
> some other mechanism (like a concurrent queue), can be disallowed, but
> does not have to be
>
> The simplest non-lock approach is to use an atomic boolean of some sort,
> and use a get/CAS loop to determine if the object has been entered,
> throwing an exception if it fails.  However the performance
> characteristics of this approach seem to be very poor; nearly as bad as
> locks when I last tested it.
This is surprising, and probably warrants looking into. Care to share
the experimental results?


> What I imagine happening in a better solution is something like this,
> given mutex field "mx": Atomically determine if "mx" is either "true" or
> has a pending unflushed write; if so, fail with exception; if not, set
> to "true" and indicate that the field has been written.

Um, I am confused. I think it is wishful thinking to expect to detect a
(remote) "pending unflushed write". That is the job of hardware cache
coherency protocols, and you would not be able to tap into that. There
are no ISA instructions that I know of that can poll the status of a
cache line (to say nothing of store buffers, invalidation queues, or
other machine-implementation-specific things).

But even if we had visibility into that, it would still not be enough.
For example, in MESI...

> This requires the existence of the idea that one has the ability to
> write to a field and globally signal that the field and/or cache line is
> now "dirty",

...there might be no communication that the cache line is dirty, e.g.
when transitioning from the E(xclusive) to the M(odified) state.

> while also atomically failing if the field and/or cache
> line is already considered "dirty" by some other processor (without
> necessarily needing to know what the "dirty" value is or incur a cost
> therefor).

...there is no way to ask other processors if they have a given cache
line in M(odified) state, except for doing the actual
Request-For-Ownership (RFO) request for its S(hared) cache line. Which
is the hard part of the actual write/CAS as well.


> So, getting to my question (finally!): does VarHandle give us a CAS
> variation which will *only* fail in exactly the circumstances I've
> listed above (i.e. if the field is definitely not equal to the expected
> value, *or* the field is subject to a pending unpublished write from
> another CPU, while at the same time not causing other CPUs to reload the
> value if the write succeeds)?

This "pending unpublished write" business is from the layer below,
invisible to the software. But VarHandle CASes are not really different
from the CASes that Atomics have: strong CAS only fails when the actual
!= expected, and weak CAS is allowed to spuriously fail (at hardware's
discretion).

VarHandles are not a magic implementation that can peek into cache
coherency. They can help you avoid the indirection costs associated
with standalone Atomics (pretty much like Atomic*FieldUpdaters would)
and provide some interesting access modes (which are irrelevant here,
because you are fighting single-location cache coherency), but they
cannot hack into the CPU.

> Is there another approach that could work here?

No, I don't think so.  You cannot do better than the hardware allows.

Thanks,
-Aleksey



Re: Lightweight mutual exclusion without locking

David M. Lloyd-3
On 11/16/2016 10:29 AM, Aleksey Shipilev wrote:

> Hi,
>
> On 11/16/2016 03:33 PM, David M. Lloyd wrote:
>> The requirements are:
>> * Access to the object from one thread should be maximally performant,
>> i.e. the impact of adding the extra check should be very small
>> * Access to the object from a thread when another thread is accessing it
>> should be detectable as reliably as possible
>> * Access to the object from two threads at different times, where the
>> object's state was not safely published from one thread to another by
>> some other mechanism (like a concurrent queue), can be disallowed, but
>> does not have to be
>>
>> The simplest non-lock approach is to use an atomic boolean of some sort,
>> and use a get/CAS loop to determine if the object has been entered,
>> throwing an exception if it fails.  However the performance
>> characteristics of this approach seem to be very poor; nearly as bad as
>> locks when I last tested it.
>
> This is surprising, and probably warrants looking into. Care to share
> the experimental results?

Sure, next time I swing around to the project in question I'll follow up
with some test results.

>> What I imagine happening in a better solution is something like this,
>> given mutex field "mx": Atomically determine if "mx" is either "true" or
>> has a pending unflushed write; if so, fail with exception; if not, set
>> to "true" and indicate that the field has been written.
>
> Um, I am confused. I think this is a wishful thinking to be able to
> detect a (remote) "pending unflushed write". That's the job for hardware
> cache coherency protocols, and you would not be able to tap into that.
> There are no ISA instructions that I know of that can poll the status
> for a cache line (not to say, store buffers / invalidation queues, or
> other machine-implementation specific things).
>
> But even if we had the visibility into that, it would still be not
> enough. For example, in MESI...
>
>> This requires the existence of the idea that one has the ability to
>> write to a field and globally signal that the field and/or cache line is
>> now "dirty",
>
> ...there might be no communication that the cache line is dirty, e.g.
> when transiting from E(xclusive) to M(odified) state.
>
>> while also atomically failing if the field and/or cache
>> line is already considered "dirty" by some other processor (without
>> necessarily needing to know what the "dirty" value is or incur a cost
>> therefor).
>
> ...there is no way to ask other processors if they have a given cache
> line in M(odified) state, except for doing the actual
> Request-For-Ownership (RFO) request for its S(hared) cache line. Which
> is the hard part of the actual write/CAS as well.
>
>> So, getting to my question (finally!): does VarHandle give us a CAS
>> variation which will *only* fail in exactly the circumstances I've
>> listed above (i.e. if the field is definitely not equal to the expected
>> value, *or* the field is subject to a pending unpublished write from
>> another CPU, while at the same time not causing other CPUs to reload the
>> value if the write succeeds)?
>
> This "pending unpublished write" business is from the layer below,
> invisible to the software. But VarHandle CASes are not really different
> from the CASes that Atomics have: strong CAS only fails when the actual
> != expected, and weak CAS is allowed to spuriously fail (at hardware's
> discretion).

The CASes that Atomics have are limited to "compareAndSet" and
"weakCompareAndSet", whereas VarHandle (according to the current
JavaDoc) has four different modes each of normal and weak CAS.  So
already there is some difference there, and it's still not super-clear
to me what the tradeoffs are.  More on this below...
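For reference, the CAS-family access modes that VarHandle exposes (method names per the java.lang.invoke.VarHandle JavaDoc, Java 9+) can be exercised like this; the class here is a made-up example, not from the thread:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class CasModes {
    volatile int x;
    static final VarHandle X;
    static {
        try {
            X = MethodHandles.lookup().findVarHandle(CasModes.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        CasModes m = new CasModes();

        // Strong family: fails only when the current value != expected.
        boolean set = X.compareAndSet(m, 0, 1);             // volatile semantics
        int w = (int) X.compareAndExchangeAcquire(m, 1, 2); // returns the witness value
        X.compareAndExchangeRelease(m, 2, 3);

        // Weak family: each call may fail spuriously, at the hardware's discretion,
        // with plain / acquire / release / volatile ordering respectively.
        X.weakCompareAndSetPlain(m, 3, 4);
        X.weakCompareAndSetAcquire(m, 4, 5);
        X.weakCompareAndSetRelease(m, 5, 6);
        X.weakCompareAndSet(m, 6, 7);

        System.out.println(set + " " + w); // strong calls cannot fail here: "true 1"
    }
}
```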

> VarHandles is not a magic implementation that can peek into cache
> coherency. It can help you to solve the indirection costs associated
> with standalone Atomics (pretty much like Atomic*FieldUpdaters would),
> provide some interesting access modes (which are irrelevant here,
> because you are fighting the single-location cache coherency), but it
> cannot hack into the CPU.
>
>> Is there another approach that could work here?
>
> No, I don't think so.

That's a shame.  Thanks for the detailed answer.

In lieu of using the normal "strong" CAS flavor I described above, does
it make more sense in this use case to use an "acquire" CAS on entry and
a "release" set on exit?  Is there likely to be any performance
difference in the common architectures?
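Concretely, the acquire-CAS-on-entry / release-set-on-exit pairing being asked about might be sketched like this (class and method names are mine; this is an illustration of the question, not a vetted implementation):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class GuardedResource {
    private boolean mx; // accessed only through MX
    private static final VarHandle MX;
    static {
        try {
            MX = MethodHandles.lookup()
                    .findVarHandle(GuardedResource.class, "mx", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public void enter() {
        // acquire-ordered strong CAS; the witness value is true iff
        // some thread already holds the guard
        if ((boolean) MX.compareAndExchangeAcquire(this, false, true)) {
            throw new IllegalStateException("Resource is already in use");
        }
    }

    public void exit() {
        // release store: publishes writes made while the guard was held
        MX.setRelease(this, false);
    }

    public static void main(String[] args) {
        GuardedResource g = new GuardedResource();
        g.enter();
        try {
            g.enter(); // rejected: guard already held
        } catch (IllegalStateException expected) {
            System.out.println("latecomer rejected");
        }
        g.exit();
    }
}
```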

--
- DML

Re: Lightweight mutual exclusion without locking

Peter Levart
In reply to this post by David M. Lloyd-3
Hi David,

On 11/16/2016 03:33 PM, David M. Lloyd wrote:
> The requirements are:
> * Access to the object from one thread should be maximally performant,
> i.e. the impact of adding the extra check should be very small
> * Access to the object from a thread when another thread is accessing
> it should be detectable as reliably as possible
> * Access to the object from two threads at different times, where the
> object's state was not safely published from one thread to another by
> some other mechanism (like a concurrent queue), can be disallowed, but
> does not have to be

You don't say whether your requirements include granting access to the
object from different threads under any circumstances, and if so, under
which circumstances access should be granted.

If you only need to allow access from a single thread for the lifetime
of the resource, then you can simply "pin" the resource to the thread
that "allocates" it. For example:

public class Resource {
     private final Thread constructingThread;

     public Resource() {
         this.constructingThread = Thread.currentThread();
     }

     public void access() {
         if (constructingThread != Thread.currentThread()) {
             throw new IllegalStateException(
                 "Access by non-constructing thread rejected");
         }
         // ... access the resource ...
     }
}



Regards, Peter


Re: Lightweight mutual exclusion without locking

David M. Lloyd-3
On 11/16/2016 11:12 AM, Peter Levart wrote:

> Hi David,
>
> On 11/16/2016 03:33 PM, David M. Lloyd wrote:
>> The requirements are:
>> * Access to the object from one thread should be maximally performant,
>> i.e. the impact of adding the extra check should be very small
>> * Access to the object from a thread when another thread is accessing
>> it should be detectable as reliably as possible
>> * Access to the object from two threads at different times, where the
>> object's state was not safely published from one thread to another by
>> some other mechanism (like a concurrent queue), can be disallowed, but
>> does not have to be
>
> You don't say if your requirements include granting access to the object
> from different threads in any circumstance and in which circumstance
> access is to be granted.
>
> If you only need allowing access from a single thread for the lifetime
> of the resource, then you can simply "pin" the resource to the thread
> that "allocates" it. For example:

I guess I didn't explain well; I meant "one thread at a time, as
mediated by safe publication".  In other words, from a high level, I'd
like to be able to treat the object exactly the same as any other
thread-unsafe object, with the exception that there is some kind of
lightweight check to ensure that the rules aren't broken.
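A sketch of the usage pattern this describes: the object itself is thread-unsafe, and ownership moves between threads only across a safe-publication edge (here, a blocking queue's put/take pair; names and the `StringBuilder` stand-in are mine):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class HandOffDemo {
    // Hands a thread-unsafe StringBuilder from a producer thread to the
    // caller via a blocking queue; put/take is the safe-publication edge,
    // so the consumer is guaranteed to see the producer's writes.
    static String handOff() throws InterruptedException {
        BlockingQueue<StringBuilder> queue = new ArrayBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            StringBuilder sb = new StringBuilder("hello");
            try {
                queue.put(sb); // publish: relinquish ownership
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        StringBuilder sb = queue.take(); // acquire ownership
        sb.append(", world");            // safe: single owner at a time
        producer.join();
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handOff()); // prints "hello, world"
    }
}
```

The lightweight guard being discussed would sit inside the accessed object and throw only if this hand-off discipline were violated.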

--
- DML

Re: Lightweight mutual exclusion without locking

Andrew Haley
In reply to this post by David M. Lloyd-3
On 16/11/16 16:42, David M. Lloyd wrote:

>> This "pending unpublished write" business is from the layer below,
>> > invisible to the software. But VarHandle CASes are not really different
>> > from the CASes that Atomics have: strong CAS only fails when the actual
>> > != expected, and weak CAS is allowed to spuriously fail (at hardware's
>> > discretion).
> The CASes that Atomics have are limited to "compareAndSet" and
> "weakCompareAndSet", whereas VarHandle (according to the current
> JavaDoc) has four different modes each of normal and weak CAS.  So
> already there is some difference there, and it's still not super-clear
> to me what the tradeoffs are.  More on this below...

This is all to do with ordering.  On x86 it won't make any difference
at all, and on other architectures it might make a small difference.

>>> >> Is there another approach that could work here?
>> >
>> > No, I don't think so.
> That's a shame.  Thanks for the detailed answer.
>
> In lieu of using the normal "strong" CAS flavor I described above, does
> it make more sense in this use case to use an "acquire" CAS on entry and
> a "release" set on exit?  Is there likely to be any performance
> difference in the common architectures?

That's unlikely.  But I think we are all very surprised that you are
seeing poor performance.  Unless the lock/CAS/whatever is highly
contended the overhead should be very minor.

Andrew.


Re: Lightweight mutual exclusion without locking

David M. Lloyd-3
On 11/16/2016 11:47 AM, Andrew Haley wrote:

> On 16/11/16 16:42, David M. Lloyd wrote:
>>> This "pending unpublished write" business is from the layer below,
>>>> invisible to the software. But VarHandle CASes are not really different
>>>> from the CASes that Atomics have: strong CAS only fails when the actual
>>>> != expected, and weak CAS is allowed to spuriously fail (at hardware's
>>>> discretion).
>> The CASes that Atomics have are limited to "compareAndSet" and
>> "weakCompareAndSet", whereas VarHandle (according to the current
>> JavaDoc) has four different modes each of normal and weak CAS.  So
>> already there is some difference there, and it's still not super-clear
>> to me what the tradeoffs are.  More on this below...
>
> This is all to do with ordering.  On x86 it won't make any difference
> at all, and on other architectures it might make a small difference.
>
>>>>>> Is there another approach that could work here?
>>>>
>>>> No, I don't think so.
>> That's a shame.  Thanks for the detailed answer.
>>
>> In lieu of using the normal "strong" CAS flavor I described above, does
>> it make more sense in this use case to use an "acquire" CAS on entry and
>> a "release" set on exit?  Is there likely to be any performance
>> difference in the common architectures?
>
> That's unlikely.  But I think we are all very surprised that you are
> seeing poor performance.  Unless the lock/CAS/whatever is highly
> contended the overhead should be very minor.

OK, next time I come 'round to it I'll try and come up with some kind of
cogent benchmark.  It's been long enough that I can't promise there
isn't some ancillary problem like false sharing going on.

--
- DML