Copy-On-Write with wait-free writes

Copy-On-Write with wait-free writes

Pedro Ramalhete
Hello,

As mentioned a few times before on this mailing list, there are two variants
of the Copy-On-Write concurrent technique. One of them uses a lock and is
blocking for mutations, and it is the one currently used in CopyOnWriteArrayList.
The other one uses a compareAndSet loop and is lock-free for mutations, but
extremely unfair.
Both are wait-free population-oblivious for read-only operations, and both
provide linearizability for all concurrent operations, whether mutative or
read-only, applied to the encapsulated object or data structure.
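
To make the difference concrete, here is a minimal sketch of the CAS-loop
variant (a hypothetical class, not the java.util.concurrent code): reads are a
single volatile load, while writers copy, mutate and retry a compareAndSet
until it succeeds.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of the CAS-loop COW variant (hypothetical class, not the
// CopyOnWriteArrayList implementation): readers dereference the current copy,
// writers copy-mutate-CAS in a loop (lock-free, but unfair under contention).
public final class CasCowList<E> {
    private final AtomicReference<List<E>> ref =
            new AtomicReference<>(new ArrayList<>());

    public E get(int index) {            // wait-free read: one volatile load
        return ref.get().get(index);
    }

    public void add(E element) {         // lock-free write: retry until the CAS wins
        while (true) {
            List<E> current = ref.get();
            List<E> copy = new ArrayList<>(current);
            copy.add(element);
            if (ref.compareAndSet(current, copy)) return;
        }
    }
}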

We would like to present (what we believe to be) a new technique that is
wait-free for both mutations and read-only operations, and linearizable,
which we have named COWMutationQ.
The main idea is that all mutative operations are placed in a queue as lambdas,
along with the lambdas' parameters; each thread then creates one and only one
copy of the encapsulated object or data structure and applies all the queued
mutations to it in sequence.
This should _not_ be mistaken for the well-known asynchronous technique used
in Actor Models, where mutations are placed in a queue and a Future/Promise
is kept to get the "result" of the mutation, which is a blocking technique.
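
To give a feel for the mechanics, here is a rough sketch of the idea, with
hypothetical names and a simplified Michael-Scott-style mutation list (the
real COWMutationQWF/LF classes also hand back each mutation's result, and the
wait-free variant bounds the enqueue with the Kogan-Petrank queue):

import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Rough sketch only. Writers enqueue their mutation, make ONE copy of the
// current instance per attempt, replay every pending mutation on it in queue
// order, and try to publish the copy with a single CAS.
public final class CowMutationQSketch<K, V> {

    // One node per mutation, appended to a singly linked, append-only list.
    private static final class Node<K, V> {
        final Consumer<TreeMap<K, V>> mutation;
        final AtomicReference<Node<K, V>> next = new AtomicReference<>();
        volatile boolean done;       // set once the mutation is in a published copy
        Node(Consumer<TreeMap<K, V>> m) { mutation = m; }
    }

    // A published state: one immutable instance plus the last node applied to it.
    private static final class Combined<K, V> {
        final TreeMap<K, V> map;
        final Node<K, V> lastApplied;
        Combined(TreeMap<K, V> map, Node<K, V> lastApplied) {
            this.map = map;
            this.lastApplied = lastApplied;
        }
    }

    private final AtomicReference<Node<K, V>> tail;
    private final AtomicReference<Combined<K, V>> current;

    public CowMutationQSketch() {
        Node<K, V> sentinel = new Node<>(m -> { });
        sentinel.done = true;
        tail = new AtomicReference<>(sentinel);
        current = new AtomicReference<>(new Combined<>(new TreeMap<>(), sentinel));
    }

    // Read-only operations are wait-free: one volatile load of an immutable copy.
    public V get(K key) {
        return current.get().map.get(key);
    }

    // Mutations: enqueue the lambda, then copy once per attempt and replay all
    // pending mutations in queue order before publishing the copy.
    public void mutate(Consumer<TreeMap<K, V>> mutation) {
        Node<K, V> myNode = new Node<>(mutation);
        enqueue(myNode);
        while (!myNode.done) {
            Combined<K, V> cur = current.get();
            TreeMap<K, V> copy = new TreeMap<>(cur.map);   // the one copy
            Node<K, V> n = cur.lastApplied;
            Node<K, V> last = n;
            while ((n = n.next.get()) != null) {           // replay every pending mutation
                n.mutation.accept(copy);
                last = n;
            }
            if (current.compareAndSet(cur, new Combined<>(copy, last))) {
                // Mark exactly the nodes we replayed so their writers can return.
                for (Node<K, V> m = cur.lastApplied; m != last; ) {
                    m = m.next.get();
                    m.done = true;
                }
                return;   // the published copy contains our mutation either way
            }
        }
    }

    // Michael-Scott-style lock-free append; the wait-free variant bounds this too.
    private void enqueue(Node<K, V> node) {
        while (true) {
            Node<K, V> t = tail.get();
            Node<K, V> next = t.next.get();
            if (next == null) {
                if (t.next.compareAndSet(null, node)) {
                    tail.compareAndSet(t, node);
                    return;
                }
            } else {
                tail.compareAndSet(t, next);               // help a lagging tail
            }
        }
    }
}

Wrapping a TreeMap then amounts to calling, e.g., map.mutate(m -> m.put(k, v))
for writes, while get() always reads the latest published copy.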


Although the implementation is easy and the usage even easier, there are a few
tricks and details behind it, which are described in the paper on COWMutationQ:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/cowmq-2015.pdf

For those of you interested in code, we've made two variants of the algorithm:
one with a wait-free queue, using Alex Kogan and Erez Petrank's queue:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Java/com/concurrencyfreaks/cow/COWMutationQWF.java
and another with a lock-free queue, using Maged Michael and Michael Scott's
queue, the same algorithm used in ConcurrentLinkedQueue (very short code):
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Java/com/concurrencyfreaks/cow/COWMutationQLF.java

To see how this is used, you can take a look at one of the examples, such as the
one where we transform a TreeMap into a completely wait-free data structure:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Java/com/concurrencyfreaks/cow/CopyOnWriteMQWFTreeMap.java
or a completely wait-free ArrayList:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Java/com/concurrencyfreaks/cow/CopyOnWriteMQWFArrayList.java

Like the other COW techniques, COWMutationQ isn't scalable for mutations, but
it does have better throughput and latency than the other two variants in
many scenarios.
If you have a mostly read-only workload, and you want to make any object or
data structure completely wait-free, then this may be of interest to you.


Enjoy,
Pedro & Andreia
http://www.concurrencyfreaks.com


Re: Copy-On-Write with wait-free writes

Aaron Grunthal
I think contention in the latency benchmarks in section 3.1 could be reduced
by using weak variants of add()/remove().

E.g. by only attempting the mutation once: if the CAS fails and there is a
next node in the mutations queue, then the thread knows that another thread
is also trying to do the same work, and it can return immediately, knowing
that the mutation will be performed eventually.

Since the benchmark seems to be producer-consumer in nature, the results of
the add()/remove() operations do not necessarily need to be immediately
visible to the enqueuing thread, as long as they remain ordered with respect
to each other. I.e. the methods can return early (and thus reduce allocations
and cache contention) as long as they have made sure that there is another
mutator thread that will do the same (or more) work anyway.
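
Sketched against the hypothetical CowMutationQSketch posted earlier in the
thread (so these names are assumptions, not the real COWMutationQ code), a
weak variant might look like this:

// Hypothetical weak mutation: enqueue, then attempt the copy-and-publish step,
// but bail out after a failed CAS as soon as a node is queued behind ours,
// since the writer of that node cannot finish without replaying our mutation.
public void weakMutate(Consumer<TreeMap<K, V>> mutation) {
    Node<K, V> myNode = new Node<>(mutation);
    enqueue(myNode);
    while (!myNode.done) {
        Combined<K, V> cur = current.get();
        TreeMap<K, V> copy = new TreeMap<>(cur.map);
        Node<K, V> n = cur.lastApplied;
        Node<K, V> last = n;
        while ((n = n.next.get()) != null) {
            n.mutation.accept(copy);
            last = n;
        }
        if (current.compareAndSet(cur, new Combined<>(copy, last))) {
            for (Node<K, V> m = cur.lastApplied; m != last; ) {
                m = m.next.get();
                m.done = true;
            }
            return;                  // we published it ourselves
        }
        if (myNode.next.get() != null) {
            return;                  // someone behind us is obliged to replay it
        }
    }
}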

- Aaron


Re: Copy-On-Write with wait-free writes

Martin Buchholz-3


On Sun, May 17, 2015 at 1:13 PM, Aaron Grunthal <[hidden email]> wrote:
> I think contention in the latency benchmarks in section 3.1 could be reduced by using weak variants of add()/remove().
>
> E.g. by only attempting the mutation once: if the CAS fails and there is a next node in the mutations queue, then the thread knows that another thread is also trying to do the same work, and it can return immediately, knowing that the mutation will be performed eventually.


But ... this thread has already done (some of) the work, and it would be a waste not to share it with other threads, assuming the cost of a CAS is small compared to the cost of a Mutation.
 


Re: Copy-On-Write with wait-free writes

Aaron Grunthal
Right. To avoid most of the work, especially the allocation, a lazy mutation
would have to be able to bail out right after enqueuing the work, while
knowing that some other thread will pick it up.

It seems pretty difficult to offload that without violating the wait-free
properties of the loops after the enqueue.


How about this approach:

* Each node in the queue is colored either black or white.
* Non-lazy ops always submit white nodes.
* Lazy ops will try to submit a black node if the tail is a white node.
  * If submitting the black node succeeds, they can bail out.
  * If not, they have to submit a white node.
* Threads that submitted a white node will do their work as normal and also
execute the non-white (black) mutating operations immediately beyond their own node.

Some additional checks after the CAS loops will be necessary to ensure that
there really is someone to pick up the work after a non-white node has been
submitted, but those should be fairly simple.

White nodes essentially are a promise to do a bounded amount of
additional work.
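
In terms of the hypothetical CowMutationQSketch posted earlier in the thread,
and assuming Node gains a color field plus a matching constructor (again, a
sketch of the idea only, not the real COWMutationQ code), a lazy submission
could look roughly like this:

// Hypothetical colored-node lazy submission: a black node is only hung behind
// a white (non-lazy) node, whose writer has promised to replay everything up
// to the end of the queue before it returns.
enum Color { WHITE, BLACK }

public void lazyMutate(Consumer<TreeMap<K, V>> mutation) {
    Node<K, V> black = new Node<>(mutation, Color.BLACK);  // assumed extra ctor arg
    Node<K, V> t = tail.get();
    if (t.color == Color.WHITE && t.next.compareAndSet(null, black)) {
        tail.compareAndSet(t, black);
        // (The additional check mentioned above, that the white writer has not
        // already finished its replay, is omitted in this sketch.)
        return;                          // bail out: the white writer does our work
    }
    mutate(mutation);                    // otherwise submit a plain white node
}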


This way lazy ops can parasitize other threads, anywhere between 50% and 100%
of the time depending on the mixture of lazy and non-lazy ops.

Additional colors could narrow that range closer to 100%.

In the contended case this should cut down on CAS operations,
allocations and burned CPU cycles by reducing the # of threads doing
redundant work.


That's for the lock-free variant at least; I haven't checked whether the
wait-free one can be extended this way.



- Aaron



Re: Copy-On-Write with wait-free writes

Ben Manes
I suspect that a flat combining version would provide most of the benefits in a much simpler form, while also reducing copy churn. I don't particularly like the lack of GC hygiene on removals: an element reference is kept alive until another write makes it collectable.




Re: Copy-On-Write with wait-free writes

Pedro Ramalhete
Hi Aaron,

That is an interesting variation, but we should keep in mind that accepting lazy mutations means that such an approach will no longer be fault-tolerant (i.e. tolerant of a thread dying), nor sequentially consistent (which means it is not linearizable).
For example, in such an approach, when wrapping a hashmap, it could happen that the same thread would do hm.put(key1, value1) followed by hm.get(key1), and the return value of the get() would be null, or whatever previous value was associated with key1.
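
In code, with hm assumed to be backed by a hypothetical lazy COW wrapper (the
names here are only illustrative):

import java.util.Map;

// Assume hm is backed by a hypothetical lazy COW wrapper whose put() may
// return before the mutation has been applied to any published copy.
static String lazyPutThenGet(Map<String, String> hm) {
    hm.put("key1", "value1");     // lazy: mutation queued, thread returns early
    return hm.get("key1");        // reads the currently published copy:
                                  // may be null, or the previous mapping for key1
}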

Moreover, if the thread that has been "delegated" to do the lazy mutation dies or blocks indefinitely, the work will never get done. Depending on your interpretation of lock-free / wait-free, this can imply that such an approach will be neither lock-free nor wait-free.

The main advantage of COWMutationQ is in providing wait-freedom with linearizability (and fault tolerance).


Cheers,
Pedro


Re: Copy-On-Write with wait-free writes

Aaron Grunthal
On 19.05.2015 23:05, Pedro Ramalhete wrote:

> Hi Aaron,
>
> That is an interesting variation, but we should keep in mind that
> accepting lazy mutations means that such an approach will no longer be
> fault-tolerant (i.e. tolerant of a thread dying), nor sequentially
> consistent (which means it is not linearizable).
> For example, in such an approach, when wrapping a hashmap, it could
> happen that the same thread would do hm.put(key1, value1) followed by
> hm.get(key1), and the return value of the get() would be null, or
> whatever previous value was associated with key1.

Of course; that's the point of lazy variants: to optionally relax
constraints for specific operations if performance would benefit from it.

But it shouldn't be the put() operation that's lazy. A separate
lazyPut() variant would be necessary. It's up to the programmer to
decide whether he needs sequential consistency with respect to the current
thread. If you split your work into small tasks, perform a lazy put at the
end of a task, and then yield back to the task scheduler, which picks up an
independent task to execute, then that other task will not care about
thread-local ordering, because it couldn't have known about any ordering in
the first place.

>
> Moreover, if the thread that has been "delegated" to do the lazy
> mutation dies or blocks indefinitely, the work will never get done.
> Depending on your interpretation of lock-free / wait-free, this can
> imply that such an approach will be neither lock-free nor wait-free.
>

Isn't the contract of all mutation operations that they have to perform
the same things independently of which thread executes them?

So if any thread were to block indefinitely inside a mutation operation
that precedes the current task in the queue, wouldn't that mean that all
other threads attempting to execute the same work concurrently would
also block indefinitely, essentially blocking all writers arriving on
the work queue? If that were so, the operation might not have finished in
either case, whether it is lazy or not.

Or put differently, shouldn't the whole thing be thread-agnostic?

> The main advantage of COWMutationQ is in providing wait-freedom with
> linearizability (and fault tolerance).
>

I was simply suggesting a way to opt out of linearizability to some
extent while maintaining wait-freedom.


My main motivation, though, was simply that it struck me as wasteful
that the data structure does more redundant work the more contended it
becomes (even if the amount of work is bounded).



- Aaron

Re: Copy-On-Write with wait-free writes

Martin Buchholz-3
Because Java's memory model has a single global synchronization order, users expect that arbitrary operations on a shared data structure also have that nice property.
