About putOrdered and its meaning


About putOrdered and its meaning

Romain Colle
Hi all,

I've been recently trying to wrap my head around the meaning of Unsafe.putOrderedXXX() and how it differs from a volatile store.

If I understand correctly, putOrdered shares some guarantees with a volatile store, notably that all the preceding stores from the same thread will be made visible to other threads before the ordered store.
The main difference (from a hardware point of view) seems to be that the ordered store (and preceding ones) are not immediately flushed to main memory and may only be visible locally for a while.
Is that correct?

In this case, do we need anything more to guarantee safe Object publication and happens-before relationships? More specifically:

1) I want to safely publish an Object, i.e. make sure it has been fully built and initialized before making it visible to other threads.
In the absence of final fields (which should be enough by themselves), can I simply use a putOrdered instead of a volatile write?

2) If I want a happens-before relationship between a write and a read:
Consider two variables X and Y. I first store a value 'a' in X and then a value 'b' to Y. I want to make sure that if a thread reads 'b' from Y, the 'a' value of X will be visible to this thread.
The usual and "supported" way to go is to have Y be a volatile variable and perform volatile loads and stores to Y.
However, wouldn't it be enough to simply perform an ordered store and a volatile load?
In java 9, would it also be enough to perform a release store (putObjectRelease) and an acquire load (getObjectAcquire)?

Thanks a lot for your insight!

--
Romain Colle
R&D Project Manager
QuartetFS
46 rue de l'Arbre Sec, 75001 Paris, France
http://www.quartetfs.com

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: About putOrdered and its meaning

Aleksey Shipilev-2
On 05/04/2016 11:42 AM, Romain Colle wrote:
> I've been recently trying to wrap my head around the meaning of
> Unsafe.putOrderedXXX() and how it differs from a volatile store.

> If I understand correctly, putOrdered shares some guarantees with a
> volatile store, notably that all the preceding stores from the same
> thread will be made visible to other threads before the ordered store.
> The main difference (from a hardware point of view) seems to be that the
> ordered store (and preceding ones) are not immediately flushed to main
> memory and may only be visible locally for a while.
> Is that correct?

putOrdered is a release in disguise; most of the C++11 reasoning about
std::atomic stores with std::memory_order_release applies here. "Flushed
to main memory" is a very unhelpful model on current hardware.

There is no way to specify this within the current Java Memory Model,
so any explanation has to step outside it. My own mental model is this:
acquire/release are relaxations of the usual volatile rules -- while
still producing happens-before edges, they drop out of the total
synchronization order, thus breaking sequential consistency.

In practice, this means at least the absence of total order of
ordered/volatile reads/writes; or, as Javadoc says, the release writes
might not be visible to other threads [in the order you'd expect from
volatile reads/writes -- Edit: me] immediately, until the next
synchronization action happens.

See e.g.:
  http://cs.oswego.edu/pipermail/concurrency-interest/2016-March/015037.html


> In this case, do we need anything more to guarantee safe Object
> publication and happens-before relationships? More specifically:
>
> 1) I want to safely publish an Object, i.e. make sure it has been fully
> built and initialized before making it visible to other threads.
> In the absence of final fields (which should be enough by themselves),
> can I simply use a putOrdered instead of a volatile write?

I think you are conflating safe construction and safe publication there.

Safe publication still works:

                       int x; volatile int y;
-----------------------------------------------------------------------
    put(x, 1);                   |  r1 = get{Acquire|Volatile}(y);
    put{Release|Volatile}(y, 2); |  r2 = get(x);

(r1, r2) = (2, 0) is forbidden.
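A minimal Java sketch of the safe-publication litmus test above. It assumes JDK 9+, where these access modes ended up on java.lang.invoke.VarHandle (setRelease/getAcquire) rather than under the Unsafe names used earlier in the thread. The names MessagePassing and countForbidden are hypothetical. A run can only show that (2, 0) was not observed; the guarantee itself comes from the memory model, not from the harness.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Message passing: plain write to x, release write to y;
// the reader does an acquire read of y, then a plain read of x.
final class MessagePassing {
    int x;          // plain field
    volatile int y; // accessed through the VarHandle below in release/acquire mode

    static final VarHandle Y;
    static {
        try {
            Y = MethodHandles.lookup().findVarHandle(MessagePassing.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Runs the test `iterations` times and returns how often (r1, r2) == (2, 0) was seen. */
    static int countForbidden(int iterations) {
        int forbidden = 0;
        for (int i = 0; i < iterations; i++) {
            MessagePassing s = new MessagePassing();
            int[] r = new int[2];
            Thread writer = new Thread(() -> {
                s.x = 1;            // put(x, 1)
                Y.setRelease(s, 2); // putRelease(y, 2)
            });
            Thread reader = new Thread(() -> {
                r[0] = (int) Y.getAcquire(s); // r1 = getAcquire(y)
                r[1] = s.x;                   // r2 = get(x)
            });
            writer.start();
            reader.start();
            join(writer);
            join(reader);
            if (r[0] == 2 && r[1] == 0) forbidden++;
        }
        return forbidden;
    }

    static void join(Thread t) {
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        System.out.println("(2, 0) outcomes: " + countForbidden(10_000));
    }
}
```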

But anything trickier that requires sequential consistency fails. IRIW
fails, because there is no single write order observed by all threads.
Dekker fails, because a release store followed by a load of another
variable may or may not become visible in program order:

                     volatile int x; volatile int y;
-----------------------------------------------------------------------
    putRelease(x, 1);            |    putRelease(y, 1);
    r1 = getAcquire(y);          |    r2 = getAcquire(x);

(r1, r2) = (0, 0) is allowed. Forbidden if all ops are volatile.
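The Dekker case can be sketched the same way, assuming JDK 9+ VarHandles. Dekker and countBothZero are hypothetical names. Only the volatile mode carries a checkable guarantee: it must never report (0, 0). The acquire/release mode is merely allowed to. Whether it ever does depends on the hardware and the JIT; one possible source is a store still sitting in a store buffer while the later load of the other variable completes.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Dekker-style litmus test, runnable in volatile or acquire/release mode.
final class Dekker {
    volatile int x, y; // accessed only through the VarHandles below

    static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findVarHandle(Dekker.class, "x", int.class);
            Y = l.findVarHandle(Dekker.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Counts runs ending in (r1, r2) == (0, 0). */
    static int countBothZero(int iterations, boolean useVolatile) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            Dekker s = new Dekker();
            int[] r = new int[2];
            Thread t1 = new Thread(() -> {
                if (useVolatile) { X.setVolatile(s, 1); r[0] = (int) Y.getVolatile(s); }
                else             { X.setRelease(s, 1);  r[0] = (int) Y.getAcquire(s); }
            });
            Thread t2 = new Thread(() -> {
                if (useVolatile) { Y.setVolatile(s, 1); r[1] = (int) X.getVolatile(s); }
                else             { Y.setRelease(s, 1);  r[1] = (int) X.getAcquire(s); }
            });
            t1.start();
            t2.start();
            join(t1);
            join(t2);
            if (r[0] == 0 && r[1] == 0) bothZero++;
        }
        return bothZero;
    }

    static void join(Thread t) {
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        System.out.println("acq/rel (0, 0): " + countBothZero(10_000, false));
        System.out.println("volatile (0, 0): " + countBothZero(10_000, true));
    }
}
```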


Safe construction still does not work (even for volatiles!):

                                A global;
-----------------------------------------------------------------------
    A a = <alloc>;                  |  A a = global;
    put{Release|Volatile}(a.x, 1);  |  r1 = get{Acquire|Volatile}(a.x);
    global = a;                     |

(r1) = (0) is allowed.
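To contrast this with final fields, which the original question mentions as sufficient on their own, here is a hypothetical sketch (JDK 9+). Class A follows the example above: a release store to a.x, then a plain store of the reference. Class F initializes a final field in its constructor. Both are published through plain static fields. Under the final-field rules (JLS 17.5), a reader that sees a non-null F must see x == 1. For A, (0) is allowed, though it may rarely or never show up in practice.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class SafeConstruction {
    static final class A { int x; }                                // x is not final
    static final class F { final int x; F(int x) { this.x = x; } } // x is final

    static final VarHandle AX;
    static {
        try {
            AX = MethodHandles.lookup().findVarHandle(A.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static A globalA; // plain (racy) publication
    static F globalF; // plain (racy) publication

    /** Returns {runs where a non-null A showed x == 0, runs where a non-null F showed x == 0}. */
    static int[] run(int iterations) {
        int zeroA = 0, zeroF = 0;
        for (int i = 0; i < iterations; i++) {
            globalA = null;
            globalF = null;
            int[] r = {-1, -1}; // -1 means the reader saw null
            Thread writer = new Thread(() -> {
                A a = new A();
                AX.setRelease(a, 1); // putRelease(a.x, 1): orders only *earlier* accesses
                globalA = a;         // plain store, free to move ahead of the write to a.x
                globalF = new F(1);  // x is frozen at the end of F's constructor, before this store
            });
            Thread reader = new Thread(() -> {
                A a = globalA;
                if (a != null) r[0] = (int) AX.getAcquire(a);
                F f = globalF;
                if (f != null) r[1] = f.x;
            });
            writer.start();
            reader.start();
            join(writer);
            join(reader);
            if (r[0] == 0) zeroA++;
            if (r[1] == 0) zeroF++;
        }
        return new int[] {zeroA, zeroF};
    }

    static void join(Thread t) {
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        int[] c = run(10_000);
        System.out.println("A saw 0: " + c[0] + ", F saw 0: " + c[1]);
    }
}
```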


> 2) If I want a happens-before relationship before a write and a read:
> Consider two variables X and Y. I first store a value 'a' in X and then
> a value 'b' to Y. I want to make sure that if a thread reads 'b' from Y,
> the 'a' value of X will be visible to this thread.
> The usual and "supported" way to go is to have Y be a volatile variable
> and perform volatile loads and stores to Y.

See the first example above.

> However, wouldn't it be enough to simply perform an ordered store and a
> volatile load?

Two answers:
 a) Yes, unless you need sequential consistency;
 b) No, unless you can give up sequential consistency;
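In public API terms, the "ordered store plus volatile load" combination from the question can be sketched with AtomicInteger.lazySet and get. In JDK 8, lazySet is implemented with Unsafe.putOrderedInt. Since JDK 9, its Javadoc specifies the memory effects of VarHandle.setRelease. This covers only the message-passing case (answer a); it says nothing about sequential consistency. The names OrderedStorePublish and runOnce are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class OrderedStorePublish {
    static int payload;                                     // plain field
    static final AtomicInteger ready = new AtomicInteger(); // publication flag

    /** One round: lazySet (ordered store) on the writer, get (volatile load) on the reader. */
    static int runOnce() {
        payload = 0;
        ready.set(0);
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (ready.get() == 0) Thread.onSpinWait(); // volatile load
            seen[0] = payload;                            // plain load after it
        });
        reader.start();
        payload = 42;     // plain store
        ready.lazySet(1); // ordered store with release semantics
        join(reader);
        return seen[0];
    }

    static void join(Thread t) {
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        System.out.println("reader saw " + runOnce());
    }
}
```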

> In java 9, would it also be enough to perform a release store
> (putObjectRelease) and an acquire load (getObjectAcquire)?

Same two answers.

The bottom line is that acq/rel are very sharp tools, and their
advantages outweigh the maintainability/reasoning downsides only in a
few selected cases and/or on weak-memory-model hardware platforms that
do not have fast SC primitives.

Thanks,
-Aleksey



Re: About putOrdered and its meaning

thurstonn
Aleksey Shipilev-2 wrote:
> Safe construction still does not work (even for volatiles!):
>
>                                 A global;
> -----------------------------------------------------------------------
>     A a = <alloc>;                  |  A a = global;
>     put{Release|Volatile}(a.x, 1);  |  r1 = get{Acquire|Volatile}(a.x);
>     global = a;                     |
>
> (r1) = (0) is allowed.

Let's change your example slightly:

                                A global;
-----------------------------------------------------------------------
    A a = <alloc>;                  |  A a = getAcquire(global);
    putRelease(a.x, 1);             |  r1 = a.x;
    global = a;                     |

(r1) = (0) is *not* allowed.

Ignoring the possibility of a null pointer in the thread on the right.

release is a storestore and acquire is a loadload (at least that's my understanding), and if so, then r1 should never be allowed to be 0.


Re: About putOrdered and its meaning

thurstonn
thurstonn wrote:
> release is a storestore and acquire is a loadload (at least that's my
> understanding), and if so, then r1 should never be allowed to be 0

I realize that I'm assuming that the barriers are emitted *after* the respective memory actions, so the above code becomes:

                                A global;
-----------------------------------------------------------------------
    A a = <alloc>;                  |  A a = global;
    a.x = 1;                        |  LoadLoad();
    StoreStore();                   |  r1 = a.x;
    global = a;                     |

Maybe that assumption is wrong?

Re: About putOrdered and its meaning

Andrew Haley
On 05/04/2016 03:40 PM, thurstonn wrote:

> I realize that I'm assuming that the barriers are emitted *after* the
> respective memory actions, so above code becomes:
>                                 A global;
> -----------------------------------------------------------------------
>     A a = <alloc>;                  |  A a = global;
>     a.x = 1                         |  LoadLoad()
>     StoreStore()                    |  r1 = a.x;
>     global = a;                     |
>
> Maybe that assumption is wrong?

StoreRelease is (LoadStore|StoreStore ; store)
LoadAcquire is (load ; LoadStore|LoadLoad)

Andrew.
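This mapping lines up with the standalone fence methods added in JDK 9: VarHandle.releaseFence() keeps loads and stores before it from being reordered with stores after it, and VarHandle.acquireFence() keeps loads before it from being reordered with loads and stores after it. The sketch below places them where the table above puts the barriers, and uses opaque accesses for the flag so the reader eventually sees it. Treat it as a rough illustration only. Fences are not part of the formal Java Memory Model, and, as noted elsewhere in this thread, the barrier abstraction says nothing about accesses that come after the release. FenceMapping and runOnce are hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class FenceMapping {
    int data;      // plain payload
    boolean ready; // flag, accessed in opaque mode

    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup().findVarHandle(FenceMapping.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int runOnce() {
        FenceMapping s = new FenceMapping();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!(boolean) READY.getOpaque(s)) Thread.onSpinWait(); // load ...
            VarHandle.acquireFence();                                  // ... ; LoadStore|LoadLoad
            seen[0] = s.data;
        });
        reader.start();
        s.data = 42;
        VarHandle.releaseFence();  // LoadStore|StoreStore ; ...
        READY.setOpaque(s, true);  // ... store
        join(reader);
        return seen[0];
    }

    static void join(Thread t) {
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        System.out.println("reader saw " + runOnce());
    }
}
```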


Re: About putOrdered and its meaning

Aleksey Shipilev-2
In reply to this post by thurstonn
On 05/04/2016 05:22 PM, thurstonn wrote:
> Let's change your example slightly:
>                                 A global;
> -----------------------------------------------------------------------
>     A a = <alloc>;                  |  A a = getAcquire(global);
>     putRelease(a.x, 1);             |  r1 = a.x;
>     global = a;                     |
>
> (r1) = (0) is *not* allowed.

Nope, (0) is still allowed, no matter how hard you try at the reader
side -- the ship on writer side <strike>had already sailed</strike> off
to the races ;) Only finals would preclude (0).

> release is a storestore and acquire is a loadload (at least that's my
> understanding), and if so, then r1 should never be allowed to be 0

Remember how putOrdered says nothing is guaranteed for the subsequent
ops? Assuming it is even fair to talk about barriers at this level (I
hate those), this is how it is subtly different from final fields:

 A a = <alloc>;
 [LoadStore|StoreStore] // putRelease a.x
 a.x = 1;
 global = a; // <--- oopsies, no barriers

(A similar example may be constructed for volatiles, where trailing
StoreLoad does not help either)

...whereas should a.x be final:

 A a = <alloc>;
 a.x = 1;
 [LoadStore|StoreStore] // <--- a mythical "freeze action"
 global = a; // yeah!


Thanks,
-Aleksey




Re: About putOrdered and its meaning

thurstonn
Aleksey Shipilev-2 wrote:
> Nope, (0) is still allowed, no matter how hard you try at the reader
> side -- the ship on writer side <strike>had already sailed</strike> off
> to the races ;) Only finals would preclude (0).
>
>  A a = <alloc>;
>  [LoadStore|StoreStore] // putRelease a.x
>  a.x = 1;
>  global = a; // <--- oopsies, no barriers

OK, I think I got it, but surely then the following would work:

Let's change your example slightly (again):

                                A global;
-----------------------------------------------------------------------
    A a = <alloc>;                  |  A a = getAcquire(global);
    a.x = 1;                        |  r1 = a.x;
    putRelease(global, a);          |

(r1) = (0) is *not* allowed.

Note: obviously this works if you replace Release/Acquire with volatile

I was confused about the order of the memory actions relative to the respective barriers provided by putRelease, but I think the code with the explicit barriers provides for the r1 = 1 guarantee

Re: About putOrdered and its meaning

Aleksey Shipilev-2
On 05/04/2016 06:37 PM, thurstonn wrote:

> OK, I think I got it, but surely then the following would work:
>
> Let's change your example slightly (again):
>                                 A global;
> -----------------------------------------------------------------------
>     A a = <alloc>;                  |  A a = getAcquire(global);
>     a.x = 1;                        |  r1 = a.x;
>     putRelease(global, a);          |
>
> (r1) = (0) is *not* allowed.
>
> Note: obviously this works if you replace Release/Acquire with volatile
Yes, you have arrived at my first example under "Safe publication still
works" (it does not matter if a.x is a field, it could be another
variable as well) here:
  http://cs.oswego.edu/pipermail/concurrency-interest/2016-May/015104.html

-Aleksey



Re: About putOrdered and its meaning

Hans Boehm
In reply to this post by Andrew Haley


On Wed, May 4, 2016 at 9:02 AM, Andrew Haley <[hidden email]> wrote:
> StoreRelease is (LoadStore|StoreStore ; store)
> LoadAcquire is (load ; LoadStore|LoadLoad)

But only when that abstraction works :-)

x = 1;
y =release 1;
z = 1;

does not order the stores to x and z.  (Neither in theory nor in practice.)

In the C++ model at least,

Thread 1: y =release 2; x =release 1;

Thread 2: x =release 2; y =release 1;

allows a final state of x = y = 2.  memory_order_release doesn't mean anything in the absence of a corresponding acquire or consume load.  (Hardware implementations are unlikely to allow that; compiler optimizations might.) Acquire/release make the "message passing" idiom work, not much more than that.




Re: About putOrdered and its meaning

Romain Colle
In reply to this post by Aleksey Shipilev-2

Thanks a lot Aleksey (and all), that's exactly the explanation I was looking for.
Clear sentences with examples are always very appreciated!

Cheers
Romain


Re: About putOrdered and its meaning

thurstonn
In reply to this post by Hans Boehm
Hans Boehm wrote:
> On Wed, May 4, 2016 at 9:02 AM, Andrew Haley <[hidden email]> wrote:
>> StoreRelease is (LoadStore|StoreStore ; store)
>> LoadAcquire is (load ; LoadStore|LoadLoad)
>
> But only when that abstraction works :-)
>
> x = 1;
> y =release 1;
> z = 1;
>
> does not order the stores to x and z.  (Neither in theory nor in practice.)

Just to be clear, do you mean that (in another thread):
r1 = z
LoadLoad
r2 = x

can result in r1 = 1, r2 = 0?
Surely not

Re: About putOrdered and its meaning

Vitaly Davidovich
Yes it can.  Writer can be reordered as:

z = 1;
x = 1;
y =release 1;

The releasing store to y only orders the store to x before it (assuming the reader observes that via an acquire on y); it does not order z and x.  This is basically the same thing as the constructor+field write earlier in this thread: the LoadLoad in the reader is irrelevant since the writer was reordered.



Re: About putOrdered and its meaning

Hans Boehm
Yes.  And it probably bears repeating here that by similar reasoning

x = 1;
synchronized(foo) {}
z = 1;

also doesn't order the stores. Historically, it mostly did, but that was a historical accident. It will not in the future.

As usual, none of this matters in the absence of data races.


Re: About putOrdered and its meaning

Peter Levart
In reply to this post by Hans Boehm



On 05/04/2016 07:35 PM, Hans Boehm wrote:
> In the C++ model at least,
>
> Thread 1: y =release 2; x =release 1;
>
> Thread 2: x =release 2; y =release 1;
>
> allows a final state of x = y = 2.  memory_order_release doesn't mean
> anything in the absence of a corresponding acquire or consume load.
> (Hardware implementations are unlikely to allow that; compiler
> optimizations might.) Acquire/release make the "message passing" idiom
> work, not much more than that.

Ok, but in the presence of store-release / load-acquire pairs, does a single such pair guarantee that the other relaxed loads/stores preceding the store-release in program order are ordered strictly before the loads/stores following the corresponding load-acquire in program order? For example:

Thread1: construct an object graph with relaxed load/stores then publish the reference to data structure via store-release to 'global'

Thread2: load a reference from 'global' via load-acquire then use relaxed load/stores to read/modify the data structure navigated through the reference

Does this guarantee that:
- Thread2 sees all stores performed on data structure by Thread1 before publication
- Thread1 sees no modifications of data structure performed by Thread2 after loading the reference to it


Or, very similar, but not quite the same:

Thread1: process some shared state with relaxed load/stores then store-release a true value into a 'global' flag (that was initially false)

Thread2: after observing 'global' flag read via load-acquire to be true, perform relaxed load/stores of the shared state

Does this guarantee that:
- Thread2 sees all stores performed on shared state by Thread1 before storing true to 'global'
- Thread1 sees no modifications of shared state performed by Thread2 after loading true from 'global'

?

Regards, Peter



Re: About putOrdered and its meaning

Peter Levart



Ok, I see this has already been answered by Aleksey. Is this true only for Java VarHandles, or for C++ too?

Regards, Peter



Re: About putOrdered and its meaning

Hans Boehm
The answers are pretty consistent across Java, C, and C++ (and one or two others, notably OpenCL). An acquire load guarantees that all memory effects preceding the corresponding release store are visible (and none of the memory effects following the acquire load are visible before the release store). That's essentially all it guarantees. In my opinion, it's usually best not to think in terms of fences, though fence-based thinking sometimes exposes some useful rough intuitions.

There may be subtle differences/uncertainties as to what happens when an acquire load actually sees the results of a later (in coherence order) store that is not itself ordered. The C++ rules (see "release sequence") predate a modern hardware understanding, and the Java memory model description probably isn't as general as it now needs to be. But these are relatively esoteric issues that typically don't matter.
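Peter's first scenario (build an object graph with plain writes, publish the reference with a release store, read it with an acquire load) can be sketched with the JDK 9+ AtomicReference.setRelease/getAcquire methods. The linked list and the names GraphPublish, Node and runOnce are hypothetical. Per the answer above, the reader must see every node and value written before the publishing store.

```java
import java.util.concurrent.atomic.AtomicReference;

final class GraphPublish {
    static final class Node { // plain, mutable fields: no final, no volatile
        int value;
        Node next;
    }

    /** Builds a list of values 1..n with plain writes, publishes it, and returns the reader's sum. */
    static long runOnce(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        AtomicReference<Node> global = new AtomicReference<>(); // initially null
        long[] sum = new long[1];
        Thread reader = new Thread(() -> {
            Node seen;
            while ((seen = global.getAcquire()) == null) Thread.onSpinWait(); // load-acquire
            for (Node p = seen; p != null; p = p.next) sum[0] += p.value;     // relaxed loads
        });
        reader.start();
        Node head = null;
        for (int i = 1; i <= n; i++) { // relaxed stores build the graph
            Node node = new Node();
            node.value = i;
            node.next = head;
            head = node;
        }
        global.setRelease(head);       // store-release publishes it
        join(reader);
        return sum[0];
    }

    static void join(Thread t) {
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        System.out.println("sum = " + runOnce(100));
    }
}
```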

On Wed, May 4, 2016 at 2:27 PM, Peter Levart <[hidden email]> wrote:



On 05/04/2016 11:20 PM, Peter Levart wrote:



On 05/04/2016 07:35 PM, Hans Boehm wrote:


On Wed, May 4, 2016 at 9:02 AM, Andrew Haley <[hidden email][hidden email]> wrote:
On 05/04/2016 03:40 PM, thurstonn wrote:
> I realize that I'm assuming that the barriers are emitted *after* the
> respective memory actions, so above code becomes:
>  A global;
> -----------------------------------------------------------------------
>     A a = <alloc>;                  |  A a = global;
>     a.x = 1                            |  LoadLoad()
>     StoreStore()                     |   r1 = a.x;
>     global = a;                      |
>
> Maybe that assumption is wrong?

StoreRelease is (LoadStore|StoreStore ; store)
LoadAcquire is (load ; LoadStore|LoadLoad)

 
But only when that abstraction works :-)

x = 1;
y =release 1;
z = 1;

does not order the stores to x and z.  (Neither in theory nor in practice.)

In the C++ model at least,

Thread 1: y =release 2; x =release 1;

Thread 2: x =release 2; y =release 1;

allows a final state of x = y = 2.  Memory_order_release doesn't mean anything in the absence of a corresponding acquire or consume load.  (Hardware implementations are unlikely to allow that; compiler optimizations might.) Acquire/release make the "message passing" idiom work, not much more than that.

Ok, but in the presence of store-release / load-acquire pairs, does a single such pair guarantee ordering of other relaxed load/stores that are in program order before store-release to be strictly before load/stores that are in program order after corresponding load-acquire. For example:

Thread1: construct an object graph with relaxed load/stores then publish the reference to data structure via store-release to 'global'

Thread2: load a reference from 'global' via load-acquire then use relaxed load/stores to read/modify the data structure navigated through the reference

Does this guarantee that:
- Thread2 sees all stores performed on data structure by Thread1 before publication
- Thread1 sees no modifications of data structure performed by Thread2 after loading the reference to it


Or, very similar, but not quite the same:

Thread1: process some shared state with relaxed load/stores then store-release a true value into a 'global' flag (that was initially false)

Thread2: after observing 'global' flag read via load-acquire to be true, perform relaxed load/stores of the shared state

Does this guarantee that:
- Thread2 sees all stores performed on shared state by Thread1 before storing true to 'global'
- Thread1 sees no modifications of shared state performed by Thread2 after loading true from 'global'

?
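The second guarantee in each question (Thread1 not seeing Thread2's later modifications) is the load-buffering direction. Here is a sketch under the same Java 9+ assumptions, again with hypothetical names. Thread1's plain read of shared precedes its release store in program order. Thread2 writes shared only after its acquire load has seen that store. Under the release/acquire semantics described for VarHandles, the read should therefore return the initial 0 and never 99.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Load buffering with a release/acquire pair: Thread1's earlier read must not
// observe Thread2's write made after Thread2's acquire load.
final class NoLoadBuffering {
    private int shared;     // plain field, initially 0
    private boolean flag;   // accessed only through FLAG

    private static final VarHandle FLAG;
    static {
        try {
            FLAG = MethodHandles.lookup()
                    .findVarHandle(NoLoadBuffering.class, "flag", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int run() {
        NoLoadBuffering s = new NoLoadBuffering();
        int[] seenByThread1 = new int[1];
        Thread t1 = new Thread(() -> {
            seenByThread1[0] = s.shared;  // plain read, before the release
            FLAG.setRelease(s, true);     // release store
        });
        Thread t2 = new Thread(() -> {
            while (!(boolean) FLAG.getAcquire(s)) {  // acquire load observes the release
                Thread.onSpinWait();
            }
            s.shared = 99;                // plain write, after the acquire
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seenByThread1[0];
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints 0
    }
}
```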

Regards, Peter


OK, I see this was already answered by Aleksey. Is this true only for Java VarHandles, or for C++ too?

Regards, Peter



_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: About putOrdered and its meaning

thurstonn
In reply to this post by Aleksey Shipilev-2
Right.

So given that acquire/releases are not part of the total synchronization order:

 A global;
-----------------------------------------------------------------------
     A a = <alloc>;              |  A a = getAcquire(global);
     a.x = 1;                    |  r1 = a.x;
     putRelease(global, a);      |


and let's assume there are 2 reader/acquire threads, and they execute  in absolute time in the following order:

Writer
Reader A (r1 = 1)
. . .
Reader B (a is null)

At least in theory this should be possible?

Re: About putOrdered and its meaning

Hans Boehm
"absolute time" is not a well-defined notion.  But I think the answer you want here is "yes".
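thurstonn's example, translated to Java 9+ VarHandles, might look like this sketch (hypothetical names). The acquire load promises that a reader which sees the reference also sees a.x == 1. It does not promise that a reader running "later" sees the reference at all, so each reader has to handle null. The sketch only checks that invariant; it cannot force the Writer / Reader A / Reader B schedule.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Readers perform one acquire load each; null (reported as -1) is a legal result.
final class ReadOnce {
    static final class A {
        int x;  // plain field
    }

    private static A global;  // accessed through GLOBAL while threads run
    private static final VarHandle GLOBAL;
    static {
        try {
            GLOBAL = MethodHandles.lookup()
                    .findStaticVarHandle(ReadOnce.class, "global", A.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static void write() {
        A a = new A();
        a.x = 1;               // plain store
        GLOBAL.setRelease(a);  // putRelease(global, a)
    }

    // Returns -1 if nothing has been published yet; otherwise a.x, which must be 1.
    static int readOnce() {
        A a = (A) GLOBAL.getAcquire();  // getAcquire(global)
        return (a == null) ? -1 : a.x;
    }

    static int[] run() {
        global = null;  // start unpublished; Thread.start() orders this before the threads
        int[] r = new int[2];
        Thread writer = new Thread(ReadOnce::write);
        Thread readerA = new Thread(() -> r[0] = readOnce());
        Thread readerB = new Thread(() -> r[1] = readOnce());
        writer.start();
        readerA.start();
        readerB.start();
        try {
            writer.join();
            readerA.join();
            readerB.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return r;  // each entry is -1 or 1, never 0
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```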

Returning to another well-known litmus test:

Thread 1:
x =release 1;

Thread 2:
y =release 1;

Thread 3:
r1 =acquire x;
r2 =acquire y;

Thread 4:
r3 =acquire y;
r4 =acquire x;

r1 = r3 = 1; r2 = r4 = 0 is a possible outcome.

I.e. the release stores do not need to be observed in a consistent order.  And that's critical to the performance gain on Power and ARM.
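Below is a rough Java 9+ harness for this litmus test, with hypothetical names. A real test would use a dedicated tool such as jcstress, and starting fresh threads for every trial makes the interesting interleavings rare. The harness counts how often r1 = r3 = 1, r2 = r4 = 0 appears with release/acquire VarHandle accesses versus volatile accesses. Volatile accesses take part in Java's total synchronization order (roughly the analogue of C++ seq_cst), so for them the outcome is forbidden and the count must be 0. With release/acquire the outcome is allowed, but x86 will not produce it, and it may never show up on a given machine.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// IRIW: two independent writers, two readers reading in opposite orders.
final class Iriw {
    int x, y;  // both start at 0; accessed only through X and Y

    private static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findVarHandle(Iriw.class, "x", int.class);
            Y = l.findVarHandle(Iriw.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private static void store(VarHandle h, Iriw s, boolean sc) {
        if (sc) h.setVolatile(s, 1); else h.setRelease(s, 1);
    }

    private static int load(VarHandle h, Iriw s, boolean sc) {
        return sc ? (int) h.getVolatile(s) : (int) h.getAcquire(s);
    }

    // One trial; true if the two readers disagreed on the order of the two stores.
    static boolean trial(boolean sc) {
        Iriw s = new Iriw();
        int[] r = new int[4];  // r1, r2, r3, r4
        Thread[] ts = {
            new Thread(() -> store(X, s, sc)),                                    // Thread 1
            new Thread(() -> store(Y, s, sc)),                                    // Thread 2
            new Thread(() -> { r[0] = load(X, s, sc); r[1] = load(Y, s, sc); }),  // Thread 3
            new Thread(() -> { r[2] = load(Y, s, sc); r[3] = load(X, s, sc); }),  // Thread 4
        };
        for (Thread t : ts) t.start();
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return r[0] == 1 && r[2] == 1 && r[1] == 0 && r[3] == 0;
    }

    static int count(boolean sc, int trials) {
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            if (trial(sc)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("release/acquire: " + count(false, 2000));
        System.out.println("volatile:        " + count(true, 2000));
    }
}
```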


On Wed, May 4, 2016 at 3:55 PM, thurstonn <[hidden email]> wrote:
Right.

So given that acquire/releases are not part of the total synchronization
order:

 A global;
-----------------------------------------------------------------------
     A a = <alloc>;              |  A a = getAcquire(global);
     a.x = 1;                    |  r1 = a.x;
     putRelease(global, a);      |


and let's assume there are 2 reader/acquire threads, and they execute  in
absolute time in the following order:

Writer
Reader A (r1 = 1)
. . .
Reader B (a is null)

At least in theory this should be possible?




--
View this message in context: http://jsr166-concurrency.10961.n7.nabble.com/About-putOrdered-and-its-meaning-tp13429p13446.html
Sent from the JSR166 Concurrency mailing list archive at Nabble.com.

Re: About putOrdered and its meaning

Vitaly Davidovich
In reply to this post by thurstonn


On Wednesday, May 4, 2016, thurstonn <[hidden email]> wrote:
Right.

So given that acquire/releases are not part of the total synchronization
order:

 A global;
-----------------------------------------------------------------------
     A a = <alloc>;              |  A a = getAcquire(global);
     a.x = 1;                    |  r1 = a.x;
     putRelease(global, a);      |


and let's assume there are 2 reader/acquire threads, and they execute  in
absolute time in the following order:

Writer
Reader A (r1 = 1)
. . .
Reader B (a is null)

At least in theory this should be possible?
Yes, that's correct - there's no total order.  In C++11 the seq_cst memory order would be required to prevent this (but it's the most expensive ordering).

--
Sent from my phone


Re: About putOrdered and its meaning

Andrew Haley
In reply to this post by Hans Boehm
On 04/05/16 18:35, Hans Boehm wrote:

> On Wed, May 4, 2016 at 9:02 AM, Andrew Haley <[hidden email]> wrote:
>
>> On 05/04/2016 03:40 PM, thurstonn wrote:
>>> I realize that I'm assuming that the barriers are emitted *after* the
>>> respective memory actions, so above code becomes:
>>>  A global;
>>> -----------------------------------------------------------------------
>>>     A a = <alloc>;           |  A a = global;
>>>     a.x = 1;                 |  LoadLoad();
>>>     StoreStore();            |  r1 = a.x;
>>>     global = a;              |
>>>
>>> Maybe that assumption is wrong?
>>
>> StoreRelease is (LoadStore|StoreStore ; store)
>> LoadAcquire is (load ; LoadStore|LoadLoad)
>>
>>
> But only when that abstraction works :-)
>
> x = 1;
> y =release 1;
> z = 1;

Aww, I should have said "is approximately".

One of the problems with HotSpot's C2 compiler is that it does not
model the JMM internally with happens-before: instead, it inserts
fences immediately after parsing.  All of this works pretty well on
TSO machines, but getting good code on AArch64 has been a challenge.
I would love to fix this, but the complexities of replacing all the
delicate code make things rather difficult.

Andrew.