RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Martin Buchholz
Hi folks,

Review carefully - I am trying to learn about fences by explaining them!
I have borrowed some wording from my reviewers!


https://bugs.openjdk.java.net/browse/JDK-8065804
http://cr.openjdk.java.net/~martin/webrevs/openjdk9/fence-intrinsics/

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Aleksey Shipilev
Hi Martin,

On 11/24/2014 11:56 PM, Martin Buchholz wrote:
> Review carefully - I am trying to learn about fences by explaining them!
> I have borrowed some wording from my reviewers!
>
> https://bugs.openjdk.java.net/browse/JDK-8065804
> http://cr.openjdk.java.net/~martin/webrevs/openjdk9/fence-intrinsics/

I think "implies the effect of C++11" is too strong wording. "related"
might be more appropriate.

See also comments here for connection with "volatiles":
 https://bugs.openjdk.java.net/browse/JDK-8038978

Take note of Hans' correction that fences generally imply more than
volatile load/store, but since you are listing the related things in the
docs, I think the "native" Java example is good to have.

-Aleksey.

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Martin Buchholz
OK, I worked in some wording for comparison with volatiles.
I believe you when you say that the semantics of the corresponding C++
fences are slightly different, but it's rather subtle - can we say
anything more than "closely related to"?
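
To illustrate the kind of correspondence I have in mind, here is a rough
sketch of the load side (my names, not the webrev's wording; U is the usual
reflectively obtained sun.misc.Unsafe).  Per your comment and Hans', the
fence form is somewhat stronger than the volatile load:

import sun.misc.Unsafe;
import java.lang.reflect.Field;

class AcquireSketch {
    int x;

    static final Unsafe U;
    static final long X;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
            X = U.objectFieldOffset(AcquireSketch.class.getDeclaredField("x"));
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    // A volatile load of x ...
    int volatileLoad() {
        return U.getIntVolatile(this, X);
    }

    // ... has memory effects comparable to a plain load followed by a
    // loadFence.  Comparable, not identical: the fence also orders all
    // *other* preceding loads, which a volatile load need not.
    int relaxedLoadPlusFence() {
        int v = U.getInt(this, X);
        U.loadFence();
        return v;
    }
}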

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Paul Sandoz
Hi Martin,

Thanks for looking into this.

1141      * Currently hotspot's implementation of a Java language-level volatile
1142      * store has the same effect as a storeFence followed by a relaxed store,
1143      * although that may be a little stronger than needed.

IIUC, to emulate hotspot's volatile store you will need to say that a fullFence immediately follows the relaxed store.
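
Concretely, a sketch of the two recipes (my names, and the usual
theUnsafe/objectFieldOffset boilerplate; this is how I read it, not text
from the webrev):

import sun.misc.Unsafe;
import java.lang.reflect.Field;

class StoreSketch {
    int x;

    static final Unsafe U;
    static final long X;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
            X = U.objectFieldOffset(StoreSketch.class.getDeclaredField("x"));
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    // The recipe as worded in the webrev: earlier accesses cannot move
    // below the store, but the store may still reorder with later ones.
    void storeFenceThenRelaxedStore(int v) {
        U.storeFence();
        U.putInt(this, X, v);   // relaxed store
    }

    // IIUC what hotspot emits for a volatile store also keeps the store
    // from reordering with subsequent volatile loads, which needs a
    // trailing StoreLoad barrier, i.e. the fullFence below.
    void emulatedVolatileStore(int v) {
        U.storeFence();
        U.putInt(this, X, v);   // relaxed store
        U.fullFence();          // the part the wording above omits
    }
}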

The bit that always confuses me about release and acquire is that ordering is restricted to one direction, as described in orderAccess.hpp [1]. So for a release, accesses prior to the release cannot move below it, but accesses succeeding the release can move above it. And that seems to apply to Unsafe.storeFence [2] (acting like a monitor exit). Is that contrary to C++ release fences, where ordering is restricted with respect to both prior and succeeding accesses? [3]

So what about the following?

  a = r1; // Cannot move below the fence
  Unsafe.storeFence();
  b = r2; // Can move above the fence?

Paul.

[1] In orderAccess.hpp
// Execution by a processor of release makes the effect of all memory
// accesses issued by it previous to the release visible to all
// processors *before* the release completes.  The effect of subsequent
// memory accesses issued by it *may* be made visible *before* the
// release.  I.e., subsequent memory accesses may float above the
// release, but prior ones may not float below it.

[2] In memnode.hpp
// "Release" - no earlier ref can move after (but later refs can move
// up, like a speculative pipelined cache-hitting Load).  Requires
// multi-cpu visibility.  Inserted independent of any store, as required
// for intrinsic sun.misc.Unsafe.storeFence().
class StoreFenceNode: public MemBarNode {
public:
  StoreFenceNode(Compile* C, int alias_idx, Node* precedent)
    : MemBarNode(C, alias_idx, precedent) {}
  virtual int Opcode() const;
};

[3] http://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

DT
From time to time I see comments in the JVM sources referencing membars and fences. Would you say that they are used interchangeably, having the same meaning but for different CPU architectures?

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Hans Boehm
It seems to me that a (dubiously named) loadFence is intended to have essentially the same semantics as the (perhaps slightly less dubiously named) C++ atomic_thread_fence(memory_order_acquire), and a storeFence matches atomic_thread_fence(memory_order_release).  The C++ standard and, even more so, Mark Batty's work give a precise definition of what those mean in terms of implied "synchronizes with" relationships.

It looks to me like this whole implementation model for volatiles in terms of fences is fundamentally doomed, and it probably makes more sense to get rid of it rather than spending time on renaming it (though we just did the latter in Android to avoid similar confusion about semantics).  It's fundamentally incompatible with the way volatiles/atomics are intended to be implemented on ARMv8 (and Itanium), which I think fundamentally gets this much closer to right than traditional fence-based ISAs.

I'm no hardware architect, but fundamentally it seems to me that

load x
acquire_fence

imposes a much more stringent constraint than

load_acquire x

Consider the case in which the load from x is an L1 hit, but a preceding load (from, say, y) is a long-latency miss.  If we enforce ordering by just waiting for completion of prior operations, the former has to wait for the load from y to complete, while the latter doesn't.  I find it hard to believe that this doesn't leave an appreciable amount of performance on the table, at least for some interesting microarchitectures.
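
In Unsafe terms, a sketch of the two shapes (usual boilerplate; since Java
has no per-access load-acquire, I spell it as a volatile load, which is if
anything stronger):

import sun.misc.Unsafe;
import java.lang.reflect.Field;

class FenceVsLoadAcquire {
    int x, y;

    static final Unsafe U;
    static final long X, Y;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
            X = U.objectFieldOffset(FenceVsLoadAcquire.class.getDeclaredField("x"));
            Y = U.objectFieldOffset(FenceVsLoadAcquire.class.getDeclaredField("y"));
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    // Fence-based acquire: the fence orders all earlier loads against
    // everything after it, so the cheap hit on x is held up until the
    // long-latency miss on y also completes.
    int fenceBased() {
        int vy = U.getInt(this, Y);   // long-latency miss
        int vx = U.getInt(this, X);   // L1 hit
        U.loadFence();                // must wait for both loads
        return vx + vy;
    }

    // Per-access acquire (ARMv8 ldar, C++ load(memory_order_acquire)):
    // only accesses after the acquiring load of x are ordered; the load
    // of y is free to float.
    int acquireBased() {
        int vy = U.getInt(this, Y);          // unconstrained
        int vx = U.getIntVolatile(this, X);  // acquire semantics
        return vx + vy;
    }
}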

Along similar lines, it seems to me that Doug's JSR-133 cookbook was a great contribution originally, in that it finally pinned down understandable rules for implementors, while providing a lot of useful intuition.  At this point, it is still quite useful for intuition, but http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html (remembering that Java volatile = C++ SC atomic) is a much better guide for implementors, especially on Power.  The SPARC-like fence primitives used in the cookbook are no longer reflective of the most widely used architectures, and they do not reflect the fence types actually needed by Java.  In addition, for better or worse, fencing requirements on at least Power are actually driven as much by store atomicity issues as by the ordering issues discussed in the cookbook.  This was not understood in 2005, and unfortunately doesn't seem to be amenable to the kind of straightforward explanation given in Doug's cookbook.

Hans

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Andrew Haley
In reply to this post by Martin Buchholz
On 11/24/2014 08:56 PM, Martin Buchholz wrote:
> Hi folks,
>
> Review carefully - I am trying to learn about fences by explaining them!
> I have borrowed some wording from my reviewers!

+     * Currently hotspot's implementation of a Java language-level volatile
+     * store has the same effect as a storeFence followed by a relaxed store,
+     * although that may be a little stronger than needed.

While this may be true today, I'm hopefully about to commit an
AArch64 OpenJDK port that uses the ARMv8 stlr instruction.  I
don't think that what you've written here is terribly misleading,
but bear in mind that it may be there for some time.

Andrew.

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Stephan Diestelhorst
In reply to this post by Hans Boehm
On Tuesday, 25 November 2014 at 11:15:36, Hans Boehm wrote:

> I'm no hardware architect, but fundamentally it seems to me that
>
> load x
> acquire_fence
>
> imposes a much more stringent constraint than
>
> load_acquire x
>
> Consider the case in which the load from x is an L1 hit, but a preceding
> load (from say y) is a long-latency miss.  If we enforce ordering by just
> waiting for completion of prior operation, the former has to wait for the
> load from y to complete; while the latter doesn't.  I find it hard to
> believe that this doesn't leave an appreciable amount of performance on the
> table, at least for some interesting microarchitectures.

I agree, Hans, that this is a reasonable assumption.  Load_acquire x
does allow roach motel, whereas the acquire fence does not.

>  In addition, for better or worse, fencing requirements on at least
>  Power are actually driven as much by store atomicity issues, as by
>  the ordering issues discussed in the cookbook.  This was not
>  understood in 2005, and unfortunately doesn't seem to be amenable to
>  the kind of straightforward explanation as in Doug's cookbook.

Coming from a strongly ordered architecture to a weakly ordered one
myself, I also needed some mental adjustment about store (multi-copy)
atomicity.  I can imagine others will be unaware of this difference,
too, even in 2014.

Stephan

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

David Holmes
Stephan Diestelhorst writes:

> Coming from a strongly ordered architecture to a weakly ordered one
> myself, I also needed some mental adjustment about store (multi-copy)
> atomicity.  I can imagine others will be unaware of this difference,
> too, even in 2014.

Sorry I'm missing the connection between fences and multi-copy atomicity.

David

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Stephan Diestelhorst
David Holmes wrote:

> Sorry I'm missing the connection between fences and multi-copy atomicity.

One example is the classic IRIW.  With non-multi-copy-atomic stores, but
ordered (say, through a dependency) loads, consider the following example:

Memory: foo = bar = 0
_T1_         _T2_         _T3_                              _T4_
st (foo),1   st (bar),1   ld r1, (bar)                      ld r3,(foo)
                          <addr dep / local "fence" here>   <addr dep>
                          ld r2, (foo)                      ld r4, (bar)

You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy atomic
machines.  On TSO boxes, this is not possible.  That means that the
memory fence that will prevent such a behaviour (DMB on ARM) needs to
carry some additional oomph in ensuring multi-copy atomicity, or rather
prevent you from seeing it (which is the same thing).
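
For the Java-inclined, the same litmus test with volatiles, where the JMM
forbids the mixed outcome.  Just a sketch to pin down the shape; a single
run will almost never show anything interesting, so a harness like jcstress
would be the real way to test it:

class IRIW {
    static volatile int foo, bar;
    static int r1, r2, r3, r4;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { foo = 1; });
        Thread t2 = new Thread(() -> { bar = 1; });
        Thread t3 = new Thread(() -> { r1 = bar; r2 = foo; });
        Thread t4 = new Thread(() -> { r3 = foo; r4 = bar; });
        t1.start(); t2.start(); t3.start(); t4.start();
        t1.join(); t2.join(); t3.join(); t4.join();
        // Forbidden for volatiles: r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0,
        // i.e. T3 and T4 observing the two independent stores in opposite
        // orders.  With plain (non-volatile) fields the outcome is allowed
        // on non-multi-copy-atomic machines such as Power.
        System.out.printf("r1=%d r2=%d r3=%d r4=%d%n", r1, r2, r3, r4);
    }
}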

Stephan

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

David Holmes
Stephan Diestelhorst writes:

> You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy atomic
> machines.  On TSO boxes, this is not possible.  That means that the
> memory fence that will prevent such a behaviour (DMB on ARM) needs to
> carry some additional oomph in ensuring multi-copy atomicity, or rather
> prevent you from seeing it (which is the same thing).

I take it as given that any code for which you may have ordering
constraints must first have basic atomicity properties for loads and
stores. I would not expect any kind of fence to add multi-copy atomicity
where there was none.

David

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Hans Boehm
To be concrete here, on Power, loads can normally be ordered by an address dependency or a light-weight fence (lwsync).  However, neither is enough to prevent the questionable outcome for IRIW, since neither ensures that the stores in T1 and T2 will be made visible to other threads in a consistent order.  That outcome can be prevented by using heavyweight fence (sync) instructions between the loads instead.  Peter Sewell's group concluded that to enforce correct volatile behavior on Power, you essentially need a heavyweight fence between every pair of volatile operations.  That cannot be understood based on simple ordering constraints.

As Stephan pointed out, there are similar issues on ARM, but they're less commonly encountered in a Java implementation.  If you're lucky, you can get to the right implementation recipe by looking at only reordering, I think.

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

David Holmes

Hi Hans,
 
Given IRIW is a thorn in everyone's side and has no known useful benefit, and can hopefully be killed off in the future, let's not get bogged down in IRIW. But none of what you say below relates to multi-copy atomicity.
 
Cheers,
David

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Roman Elizarov

There is no conceivable way to kill the IRIW consistency requirement while retaining the ability to prove correctness of large software systems. If IRIW of volatile variables is not consistent, then volatile reads and writes are not linearizable, which breaks linearizability of all higher-level primitives built on top of them and makes formal reasoning about the behavior of concurrent systems practically impossible. There are many fields where this is not acceptable.

 

/Roman

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of David Holmes
Sent: Wednesday, November 26, 2014 5:11 AM
To: Hans Boehm
Cc: [hidden email]; core-libs-dev
Subject: Re: [concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics

 

Hi Hans,

 

Given IRIW is a thorn in everyone's side and has no known useful benefit, and can hopefully be killed off in the future, lets not get bogged down in IRIW. But none of what you say below relates to multi-copy-atomicity.

 

Cheers,

David

-----Original Message-----
From:
[hidden email] [[hidden email]]On Behalf Of Hans Boehm
Sent: Wednesday, 26 November 2014 12:04 PM
To:
[hidden email]
Cc: Stephan Diestelhorst;
[hidden email]; core-libs-dev
Subject: Re: [concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics

To be concrete here, on Power, loads can normally be ordered by an address dependency or light-weight fence (lwsync).  However, neither is enough to prevent the questionable outcome for IRIW, since it doesn't ensure that the stores in T1 and T2 will be made visible to other threads in a consistent order.  That outcome can be prevented by using heavyweight fences (sync) instructions between the loads instead.  Peter Sewell's group concluded that to enforce correct volatile behavior on Power, you essentially need a a heavyweight fence between every pair of volatile operations on Power.  That cannot be understood based on simple ordering constraints.

 

As Stephan pointed out, there are similar issues on ARM, but they're less commonly encountered in a Java implementation.  If you're lucky, you can get to the right implementation recipe by looking at only reordering, I think.

 

 

On Tue, Nov 25, 2014 at 4:36 PM, David Holmes <[hidden email]> wrote:

Stephan Diestelhorst writes:
>
> David Holmes wrote:
> > Stephan Diestelhorst writes:
> > > Am Dienstag, 25. November 2014, 11:15:36 schrieb Hans Boehm:
> > > > I'm no hardware architect, but fundamentally it seems to me that
> > > >
> > > > load x
> > > > acquire_fence
> > > >
> > > > imposes a much more stringent constraint than
> > > >
> > > > load_acquire x
> > > >
> > > > Consider the case in which the load from x is an L1 hit, but a
> > > > preceding load (from say y) is a long-latency miss.  If we enforce
> > > > ordering by just waiting for completion of prior operation, the
> > > > former has to wait for the load from y to complete; while the
> > > > latter doesn't.  I find it hard to believe that this doesn't leave
> > > > an appreciable amount of performance on the table, at least for
> > > > some interesting microarchitectures.
> > >
> > > I agree, Hans, that this is a reasonable assumption.  Load_acquire x
> > > does allow roach motel, whereas the acquire fence does not.
> > >
> > > >  In addition, for better or worse, fencing requirements on at least
> > > >  Power are actually driven as much by store atomicity issues, as by
> > > >  the ordering issues discussed in the cookbook.  This was not
> > > >  understood in 2005, and unfortunately doesn't seem to be
> amenable to
> > > >  the kind of straightforward explanation as in Doug's cookbook.
> > >
> > > Coming from a strongly ordered architecture to a weakly ordered one
> > > myself, I also needed some mental adjustment about store (multi-copy)
> > > atomicity.  I can imagine others will be unaware of this difference,
> > > too, even in 2014.
> >
> > Sorry I'm missing the connection between fences and multi-copy
> atomicity.
>
> One example is the classic IRIW.  With non-multi copy atomic stores, but
> ordered (say through a dependency) loads in the following example:
>
> Memory: foo = bar = 0
> _T1_         _T2_         _T3_                              _T4_
> st (foo),1   st (bar),1   ld r1, (bar)                      ld r3,(foo)
>                           <addr dep / local "fence" here>   <addr dep>
>                           ld r2, (foo)                      ld r4, (bar)
>
> You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy atomic
> machines.  On TSO boxes, this is not possible.  That means that the
> memory fence that will prevent such a behaviour (DMB on ARM) needs to
> carry some additional oomph in ensuring multi-copy atomicity, or rather
> prevent you from seeing it (which is the same thing).

I take it as given that any code for which you may have ordering
constraints, must first have basic atomicity properties for loads and
stores. I would not expect any kind of fence to add multi-copy-atomicity
where there was none.

David


> Stephan
>
> _______________________________________________
> Concurrency-interest mailing list
> [hidden email]
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

 


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

David Holmes

Can you expand on that, please? All previous discussion of IRIW I have seen indicated that the property, while a consequence of existing JMM rules, had no practical use.

Thanks,
David

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Roman Elizarov

Whether IRIW has any _practical_ uses is definitely subject to debate. However, there is no tractable way to reason formally about the properties of large concurrent systems except via linearizability. Linearizability is the only property that is both local and hierarchical: it lets you build more complex linearizable algorithms from simpler ones, with quite succinct and compelling proofs at each step.

 

In other words, if you want to be able to construct a formal proof that your [large] concurrent system is correct, then you must have IRIW consistency. Do you need a formal proof of correctness? Maybe not. In many applications hand-waving is enough, but there are many other applications where hand-waving does not count as a proof. It may be possible to construct formal correctness proofs for some very simple algorithms even on a system that does not provide IRIW consistency, but this is beyond the state of the art of formal verification for anything sufficiently complex.

 

/Roman

 

From: David Holmes [mailto:[hidden email]]
Sent: Wednesday, November 26, 2014 11:54 AM
To: Roman Elizarov; Hans Boehm
Cc: [hidden email]; core-libs-dev
Subject: RE: [concurrency-interest] RFR: 8065804:JEP171:Clarifications/corrections for fence intrinsics

 

Can you expand on that please. All previous discussion of IRIW I have seen indicated that the property, while a consequence of existing JMM rules, had no practical use.

 

Thanks,

David

-----Original Message-----
From: Roman Elizarov [
[hidden email]]
Sent: Wednesday, 26 November 2014 6:49 PM
To:
[hidden email]; Hans Boehm
Cc:
[hidden email]; core-libs-dev
Subject: RE: [concurrency-interest] RFR: 8065804:JEP171:Clarifications/corrections for fence intrinsics

There is no conceivable way to kill IRIW consistency requirement while retaining ability to prove correctness of large software systems. If IRIW of volatile variables are not consistent, then volatile reads and writes are not linearizable, which breaks linearizabiliy of all higher-level primitives build on top of them and makes formal reasoning about behavior of concurrent systems practically impossible. There are many fields where this is not acceptable.

 

/Roman

 

From: [hidden email] [[hidden email]] On Behalf Of David Holmes
Sent: Wednesday, November 26, 2014 5:11 AM
To: Hans Boehm
Cc: [hidden email]; core-libs-dev
Subject: Re: [concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics

 

Hi Hans,

 

Given IRIW is a thorn in everyone's side and has no known useful benefit, and can hopefully be killed off in the future, lets not get bogged down in IRIW. But none of what you say below relates to multi-copy-atomicity.

 

Cheers,

David

-----Original Message-----
From:
[hidden email] [[hidden email]]On Behalf Of Hans Boehm
Sent: Wednesday, 26 November 2014 12:04 PM
To:
[hidden email]
Cc: Stephan Diestelhorst;
[hidden email]; core-libs-dev
Subject: Re: [concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics

To be concrete here, on Power, loads can normally be ordered by an address dependency or light-weight fence (lwsync).  However, neither is enough to prevent the questionable outcome for IRIW, since it doesn't ensure that the stores in T1 and T2 will be made visible to other threads in a consistent order.  That outcome can be prevented by using heavyweight fences (sync) instructions between the loads instead.  Peter Sewell's group concluded that to enforce correct volatile behavior on Power, you essentially need a a heavyweight fence between every pair of volatile operations on Power.  That cannot be understood based on simple ordering constraints.

 

As Stephan pointed out, there are similar issues on ARM, but they're less commonly encountered in a Java implementation.  If you're lucky, you can get to the right implementation recipe by looking at only reordering, I think.

 

 

On Tue, Nov 25, 2014 at 4:36 PM, David Holmes <[hidden email]> wrote:

Stephan Diestelhorst writes:
>
> David Holmes wrote:
> > Stephan Diestelhorst writes:
> > > Am Dienstag, 25. November 2014, 11:15:36 schrieb Hans Boehm:
> > > > I'm no hardware architect, but fundamentally it seems to me that
> > > >
> > > > load x
> > > > acquire_fence
> > > >
> > > > imposes a much more stringent constraint than
> > > >
> > > > load_acquire x
> > > >
> > > > Consider the case in which the load from x is an L1 hit, but a
> > > > preceding load (from say y) is a long-latency miss.  If we enforce
> > > > ordering by just waiting for completion of prior operation, the
> > > > former has to wait for the load from y to complete; while the
> > > > latter doesn't.  I find it hard to believe that this doesn't leave
> > > > an appreciable amount of performance on the table, at least for
> > > > some interesting microarchitectures.
> > >
> > > I agree, Hans, that this is a reasonable assumption.  Load_acquire x
> > > does allow roach motel, whereas the acquire fence does not.
> > >
> > > >  In addition, for better or worse, fencing requirements on at least
> > > >  Power are actually driven as much by store atomicity issues, as by
> > > >  the ordering issues discussed in the cookbook.  This was not
> > > >  understood in 2005, and unfortunately doesn't seem to be
> amenable to
> > > >  the kind of straightforward explanation as in Doug's cookbook.
> > >
> > > Coming from a strongly ordered architecture to a weakly ordered one
> > > myself, I also needed some mental adjustment about store (multi-copy)
> > > atomicity.  I can imagine others will be unaware of this difference,
> > > too, even in 2014.
> >
> > Sorry I'm missing the connection between fences and multi-copy
> atomicity.
>
> One example is the classic IRIW.  With non-multi copy atomic stores, but
> ordered (say through a dependency) loads in the following example:
>
> Memory: foo = bar = 0
> _T1_         _T2_         _T3_                              _T4_
> st (foo),1   st (bar),1   ld r1, (bar)                      ld r3,(foo)
>                           <addr dep / local "fence" here>   <addr dep>
>                           ld r2, (foo)                      ld r4, (bar)
>
> You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy atomic
> machines.  On TSO boxes, this is not possible.  That means that the
> memory fence that will prevent such a behaviour (DMB on ARM) needs to
> carry some additional oomph in ensuring multi-copy atomicity, or rather
> prevent you from seeing it (which is the same thing).

I take it as given that any code for which you may have ordering
constraints, must first have basic atomicity properties for loads and
stores. I would not expect any kind of fence to add multi-copy-atomicity
where there was none.

David


> Stephan
>
> _______________________________________________
> Concurrency-interest mailing list
> [hidden email]
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

 


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

DT
Roman,
Can you point to any specific article that states a concurrency problem and then uses linearizability to reason about the solution?

Thanks,
DT

On 11/26/2014 2:59 AM, Roman Elizarov wrote:

Whether IRIW has any _practical_ uses is definitely subject to debate. However, there is no tractable way to reason formally about the properties of large concurrent systems except via linearizability. Linearizability is the only property that is both local and hierarchical. It lets you build more complex linearizable algorithms from simpler ones, with quite succinct and compelling proofs at each step.

In other words, if you want to be able to construct a formal proof that your [large] concurrent system is correct, then you must have IRIW consistency. Do you need a formal proof of correctness? Maybe not. In many applications hand-waving is enough, but there are many other applications where hand-waving does not count as a proof. It may be possible to construct formal correctness proofs for some very simple algorithms even on a system that does not provide IRIW, but this is beyond the state of the art of formal verification for anything sufficiently complex.

/Roman

 

From: David Holmes [[hidden email]]
Sent: Wednesday, November 26, 2014 11:54 AM
To: Roman Elizarov; Hans Boehm
Cc: [hidden email]; core-libs-dev
Subject: RE: [concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

 

Can you expand on that please? All previous discussion of IRIW I have seen indicated that the property, while a consequence of existing JMM rules, had no practical use.

 

Thanks,

David

-----Original Message-----
From: Roman Elizarov [[hidden email]]
Sent: Wednesday, 26 November 2014 6:49 PM
To: [hidden email]; Hans Boehm
Cc: [hidden email]; core-libs-dev
Subject: RE: [concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

There is no conceivable way to kill the IRIW consistency requirement while retaining the ability to prove correctness of large software systems. If IRIW for volatile variables is not consistent, then volatile reads and writes are not linearizable, which breaks linearizability of all higher-level primitives built on top of them and makes formal reasoning about the behavior of concurrent systems practically impossible. There are many fields where this is not acceptable.

/Roman

 

From: [hidden email] [[hidden email]] On Behalf Of David Holmes
Sent: Wednesday, November 26, 2014 5:11 AM
To: Hans Boehm
Cc: [hidden email]; core-libs-dev
Subject: Re: [concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

 

Hi Hans,

 

Given that IRIW is a thorn in everyone's side, has no known useful benefit, and can hopefully be killed off in the future, let's not get bogged down in IRIW. But none of what you say below relates to multi-copy atomicity.

 

Cheers,

David

-----Original Message-----
From: [hidden email] [[hidden email]] On Behalf Of Hans Boehm
Sent: Wednesday, 26 November 2014 12:04 PM
To: [hidden email]
Cc: Stephan Diestelhorst; [hidden email]; core-libs-dev
Subject: Re: [concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

To be concrete here, on Power, loads can normally be ordered by an address dependency or light-weight fence (lwsync). However, neither is enough to prevent the questionable outcome for IRIW, since it doesn't ensure that the stores in T1 and T2 will be made visible to other threads in a consistent order. That outcome can be prevented by using heavyweight fence (sync) instructions between the loads instead. Peter Sewell's group concluded that to enforce correct volatile behavior on Power, you essentially need a heavyweight fence between every pair of volatile operations. That cannot be understood based on simple ordering constraints.

 

As Stephan pointed out, there are similar issues on ARM, but they're less commonly encountered in a Java implementation. If you're lucky, you can get to the right implementation recipe by looking only at reordering, I think.
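
In Unsafe terms, the conservative cookbook-style mapping Hans alludes to looks roughly like the sketch below. This is an illustration of the recipe, not hotspot's actual code; on Power the fullFence() is where the expensive sync lands, and that cost is exactly the part that plain reordering arguments fail to predict.

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    class VolatileViaFences {
        static final Unsafe U = unsafe();
        static int x;   // stands in for a volatile field

        static void volatileStore(int v) {
            U.storeFence();   // keep prior loads/stores before the store
            x = v;
            U.fullFence();    // StoreLoad -- heavyweight (sync) on Power
        }

        static int volatileLoad() {
            int v = x;
            U.loadFence();    // keep the load before later loads/stores
            return v;
        }

        private static Unsafe unsafe() {
            try {
                Field f = Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                return (Unsafe) f.get(null);
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }
    }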


Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Hans Boehm
Definitions here seem to be less clear than I would like. What I meant by "store atomicity", which I think is more or less synonymous with "multi-copy atomicity", is that a store becomes visible to all observers at the same time or, equivalently, that stores become visible to all observers in a consistent order. In my view, IRIW is the canonical test for that.

I agree with Roman that IRIW requirements for Java volatiles are here to stay.  Many of us thought about ways to relax the requirement about 8 or 9 years ago.  In my view:

- Sequential consistency for data-race-free code is the only model that we can possibly explain to the majority of programmers.  (Even stronger models like some notions of region serializability may also make sense, but they'll cost you.)  This requires IRIW.  This model is also by far the easiest to reason about formally.

- The next weaker model that seems to be somewhat explainable, but really only to experts, is something along the lines of the C++ acquire/release model. This doesn't require IRIW. It's clearly too weak to replace Java volatile behavior, since it also fails to work for Dekker-like settings, which are fairly common; see the sketch after this list. (Nonexperts perhaps shouldn't write lock-free Dekker-like code, but it's hard to explain precisely what they shouldn't be doing.)

- A large amount of effort to generate models between those two failed to produce anything viable. The general experience was that once you no longer require IRIW, you also end up failing various other, potentially more important, litmus tests in ways that are really difficult to explain. And those models generally looked too complex to me to form a viable basis for real programs.

I think many people, even those who would rather not enforce IRIW, generally agree with this characterization.

Hans
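
The Dekker-like setting from the second point above, as a Java sketch (names are illustrative): under sequentially consistent volatiles the outcome saw0 == 0 && saw1 == 0 is forbidden, while acquire/release alone would permit it, because a store-release followed by a load-acquire of a different variable may still be reordered.

    class DekkerLitmus {
        static volatile int flag0 = 0, flag1 = 0;
        static int saw0, saw1;

        public static void main(String[] args) throws InterruptedException {
            Thread t0 = new Thread(() -> { flag0 = 1; saw1 = flag1; });
            Thread t1 = new Thread(() -> { flag1 = 1; saw0 = flag0; });
            t0.start(); t1.start();
            t0.join();  t1.join();
            // Forbidden for volatile (SC), allowed under acquire/release:
            // saw0 == 0 && saw1 == 0 -- both threads "enter" at once.
            System.out.println("saw0=" + saw0 + " saw1=" + saw1);
        }
    }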

On Tue, Nov 25, 2014 at 6:10 PM, David Holmes <[hidden email]> wrote:
Hi Hans,

Given that IRIW is a thorn in everyone's side, has no known useful benefit, and can hopefully be killed off in the future, let's not get bogged down in IRIW. But none of what you say below relates to multi-copy atomicity.

Cheers,
David
Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Roman Elizarov

I'd suggest starting with the original paper by Herlihy, who came up with the concept of linearizability in 1990:

Linearizability: A Correctness Condition for Concurrent Objects 

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.5315


There has been a lot of research on linearizability since then (there are almost a thousand citations for this article), expanding and improving proof techniques and applying them. There have been no breakthroughs of comparable magnitude since. All "thread-safe" objects that you encounter in the modern world are linearizable. It is the de facto "gold standard" correctness condition for concurrent objects.


This position is well deserved, because having linearizable objects as your building blocks makes it super-easy to formally reason about the correctness of your code. You will rarely encounter concurrent algorithms that provide weaker guarantees (like quiescent consistency), because they are all too hard to reason about -- they are either not composable or not local. But when all your concurrent objects are linearizable, you can ditch happens-before, forget that everything is actually parallel, and simply reason about your code in terms of interleavings of "atomic" operations that happen in some global order. That is the beauty of linearizability.
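
As a small illustration of that local, hierarchical reasoning (the class is mine, not from the paper): java.util.concurrent.atomic.AtomicLong is linearizable, so a ticket dispenser built on it is linearizable too, and clients may reason purely in terms of some global interleaving of atomic take() calls.

    import java.util.concurrent.atomic.AtomicLong;

    final class TicketDispenser {
        private final AtomicLong next = new AtomicLong();

        // Linearization point: the getAndIncrement() inside. Every call
        // appears to take effect atomically at that instant, so
        // concurrent callers always receive distinct tickets.
        long take() {
            return next.getAndIncrement();
        }
    }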


But linearizability is indeed a pretty strong requirement. Linearizability of your shared memory requires that Independent Reads of Independent Writes (IRIW) be consistent. Can you get away with some weaker requirement and still get all the same goodies that linearizability gets you? I have not seen anything promising in this direction. Whoever makes this breakthrough will surely reap the world's recognition and respect.


/Roman



From: DT <[hidden email]>
Sent: 26 November 2014 20:24
To: Roman Elizarov; [hidden email]; Hans Boehm
Cc: core-libs-dev; [hidden email]
Subject: Re: [concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics
Roman,
Can you point to any specific article that states a concurrency problem and then uses linearizability to reason about the solution?

Thanks,
DT

Re: RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Martin Buchholz-3
On Wed, Nov 26, 2014 at 5:08 PM, David Holmes <[hidden email]> wrote:
> Please explain why you have changed the defined semantics for storeFence.
> You have completely reversed the direction of the barrier.

Yes.  I believe the current spec of storeFence was a copy-paste typo,
and it seems others feel likewise.
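
For the record, a sketch of the direction as corrected (my reading of the review thread, not the official javadoc wording): loadFence() is acquire-like (LoadLoad|LoadStore), storeFence() is its release-like mirror (LoadStore|StoreStore), and fullFence() adds the expensive StoreLoad. The classic publication pairing, assuming that reading:

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    class Publication {
        static final Unsafe U = unsafe();
        static int data;
        static int ready;   // plain fields, for illustration only

        static void writer() {
            data = 42;
            U.storeFence();  // the write to data cannot sink below here...
            ready = 1;       // ...so it is ordered before this store
        }

        static void reader() {
            if (ready == 1) {
                U.loadFence();      // pairs with the writer's storeFence
                assert data == 42;  // at the fence level; real Java code
            }                       // would still prefer volatile here
        }

        private static Unsafe unsafe() {
            try {
                Field f = Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                return (Unsafe) f.get(null);
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }
    }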