DCL using Fence Intrinsics

DCL using Fence Intrinsics

vikas
Hi,

  I am trying to understand the fence intrinsics API.
  Preshing has shown how to write DCL in C++ on his blog:
  http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/

  I was trying to do something similar in Java (Code1):
 
     sun.misc.Unsafe U;
     Singleton instance = null;

     Singleton getInstance() {
         Singleton tmp = instance;
         U.loadFence();
         if (tmp == null) {
             synchronized (Singleton.class) {
                 tmp = instance;
                 if (tmp == null) {
                     tmp = new Singleton();
                     U.storeFence();
                     instance = tmp;
                 }
             }
         }
         return tmp;
     }
                                    Code1

    Will Code1 work?
   
    ------------------------------------------------------------------------------
 
    Along similar lines, I have another question. See Code2 below,
    where a and b are normal variables with initial value 0:

       T1                                    T2
     a = 1;                                  while (unsafe.getIntVolatile(b) != 1);
     unsafe.putIntOrdered(b, 1);             assert(a == 1); // will always pass

                                     Code2

    Code2 works because putXXXOrdered and getXXXVolatile form a happens-before edge,
    i.e. the assert in thread T2 will always pass.
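
    The Code2 pattern can be tried out without Unsafe. The following is only a sketch under assumptions: AtomicInteger.lazySet stands in for the ordered put (the Unsafe method is actually named putOrderedInt) and AtomicInteger.get stands in for getIntVolatile. The class and method names are hypothetical and not from this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
class OrderedPublish {
    static int a = 0;                                    // plain field, as in Code2
    static final AtomicInteger b = new AtomicInteger(0); // flag written with an ordered store

    // Runs T1 and T2 once and returns the value of 'a' that T2 observed.
    static int runOnce() {
        a = 0;
        b.set(0);
        int[] seen = new int[1];

        Thread t2 = new Thread(() -> {
            while (b.get() != 1) {
                // spin on a volatile read, standing in for getIntVolatile
            }
            seen[0] = a;                                 // read of a comes after the read of b
        });
        Thread t1 = new Thread(() -> {
            a = 1;                                       // plain store
            b.lazySet(1);                                // ordered store, standing in for putOrderedInt
        });

        t2.start();
        t1.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

    A passing run does not prove the ordering guarantee; it only shows the shape of the pattern being asked about.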

    -------------------------------------------------------------------------------
    But can we say the same thing for the code below (Code3)?

       T1                                    T2
     a = 1;                                  while (b != 1);
     unsafe.storeFence();                    unsafe.loadFence();
     b = 1;                                  assert(a == 1);
                                     Code3

   What prevents the compiler from optimizing the while loop in Code3 into an infinite loop?
   So does Code3 work? If not, is there any way we can achieve the
   expected behavior using fences?

   thanks
   vikas
 
   

 
   
Re: DCL using Fence Intrinsics

Vitaly Davidovich

1 works, and I can't see why you even need the loadFence.

2 and 3 won't (always) work.  In 2, compiler can move a=1 after the loop.  For 3, if you put loadFence inside the while loop it will work.

--
View this message in context: http://jsr166-concurrency.10961.n7.nabble.com/DCL-using-Fence-Intrinsics-tp12420.html
Sent from the JSR166 Concurrency mailing list archive at Nabble.com.
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: DCL using Fence Intrinsics

vikas
> In 2, compiler can move a=1 after the loop

I'm not sure what you mean here: a==1 is a read operation and is already after the loop.

> For 3, if you put loadFence inside the while loop it will work

I'm not sure why it would work.

> I can't see why you even need the loadFence.

Probably, without the loadFence, you may not see all fields of Singleton fully initialized.
There is no happens-before relation between the storeFence and the read of the instance variable.

Re: DCL using Fence Intrinsics

vikas
  >>For 3, if you put loadFence inside the while loop it will work 

  Yeah, I got it. Yes, it will work if I put loadFence inside the loop.

Re: DCL using Fence Intrinsics

Vitaly Davidovich
In reply to this post by vikas

I'm talking about the assignment of 1 to a done by T1.  The formatting is a bit off, so perhaps I'm misreading it.

In 1, the writer has a storeFence between the constructor and the assignment to the field, which prevents a reader from seeing the reference before the initializing stores are complete.  So I think the reader sees either null or a fully constructed object, even without the loadFence.  This basically mimics what happens when you racily publish an instance with final fields.
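
For readers who want to experiment with Code1 on a current JDK, here is a minimal sketch. It assumes Java 9+ and substitutes VarHandle.acquireFence() for U.loadFence() and VarHandle.releaseFence() for U.storeFence(). The class name FencedSingleton and its value field are hypothetical; the field is non-final on purpose, so the final-field guarantee is not what protects it. The reader-side fence is kept as in Code1, even though the point above is that it may be unnecessary.

```java
import java.lang.invoke.VarHandle;

// Hypothetical class; 'value' exists only so the constructor has a store to publish.
class FencedSingleton {
    private static FencedSingleton instance;   // deliberately not volatile, as in Code1
    int value;                                 // non-final on purpose

    private FencedSingleton() {
        value = 42;                            // initializing store
    }

    static FencedSingleton getInstance() {
        FencedSingleton tmp = instance;
        VarHandle.acquireFence();              // stands in for U.loadFence()
        if (tmp == null) {
            synchronized (FencedSingleton.class) {
                tmp = instance;
                if (tmp == null) {
                    tmp = new FencedSingleton();
                    VarHandle.releaseFence();  // stands in for U.storeFence(): constructor stores before publication
                    instance = tmp;
                }
            }
        }
        return tmp;
    }
}
```

In ordinary application code a volatile field or a holder class is the usual choice; the fence version is shown only to mirror the question.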

Re: DCL using Fence Intrinsics

oleksandr otenko
In reply to this post by vikas
I think there is a formatting issue. The loop for Code 2 appears in T1,
but you probably meant T2.

Alex

Re: DCL using Fence Intrinsics

oleksandr otenko
In reply to this post by Vitaly Davidovich
On 12/03/2015 23:01, Vitaly Davidovich wrote:

> 1 works, and I can't see why you even need the loadFence.
>
> 2 and 3 won't (always) work.  In 2, compiler can move a=1 after the loop.  For 3, if you put loadFence inside the while loop it will work.


If we assume the loop in 2 was meant to be in T2, then it will work.

For 3, you need to have loadFence inside the loop and after the loop.

Alex

Re: DCL using Fence Intrinsics

Vitaly Davidovich
Yeah, I read #2 as the while loop being in T1, but if it's T2, then yes, it's fine and will work.

Thanks for clarifying #3 -- I meant to keep existing code as is but stuff a loadFence into the loop, but re-reading my reply, I do see how it can be interpreted as moving the existing one.

Re: DCL using Fence Intrinsics

Vitaly Davidovich
btw, for #3, you'd probably want to rewrite T2 as:

if (b == 1) {
    U.loadFence();
} else {
    do {
        U.loadFence();
    } while (b != 1);
}

assert(a==1);

This would avoid an additional load fence upon exiting the while loop (if the while loop was actually entered).


Re: DCL using Fence Intrinsics

oleksandr otenko
No, you have just shown that you don't need a loadFence after the loop, which is wrong.

You need a loadFence between the last load of b and the load of a, to preserve the order of loading a after loading b. Then you need a loadFence between loads of b, so you keep re-loading b on each iteration.

Alex
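
The two-fence version of Code3 described above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: it uses Java 9+ VarHandle.acquireFence() in place of unsafe.loadFence() and VarHandle.releaseFence() in place of unsafe.storeFence(), and the class name FencedSpin and the timeout are hypothetical.

```java
import java.lang.invoke.VarHandle;

// Hypothetical demo class; a and b are plain fields, as in Code3.
class FencedSpin {
    static int a = 0;
    static int b = 0;

    // Runs T1 and T2 once; returns the value of 'a' seen by T2 (-1 is the initial marker).
    static int runOnce() {
        a = 0;
        b = 0;
        int[] seen = { -1 };

        Thread t2 = new Thread(() -> {
            while (b != 1) {
                VarHandle.acquireFence();   // between successive loads of b, so b is re-loaded
            }
            VarHandle.acquireFence();       // between the last load of b and the load of a
            seen[0] = a;
        });
        Thread t1 = new Thread(() -> {
            a = 1;
            VarHandle.releaseFence();       // stands in for unsafe.storeFence()
            b = 1;
        });

        t2.setDaemon(true);                 // a stuck spinner must not keep the JVM alive
        t2.start();
        t1.start();
        try {
            t1.join();
            t2.join(5_000);                 // assumed timeout, only to keep the demo bounded
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

The fence inside the loop addresses the hoisting concern from the original question, and the fence after the loop addresses the ordering of the two loads; dropping either one gives up one of those two properties.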

Re: DCL using Fence Intrinsics

Vitaly Davidovich
So I thought it might be shady, but I can't come up with a *legitimate* case where it breaks.  One possibility is the following reordering:

else {
    do {
        U.loadFence();
        // sink the 'a' read into here, it's still 0, then 'b' reads 1 and we break
    } while (b != 1);
}

I can't immediately see why such a transformation would take place because for compiler to do that, it would have to prove that the loop always executes only once (otherwise it's moving a load ahead of a loadFence).  It's also making a loop invariant read into a variant one.  I guess it could clone the code into 2 separate versions, one for looping and one for not, but seems weird and useless.  I suppose CPU could speculate somehow here, but again, not immediately clear to me why it would speculate ahead of 'b' when 'b' is read possibly many times and 'a' is read just once.

But you're right, this "trick" isn't reliable.



Re: DCL using Fence Intrinsics

oleksandr otenko
First you need to justify the saving of loadFence. You can't assume the saving is significant (first it must be predictable) and at the same time assume the load of a / breaking the loop is not predictable.

Alex

On 13/03/2015 14:55, Vitaly Davidovich wrote:
So I thought it might be shady, but I can't come up with a *legitimate* case where it breaks.  One possibility is following reordering:

else {
    do {
       U.loadFence();
        // sink the 'a' read into here, it's still 0, then 'b' reads 1 and we break
    }while(b!=1);  

I can't immediately see why such a transformation would take place because for compiler to do that, it would have to prove that the loop always executes only once (otherwise it's moving a load ahead of a loadFence).  It's also making a loop invariant read into a variant one.  I guess it could clone the code into 2 separate versions, one for looping and one for not, but seems weird and useless.  I suppose CPU could speculate somehow here, but again, not immediately clear to me why it would speculate ahead of 'b' when 'b' is read possibly many times and 'a' is read just once.

But you're right, this "trick" isn't reliable.



On Fri, Mar 13, 2015 at 10:33 AM, Oleksandr Otenko <[hidden email]> wrote:
No, you have just shown that you don't need a loadFence after the loop, which is wrong.

You need a loadFence between the last load of b and the load of a, to preserve the order of loading a after loading b. Then you need a loadFence between loads of b, so you keep re-loading b on each iteration.

Alex


On 13/03/2015 14:23, Vitaly Davidovich wrote:
btw, for #3, you'd probably want to rewrite T2 as:

if (b==1) {
   U.loadFence();
} else {
    do {
       U.loadFence();
    }while(b!=1);  
}

assert(a==1);

This would avoid an additional load fence upon exiting the while loop (if the while loop was actually entered).


On Fri, Mar 13, 2015 at 10:10 AM, Vitaly Davidovich <[hidden email]> wrote:
Yeah, I read #2 as the while loop being in T1, but if it's T2, then yes, it's fine and will work.

Thanks for clarifying #3 -- I meant to keep existing code as is but stuff a loadFence into the loop, but re-reading my reply, I do see how it can be interpreted as moving the existing one.

On Fri, Mar 13, 2015 at 9:50 AM, Oleksandr Otenko <[hidden email]> wrote:
On 12/03/2015 23:01, Vitaly Davidovich wrote:

1 works, and I can't see why you even need the loadFence.

2 and 3 won't (always) work.  In 2, the compiler can move a=1 after the loop.  For 3, if you put a loadFence inside the while loop it will work.


If we assume the loop in 2 was meant to be in T2, then it will work.

For 3, you need to have loadFence inside the loop and after the loop.

Alex
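
For readers without access to sun.misc.Unsafe, the Code2 pattern (loop in T2) can be approximated with public API. A minimal sketch, assuming AtomicInteger.lazySet() as the counterpart of putIntOrdered (its documentation describes it as having setRelease memory effects) and AtomicInteger.get() as the volatile read. The class name OrderedPublish and the runOnce() helper are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of Code2: ordered store in T1, volatile read loop in T2. */
final class OrderedPublish {
    static int a = 0;                                    // plain field
    static final AtomicInteger b = new AtomicInteger(0); // flag

    /** T1: plain write to a, then an ordered (release) store to b. */
    static void writer() {
        a = 1;
        b.lazySet(1); // assumption: stand-in for unsafe.putIntOrdered(b, 1)
    }

    /** T2: volatile reads of b until it is 1, then a plain read of a. */
    static int reader() {
        while (b.get() != 1) {
            Thread.onSpinWait(); // spin hint only, no ordering role
        }
        return a;
    }

    /** Runs one writer and one reader; returns the value of a the reader saw. */
    static int runOnce() {
        a = 0;
        b.set(0);
        int[] seen = new int[1];
        Thread t2 = new Thread(() -> seen[0] = reader());
        Thread t1 = new Thread(OrderedPublish::writer);
        t2.start();
        t1.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw a = " + runOnce());
    }
}
```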


sent from my phone

On Mar 12, 2015 6:43 PM, "vikas" <[hidden email]> wrote:
Hi,

  I am trying to understand the fence intrinsic API.
  Preshing has shown how to write DCL in C++ in his blog:
  http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/

  I was trying to do a similar thing in Java (*Code1*)

     sun.misc.Unsafe U;
     Singleton instance = null;

     Singleton getInstance() {
          Singleton tmp = instance;
          U.loadFence();
          if (tmp == null) {
              synchronized (Singleton.class) {
                   tmp = instance;
                   if (tmp == null) {
                       tmp = new Singleton();
                       U.storeFence();
                       instance = tmp;
                   }
              }
          }
          return tmp;
     }
                                    *Code1*

    Will the above Code1 work?


------------------------------------------------------------------------------

    Along similar lines I have another question. See *Code2* below,
    where *a* and *b* are normal variables with initial value 0.

       T1                                  T2
     a = 1;                                while(unsafe.getIntVolatile(b)!=1);
     unsafe.putIntOrdered(b,1);            assert(a==1); // *will always pass*

                                     *Code2*

    Code2 works because putXXXOrdered and getXXXVolatile form a
    happens-before edge, i.e. the assert in thread T2 will always pass.


-------------------------------------------------------------------------------
    But can we say the same thing for the code below (*Code3*)?

       T1                                  T2
     a = 1;                                while(b!=1);
     unsafe.storeFence();                  unsafe.loadFence();
     b = 1;                                assert(a==1);
                                     *Code3*

    */What prevents the compiler from optimizing the while loop in Code3 into an
    infinite loop?/*
    So does *Code3* work? If not, is there any way we can achieve the
    expected behavior using fences?

   thanks
   vikas








--
View this message in context: http://jsr166-concurrency.10961.n7.nabble.com/DCL-using-Fence-Intrinsics-tp12420.html
Sent from the JSR166 Concurrency mailing list archive at Nabble.com.
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest


Re: DCL using Fence Intrinsics

Vitaly Davidovich

The saving is easy to justify: a loadFence is at best a heavy compiler fence and at worst both a compiler and a CPU fence.  Loop optimizers (generally) try to move loop-invariant stuff out, not in.

sent from my phone

Re: DCL using Fence Intrinsics

oleksandr otenko
No, you're looking at it the wrong way around.

Compare the effort of proving the correctness of the resulting code to the performance improvement.

Alex


Re: DCL using Fence Intrinsics

Vitaly Davidovich
As I mentioned a few replies earlier, I'm not advocating this.  What I was trying to establish, purely for educational purposes, is a real example of a compiler and/or CPU transform that would invalidate it.

Re: DCL using Fence Intrinsics

vikas
Thanks Vitaly,

and sorry for the improper formatting.

On a second note, I was wondering why I wouldn't need the loadFence in the *Code1* DCL example.

The JMM cookbook suggests inserting a LoadLoad barrier before final field access (on processors where data dependency is not respected); my DCL example added the loadFence only because of this.

Also, the C++ example does need both fences:
http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/

One reason I can think of for why it may work in Java without the loadFence is that benign data-race-like constructs
are kind of allowed in Java, whereas they are not allowed in C++.

So does Code4 below work for the DCL singleton pattern?


                                                    Code4

     sun.misc.Unsafe U;
     Singleton instance = null;

     Singleton getInstance() {
          Singleton tmp = instance;  // no fence while reading
          if (tmp == null) {
              synchronized (Singleton.class) {
                   tmp = instance;
                   if (tmp == null) {
                       tmp = new Singleton();
                       U.storeFence(); // only need StoreFence
                       instance = tmp;
                   }
              }
          }
          return tmp;
     }


 

Re: DCL using Fence Intrinsics

oleksandr otenko
Vitaly is wrong. The loadFence in Code1 is needed. Without it, it is
possible to access uninitialized fields of the singleton (the loads of
those fields may occur before the load of instance).


Alex
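
For reference, Code1 with both fences can be written against public API. This is a sketch under assumptions, not a statement of what any JVM requires: VarHandle.acquireFence() and VarHandle.releaseFence() (Java 9+) stand in for U.loadFence() and U.storeFence(), and the class FencedSingleton with its non-final field value is invented for the example.

```java
import java.lang.invoke.VarHandle;

/** Sketch of Code1: double-checked locking with explicit fences on a plain field. */
final class FencedSingleton {
    private static FencedSingleton instance; // deliberately plain, not volatile

    private int value; // non-final, so no final-field guarantee applies

    private FencedSingleton() {
        value = 42;
    }

    static FencedSingleton getInstance() {
        FencedSingleton tmp = instance;
        // Orders later loads of the object's fields after the load of instance.
        VarHandle.acquireFence(); // assumption: stand-in for U.loadFence()
        if (tmp == null) {
            synchronized (FencedSingleton.class) {
                tmp = instance;
                if (tmp == null) {
                    tmp = new FencedSingleton();
                    // Orders the constructor's stores before the publishing store.
                    VarHandle.releaseFence(); // assumption: stand-in for U.storeFence()
                    instance = tmp;
                }
            }
        }
        return tmp;
    }

    int value() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(getInstance().value());
    }
}
```

Outside of an exercise like this one, declaring instance volatile gives the same guarantees without hand-placed fences.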


Re: DCL using Fence Intrinsics

Vitaly Davidovich
In reply to this post by vikas
> JMM cookbook suggest to insert LoadLoad barrier before final field access
> (in processor where data dependency is not respected), my example of DCL
> added LoadFence only because of this.

Right, the only such processor (one that doesn't respect indirection) I've heard of is Alpha, but AFAIK that's not a supported platform (for Oracle HotSpot, at least).

> Also C++ example does need both the fences
> http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/
>
> I can think of one reason on why It may work in java without LoadFence is
> benign data race-like construct
> are kind of allowed in Java whereas they are not allowed in C++.
>
> Also so below *Code4* works for DCL Singleton pattern ?

One aspect is that Java disallows introducing phantom reads when you've loaded a piece of memory into a temp and are using the temp -- C++ allows for this, which breaks racy attempts that would succeed in Java.

I could be wrong, but my thinking is that Code4 is basically what happens when you invoke a constructor with final fields (as I mentioned before) and then publish the reference racily.  Technically speaking, if you were to take things like the Alpha into account, you most definitely would need a load fence.  The one doubt in my head is in this snippet of your code:

    tmp = instance;
    if (tmp == null) {
        tmp = new Singleton();
        U.storeFence(); // only need StoreFence
        instance = tmp;
    }

It's theoretically conceivable for a compiler to realize that it doesn't need to store to 'tmp' here and just store to the field directly.  If there were no storeFence() there, then that definitely wouldn't be right (it's basically the broken DCL scenario).  With the storeFence() placed where it is, there's a clear (in my mind, at least) barrier between where allocation+construction is done and the field assignment.  If that's true, and taking things like Alpha out of the equation, I believe you don't need the loadFence.
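
The comparison with final fields can be made concrete. A minimal sketch, assuming a made-up class FinalFieldHolder: the object has only final fields and its reference is published through a plain field with no fences or locks. The JLS final-field rules (section 17.5) are what cover the reader here. Unlike Code4, a race can construct more than one instance, so this illustrates the visibility guarantee rather than a strict singleton.

```java
/** Sketch: racy publication of an object whose state is held in final fields. */
final class FinalFieldHolder {
    private static FinalFieldHolder instance; // plain field, published racily

    private final int value; // final: readers that see the reference see 42

    private FinalFieldHolder(int value) {
        this.value = value;
    }

    static FinalFieldHolder getInstance() {
        FinalFieldHolder tmp = instance;    // one racy read into a local
        if (tmp == null) {
            tmp = new FinalFieldHolder(42); // may run in several threads under a race
            instance = tmp;
        }
        return tmp;                         // return the local, never re-read the field
    }

    int value() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(getInstance().value());
    }
}
```

Reading the field once into tmp and using only tmp afterwards matches the point about Java not introducing extra reads of a value already loaded into a temp.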

Re: DCL using Fence Intrinsics

Vitaly Davidovich
In reply to this post by oleksandr otenko
I mentioned that in my previous reply, but I'm not aware of any JVM running on platforms that allow such reordering.  I also highly doubt that the JVM would ever be ported to such a platform, as the bug tail would be very long, and the JVM would have to insert LoadLoad barriers in lots of places where refs to instances of classes with at least one final field are read.  If you have a concrete/real/practical example of where this reordering can take place, I'd love to know about it.

On Fri, Mar 13, 2015 at 2:38 PM, Oleksandr Otenko <[hidden email]> wrote:
Vitaly is wrong. The loadFence in Code1 is needed. Without it, it is possible to access the uninitialized fields of the singleton. (the loads may occur before the load of instance)


Alex



On 13/03/2015 17:27, vikas wrote:
Thanks Vitaly,

and sorry for the improper formatting.

on the second note i was wondering why i wouldn't need loadFence in *Code1*
DCL Example

JMM cookbook suggest to insert LoadLoad barrier before final field access
(in processor where data dependency is not respected), my example of DCL
added LoadFence only because of this.

Also C++ example does need both the fences
http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/

I can think of one reason on why It may work in java without LoadFence is
benign data race-like construct
are kind of allowed in Java whereas they are not allowed in C++.

Also so below *Code4* works for DCL Singleton pattern ?


                                                     *Code4*
         sun.misc.Unsafe *U*;
      Singleton instance = null

      Singleton getInstance() {
           Singleton tmp = instance;  // no fence while reading
           if(tmp == null) {
               synchronized(Singleton.class) {
                    tmp = instance;
                    if(tmp == null) {
                        tmp = new Singleton();
                       * U.storeFence();* // only need StoreFence
                        instance = tmp;
                   }
               }
            }
        return tmp;
      }


 




--
View this message in context: http://jsr166-concurrency.10961.n7.nabble.com/DCL-using-Fence-Intrinsics-tp12420p12435.html
Sent from the JSR166 Concurrency mailing list archive at Nabble.com.
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest


Re: DCL using Fence Intrinsics

oleksandr otenko
Wasn't there a recent thread with a reference to a platform which can load even dependent data out of order?

http://en.wikipedia.org/wiki/Memory_ordering
  • Dependent loads can be reordered (this is unique for Alpha). If the processor fetches a pointer to some data after this reordering, it might not fetch the data itself but use stale data which it has already cached and not yet invalidated. Allowing this relaxation makes cache hardware simpler and faster but leads to the requirement of memory barriers for readers and writers.[5]
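
To make the dependent-load case concrete, here is a minimal sketch of where the reader-side barrier would sit. The names `Node` and `DependentLoad` are hypothetical, the fences are the Java 9+ static methods on `java.lang.invoke.VarHandle`, and the example only illustrates placement; it does not reproduce the reordering.

```java
import java.lang.invoke.VarHandle;

// Hypothetical names (Node, DependentLoad); a sketch, not code from the thread.
final class Node {
    int data;                          // plain field, neither final nor volatile
}

final class DependentLoad {
    static Node head;                  // plain reference, published racily

    static void writer() {
        Node n = new Node();
        n.data = 7;
        VarHandle.storeStoreFence();   // order the data store before the pointer store
        head = n;
    }

    // Returns -1 when no node is visible yet.
    static int reader() {
        Node n = head;                 // first load: the pointer
        if (n == null) {
            return -1;
        }
        VarHandle.loadLoadFence();     // the barrier an Alpha-like CPU would need
        return n.data;                 // second load: its address depends on the first
    }
}
```

At the hardware level, most CPUs order the second load after the first because of the address dependency, so the fence in `reader()` only matters on a machine that reorders dependent loads.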


Alex
