Lazy, cached supplier: most performant mutex mechanism?


Lazy, cached supplier: most performant mutex mechanism?

JSR166 Concurrency mailing list
This seems like it would be a common stdlib ask, but what is the most performant way to protect the code inside a supplier from being concurrently realized more than once? Contention would be rare, and the losing threads need to wait on the value being computed by the winning thread.

The most straightforward thing to do is a synchronized block, but this currently pins a carrier thread in Project Loom.

The Supplier needs to keep track of:
1) the thunk, if unrealized
2) a value, if realized

Golang's sync.Once does this [1] (CAS, fallback to mutex. Doesn't remember value)
Clojure's lazy sequences use synchronized [2]

Is there a better way to approach this on the JVM?

Thanks!
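For concreteness, the synchronized baseline the question describes (track the thunk until realized, then the value) might be sketched like this; the class name is illustrative, not from any library:

```java
import java.util.function.Supplier;

// Minimal synchronized lazy supplier: the thunk is dropped once realized,
// so the computation runs at most once and losing threads block on the
// monitor until the winner has produced the value.
final class SynchronizedLazy<T> implements Supplier<T> {
    private Supplier<? extends T> thunk; // non-null until realized
    private T value;                     // set once, under the monitor

    SynchronizedLazy(Supplier<? extends T> thunk) {
        this.thunk = thunk;
    }

    @Override
    public synchronized T get() {
        if (thunk != null) {
            value = thunk.get();
            thunk = null; // lets the thunk and anything it captures be GC'd
        }
        return value;
    }
}
```

This is the simple, correct version; the rest of the thread is about avoiding the monitor (and, under Loom, the carrier-thread pinning it currently implies).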




_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Lazy, cached supplier: most performant mutex mechanism?

JSR166 Concurrency mailing list



From: "concurrency-interest" <[hidden email]>
To: "[hidden email]" <[hidden email]>
Sent: Friday, July 31, 2020 19:40:21
Subject: [concurrency-interest] Lazy, cached supplier: most performant mutex mechanism?

For the JVM, you can use the class-holder idiom: it relies on the fact that classes are lazily initialized in Java and that static initializer blocks run only once.
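A sketch of that idiom (class names are illustrative); note that it only covers static, per-class laziness, not per-instance values:

```java
// Class-holder idiom: the JVM runs Holder's static initializer exactly once,
// on first access, and guarantees safe publication of INSTANCE to all threads.
final class ExpensiveConfig {
    private ExpensiveConfig() {
        // imagine an expensive computation here
    }

    private static final class Holder {
        static final ExpensiveConfig INSTANCE = new ExpensiveConfig();
    }

    static ExpensiveConfig instance() {
        return Holder.INSTANCE; // triggers Holder's initialization on first call
    }
}
```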

About your comment on synchronized blocks vs. Loom: you can use a ReentrantLock instead, which is Loom-aware.
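For example, the same lazy supplier guarded by a ReentrantLock instead of a monitor (a sketch; virtual threads can park on a ReentrantLock without pinning their carrier):

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Same shape as the synchronized version, but guarded by a ReentrantLock.
final class LockedLazy<T> implements Supplier<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private Supplier<? extends T> thunk; // non-null until realized
    private T value;

    LockedLazy(Supplier<? extends T> thunk) {
        this.thunk = thunk;
    }

    @Override
    public T get() {
        lock.lock();
        try {
            if (thunk != null) {
                value = thunk.get();
                thunk = null;
            }
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```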


cheers,
Rémi



Re: Lazy, cached supplier: most performant mutex mechanism?

JSR166 Concurrency mailing list
I believe double-checked locking, such as Martin's version in Guava Suppliers#memoize, is the best approach on the JVM for instance-level memoization.
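The double-checked-locking shape being referred to is roughly the following (a sketch in the spirit of Guava's Suppliers.memoize, not its actual code):

```java
import java.util.function.Supplier;

// Double-checked locking: the unsynchronized fast path reads only the
// volatile flag; the volatile write to 'initialized' safely publishes 'value'.
final class MemoizingSupplier<T> implements Supplier<T> {
    private final Supplier<? extends T> delegate;
    private volatile boolean initialized;
    private T value; // published via the volatile write to 'initialized'

    MemoizingSupplier(Supplier<? extends T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        if (!initialized) {             // first (racy) check, no lock
            synchronized (this) {
                if (!initialized) {     // second check, under the lock
                    value = delegate.get();
                    initialized = true; // volatile write publishes 'value'
                }
            }
        }
        return value;
    }
}
```

Note this still takes a monitor on the slow path, so it inherits the Loom pinning concern from the original question for the losing threads.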

On Fri, Jul 31, 2020 at 10:42 AM Ghadi Shayban via Concurrency-interest <[hidden email]> wrote:

Re: Lazy, cached supplier: most performant mutex mechanism?

JSR166 Concurrency mailing list
We spent quite some time on this, you can read all about it here: 

On Fri, 31 Jul 2020 at 20:02, Benjamin Manes via Concurrency-interest <[hidden email]> wrote:
I believe double-checked locking, such as Martin's version in Guava Suppliers#memoize, is the best approach on the JVM for instance-level memoization.

--
Cheers,


Re: Lazy, cached supplier: most performant mutex mechanism?

JSR166 Concurrency mailing list
Thanks everyone, I'll check these things out.
Seems like the user-space dual of a condy instruction, except run at most once. I was hoping to try an approach that wasn't based on synchronized, though synchronized saves you the allocation of a lock.

Ron Pressler had another interesting suggestion to try:

Read val; if it's null, CAS fn to null.

If you win, allocate a lock and do an ordered set (with a VarHandle) to the lock field, lock it, invoke fn, do an ordered write (with a VarHandle) to val, and unlock, then do another ordered write to null out the lock.
If you lose the CAS, spin with an ordered read on the lock field and the val. This will be a short spin, because all you're waiting for is the allocation of the lock. If you see a non-null val, you're done. If you see a non-null lock, lock on it, and then a normal read from val should be non-null.

This requires a bit of care in terms of memory ordering, and will probably require testing on non-Intel platforms, as those have weaker memory ordering than Intel, and many concurrency bugs don't manifest on x86. But the code will still be small, and it's an important method, so a relatively elaborate mechanism there might be worth it.
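A rough, untested sketch of that scheme (illustrative only: the memory-ordering and non-x86 caveats above apply, and error handling is omitted, so a supplier that throws would leave waiters spinning):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// CAS on the fn field elects a winner; the winner publishes a freshly
// allocated lock, computes under it, publishes the value with a release
// write, then clears the lock. Losers spin briefly for the lock, then
// block on it until the value is visible.
final class CasLazy<T> implements Supplier<T> {
    private static final VarHandle FN, LOCK, VALUE;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            FN = l.findVarHandle(CasLazy.class, "fn", Supplier.class);
            LOCK = l.findVarHandle(CasLazy.class, "lock", ReentrantLock.class);
            VALUE = l.findVarHandle(CasLazy.class, "value", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private Supplier<? extends T> fn;  // nulled by the winning CAS
    private ReentrantLock lock;        // allocated lazily by the winner
    private T value;                   // non-null once realized

    CasLazy(Supplier<? extends T> fn) {
        this.fn = fn; // assumes the CasLazy instance itself is safely published
    }

    @SuppressWarnings("unchecked")
    @Override
    public T get() {
        T v = (T) VALUE.getAcquire(this);
        if (v != null) return v;
        Supplier<? extends T> f = (Supplier<? extends T>) FN.getAcquire(this);
        if (f != null && FN.compareAndSet(this, f, null)) {
            // Winner: lock before publishing the lock, so losers that see it
            // block until the value has been written.
            ReentrantLock l = new ReentrantLock();
            l.lock();
            try {
                LOCK.setRelease(this, l);
                v = f.get();
                VALUE.setRelease(this, v);
            } finally {
                l.unlock();
                LOCK.setRelease(this, null); // lock no longer needed
            }
            return v;
        }
        // Loser: short spin; all we wait for is the lock allocation.
        while (true) {
            v = (T) VALUE.getAcquire(this);
            if (v != null) return v;
            ReentrantLock l = (ReentrantLock) LOCK.getAcquire(this);
            if (l != null) {
                l.lock();   // blocks until the winner has set the value
                l.unlock();
            }
            Thread.onSpinWait();
        }
    }
}
```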



On Fri, Jul 31, 2020 at 2:33 PM Viktor Klang <[hidden email]> wrote:
We spent quite some time on this, you can read all about it here: 



Re: Lazy, cached supplier: most performant mutex mechanism?

JSR166 Concurrency mailing list



From: "concurrency-interest" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Sent: Friday, July 31, 2020 20:56:10
Subject: Re: [concurrency-interest] Lazy, cached supplier: most performant mutex mechanism?
Thanks everyone, I'll check these things out.
Seems like the user-space dual of a condy instruction, except run max once. I was hoping to try an approach that wasn't based on synchronized, though synchronized saves you the allocation of a lock.

There is already a proposal for a JLS-level dual of condy, but as you said, condy semantics allow the bootstrap method to be run more than once.




Here is a version that has the nice property of not having a volatile read on the fast path (and the not-so-nice property of being very slow if the code has not been JITed by C2).

This is how you use it; here ONCE.get() should always return 0:

private static int COUNTER = 0;
private static final Once<Integer> ONCE = Once.of(() -> COUNTER++);

and the trick is to use a MutableCallSite

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.reflect.UndeclaredThrowableException;
import java.util.Objects;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public interface Once<T> extends Supplier<T> {
    @SuppressWarnings("unchecked")
    static <T> Once<T> of(Supplier<? extends T> supplier) {
        var mh = new OnceCallSite(supplier).dynamicInvoker();
        return () -> {
            try {
                return (T) mh.invokeExact();
            } catch (RuntimeException | Error e) {
                throw e;
            } catch (Throwable t) {
                throw new UndeclaredThrowableException(t);
            }
        };
    }

    class OnceCallSite extends MutableCallSite {
        private static final MethodHandle SLOW_PATH;
        static {
            var lookup = MethodHandles.lookup();
            try {
                SLOW_PATH = lookup.findVirtual(OnceCallSite.class, "slowPath", MethodType.methodType(Object.class));
            } catch (NoSuchMethodException | IllegalAccessException e) {
                throw new AssertionError(e);
            }
        }

        private Object value;
        private final Supplier<?> supplier;
        private final ReentrantLock lock = new ReentrantLock();

        private OnceCallSite(Supplier<?> supplier) {
            super(MethodType.methodType(Object.class));
            this.supplier = supplier;
            setTarget(SLOW_PATH.bindTo(this));
        }

        private Object slowPath() {
            lock.lock();
            try {
                Object value = this.value;
                if (value != null) {
                    return value;
                }
                value = Objects.requireNonNull(supplier.get());
                this.value = value;
                setTarget(MethodHandles.constant(Object.class, value));
                return value;
            } finally {
                lock.unlock();
            }
        }
    }
}



Re: Lazy, cached supplier: most performant mutex mechanism?

JSR166 Concurrency mailing list
Ghadi’s use-case is not a lazy constant but a lazy cons-cell, i.e., a dynamic construct with lots of instances.

— Ron


On 1 August 2020 at 13:25:44, Remi Forax via Concurrency-interest ([hidden email]) wrote:




