Re: ConcurrentHashMap computeIfAbsent

Re: ConcurrentHashMap computeIfAbsent

Doug Lea
[CCing c-i list]

On 12/18/2014 02:33 PM, Benjamin Manes wrote:

> I'm starting down the long road of writing a JDK8-based cache - effectively a
> rewrite of Guava's to resolve performance issues and the unbounded memory
> growth problem that occurs in highly concurrent usage. In the process, I
> found out that ConcurrentHashMap's computeIfAbsent is slower than it should
> be and a live-lock failure was reintroduced.
>
>
> The live-lock occurs when a recursive computation is performed for the same
> key. This is a failure that has occurred in practice, as I fixed a similar
> deadlock scenario in Guava when we were triaging a bug. When feedback was
> solicited for CHMv8 back in 2011, I caught the issue then and it was fixed [1].

The follow-up story on this (at some point briefly mentioned
in some post) is that when we moved to relying on builtin (reentrant)
bin locks without tryLock support, only a subset of cross-bin
deadlocks/livelocks remained detectable (because of reentrancy), and even
then with false positives. So the most defensible plan is to instead
rely on general-purpose JVM tools/debuggers to help developers with this,
along with any other liveness problems possible under arbitrary ifAbsent
functions. On the other hand, we didn't want to rule out all
special-case detection: if an efficient internal tryLock
intrinsic became available, we might do more, because throwing here
helps diagnose unrelated-looking problems in other threads,
and performing checks only when the lock is unavailable costs almost nothing.
The main undesirable byproduct is that because the method spec
still mentions the possibility of IllegalStateExceptions, some users
expect them in cases that are not currently handled. (There is an
OpenJDK bug report on this (JDK-8062841) that I'm not sure what
to do about.)
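
For concreteness, a minimal sketch of the kind of recursive call in
question (hedged: the exact outcome depends on the JDK version):

    import java.util.concurrent.ConcurrentHashMap;

    // The mapping function re-enters computeIfAbsent for the same key.
    // On the JDK 8 CHM this can loop (the reintroduced livelock); after
    // the JDK-8062841 fix it throws IllegalStateException. Either way it
    // violates the rule that the function must not update the map.
    public class RecursiveComputeDemo {
        public static void main(String[] args) {
            ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
            map.computeIfAbsent("k",
                k -> map.computeIfAbsent("k", k2 -> 1)); // recursive: user error
        }
    }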

> Performance-wise, computeIfAbsent pessimistically locks instead of
> optimistically trying to return an existing entry.

There are lots of tradeoffs. With the current implementation,
if you are implementing a cache, it may be better to code cache.get
to do a pre-screen itself, as in:
   V v = map.get(key);                  // lock-free read first
   return (v != null) ? v : map.computeIfAbsent(key, function);
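
As a hedged sketch of how that pre-screen might be packaged (class and
field names here are illustrative, not from any particular library):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // A cache get() that tries the lock-free read first and falls back
    // to the locking computeIfAbsent only on a miss.
    final class PrescreenCache<K, V> {
        private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
        private final Function<K, V> function;

        PrescreenCache(Function<K, V> function) { this.function = function; }

        V get(K key) {
            V v = map.get(key);
            return (v != null) ? v : map.computeIfAbsent(key, function);
        }
    }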

However, the exact benefit depends on access patterns.
For example, I reran your benchmark cases (URLs below) on a
32-way x86, and got throughputs (ops/sec) that are dramatically
better with the pre-screen for the case of a single key,
but worse with your Zipf-distributed keys. As an intermediate
test, I measured the impact of adding a single-node prescreen
("1cif") before locking inside CHM.computeIfAbsent, similar
to what was done in some pre-release versions:

Same key

cif:        1402559
get+cif: 3775886700
1cif:    1760916148

Zipf-distributed keys

cif:     1414945003
get+cif:  882477874
1cif:     618668961

One might think (I did, until running similar experiments)
that the "1cif" version would be the best compromise.
But currently it isn't.
This is in part due to interactions with biased locking,
which in some cases basically provides "free" prescreens, but
in other cases adds safepoint/GC pressure on top of
lock contention. This is all up for re-consideration though.

-Doug


> I did this (2008) in a JCiP-style map of futures, later Guava did the same on
> its CHM fork, and even Java does this in CSLM and ConcurrentMap's default
> method. When the entry's lock is not contended the impact is minor, but hot
> entries in a cache suffer unnecessarily. The results of a JMH benchmark [3]
> below show a 10x gain when adding this check (Caffeine). Admittedly the
> impact is minor for applications, as Guava's performance is not much better,
> but that is why infrastructure projects still use ConcurrentLinkedHashMap for
> performance-sensitive code even though Guava was intended to be its
> successor.
>
>
> [1]
> http://cs.oswego.edu/pipermail/concurrency-interest/2011-August/008188.html
>
> [2]
> https://github.com/ben-manes/caffeine/blob/master/src/test/java/com/github/benmanes/caffeine/cache/ComputingTest.java
>
>  [3]
> https://github.com/ben-manes/caffeine/blob/master/src/jmh/java/com/github/benmanes/caffeine/cache/ComputeBenchmark.java
>
>
>
> Benchmark                                    (computingType)    Mode  Samples          Score          Error  Units
> c.g.b.c.c.ComputeBenchmark.compute_sameKey   ConcurrentHashMap  thrpt      10   17729056.323 ±   557476.404  ops/s
> c.g.b.c.c.ComputeBenchmark.compute_sameKey   Caffeine           thrpt      10  347007236.316 ± 24370724.293  ops/s
> c.g.b.c.c.ComputeBenchmark.compute_sameKey   Guava              thrpt      10   29712031.905 ±   272916.744  ops/s
> c.g.b.c.c.ComputeBenchmark.compute_spread    ConcurrentHashMap  thrpt      10  104565034.688 ±  4207350.038  ops/s
> c.g.b.c.c.ComputeBenchmark.compute_spread    Caffeine           thrpt      10  132953599.579 ± 13705263.521  ops/s
> c.g.b.c.c.ComputeBenchmark.compute_spread    Guava              thrpt      10   61794001.850 ±  1864056.437  ops/s



Re: ConcurrentHashMap computeIfAbsent

Benjamin Manes
> If an efficient internal tryLock intrinsic became available, we might do more, because throwing here helps diagnose unrelated-looking problems in other threads, and performing checks only when the lock is unavailable costs almost nothing.

Is the tryLock intrinsic necessary to be performant and safe? I tried using Thread.holdsLock() and it halved the performance. I then tried stashing the thread's id into an extra field stored only on ReservationNode and saw equal performance on my 4-core/8HT MacBook. Does this perform much worse on your 32-way machine? (See ConcurrentHashMap2 in the repository.)

Benchmark                                        (computingType)   Mode  Samples          Score         Error  Units
c.g.b.c.c.ComputeBenchmark.compute_sameKey     ConcurrentHashMap  thrpt       10   18324025.789 ± 1030272.256  ops/s
c.g.b.c.c.ComputeBenchmark.compute_sameKey    ConcurrentHashMap2  thrpt       10   19427762.264 ±  393697.377  ops/s
c.g.b.c.c.ComputeBenchmark.compute_spread      ConcurrentHashMap  thrpt       10  102832301.745 ± 4022095.932  ops/s
c.g.b.c.c.ComputeBenchmark.compute_spread     ConcurrentHashMap2  thrpt       10  103014534.166 ± 3176520.706  ops/s
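
A rough sketch of the idea described above (hypothetical names, not the
actual ConcurrentHashMap2 code):

    // The reservation node remembers which thread reserved the bin, so a
    // same-key recursive computeIfAbsent by that thread can be detected
    // cheaply, without consulting the monitor.
    class ReservationNode {
        final long ownerThreadId = Thread.currentThread().getId();

        void checkNotRecursive() {
            if (Thread.currentThread().getId() == ownerThreadId)
                throw new IllegalStateException("Recursive update");
        }
    }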


Re: ConcurrentHashMap computeIfAbsent

Doug Lea
On 12/20/2014 04:11 PM, Benjamin Manes wrote:

>     If an efficient internal tryLock intrinsic became available, we might do
>     more, because throwing here helps diagnose unrelated-looking problems in
>     other threads, and performing checks only when lock is unavailable costs
>     almost nothing.
>
>
> Is the tryLock intrinsic necessary to be performant and safe? I tried using
> Thread.holdsLock() and it halved the performance. I then tried stashing the
> thread's id into an extra field stored only on ReservationNode and saw equal
> performance on my 4-core/8HT MacBook.

This is worth considering, but it only detects one form of
misbehavior, and only when the bin doesn't hold any other nodes;
otherwise the first node need not be a ReservationNode. It also
doesn't, by itself, deal with interactions with other
lambda-accepting methods.

Here is some background context in case others have any
thoughts about improving support:

By popular demand, CHM strengthens the ConcurrentMap.computeIfAbsent
specs (similarly for computeIfPresent etc.) to guarantee atomicity
while the function is executed. This requires that we hold a lock
(for the bin in which the entry will be placed) during the call
to the supplied function. We tell users not to use functions
that have any other potential side effects on the map,
but we cannot enforce it. Holding a lock adds to the possible
bad outcomes of violating the side-effect-freedom requirement.
We'd like to be helpful in diagnosing violations, because
these outcomes can look very mysterious.

CHM itself cannot directly help with some potential
lock-based errors. For example, if the function for
computeIfAbsent(key1) can invoke computeIfAbsent(key2)
and vice versa, and the keys hash to different bins, then they
can encounter a classic deadlock. But this and related cases
can be diagnosed using existing JVM cycle-detection tools.

The nature of other potential lock-based errors depends
on the kind of lock used. Some preliminary versions of
jdk8 CHM used non-reentrant locks. (It now uses reentrant
locks.)

With non-reentrant locks, cases of infinite recursion
(i.e., m.computeIfAbsent(k,f) somehow calling itself) result in
lockups rather than infinite loops. Similarly for mutually
recursive computeIfAbsent calls for key1 and key2 that happen
to hash into the same bin. However, these cases can be detected
with very low performance impact (by checking only when a tryLock
fails), which led us to add throws specs to the CHM methods to allow
this. Even though jdk8 CHM now uses reentrant locks, we want to
keep open the ability to use potentially faster non-reentrant
approaches in the future.

With reentrant locks, infinite computeIfAbsent recursions
on the same key are not all that different from any other
case of infinite recursion. This is "detectable" only by
throwing after the number of calls exceeds a threshold.
But the threshold can arguably be set to 1 because of the
no-side-effects rule, and one special case of this check
(as above, via thread IDs in ReservationNodes) can be done
with only small impact. But it's not clear to me whether
this is worthwhile by itself.

The remaining mysterious cases are nested computeIfAbsent
calls for different keys that happen to hash to the same bin.
These conceptually ought to deadlock. Instead they will
sometimes happen to "work" but other times loop.
(This appears to be the underlying undetected user bug in
JDK-8062841.) It seems that this is beyond our current ability
to diagnose, but suggestions would be welcome.

-Doug

Re: ConcurrentHashMap computeIfAbsent

Viktor Klang
For "computeIfAbsent"-style maps which are intended for multithreaded use, I tend to prefer to use ConcurrentMap[Key, CompletionStage[Value]], this means that writers are not blocked and readers aren't blocked unless they choose to block on the returned CompletionStage. This also has the added benefit of making it easy to designate where the computation should be executed.




--
Cheers,

Re: ConcurrentHashMap computeIfAbsent

Doug Lea
On 12/21/2014 09:59 AM, Viktor Klang wrote:
> For "computeIfAbsent"-style maps which are intended for multithreaded use, I
> tend to prefer to use ConcurrentMap[Key, CompletionStage[Value]],

Right. Prior to jdk8, we suggested variations of this and other
techniques that remain preferable in many cases. However, the
most common CHM user error was trying (but failing) to emulate
what is now supported as computeIfAbsent. This is an improvement,
but it leads to new (much less common) kinds of potential errors.
We cannot automatically eliminate or trap all of them. It
might be possible to do more than we do now, either internally
(inside CHM) or, more likely, with improved developer tools.

-Doug

Re: ConcurrentHashMap computeIfAbsent

Viktor Klang



Absolutely, and at the time there were only blocking Futures in the JDK.
Having "monadic" futures like CompletableFuture means you can avoid blocking completely (assuming the CHM implementation is non-blocking). This is highly useful when using it as a cache, especially with a very skewed read distribution paired with a multitude of readers.
 




--
Cheers,

Re: ConcurrentHashMap computeIfAbsent

Peter Levart
In reply to this post by Doug Lea
Hi,

If we wanted to detect any re-entrance into the whole CHM by the same thread, the state would have to live outside the lock object (outside the head of the bucket). Conceptually it would have to be a ThreadLocal<Boolean> instance (per CHM instance).

As I understand it, the problematic re-entrances are those that re-enter synchronized blocks. All other code (outside synchronized blocks) is safe to re-enter, since it is safe to execute from multiple threads. Re-entering a synchronized block causes the hardest-to-diagnose effects (what's left are only deadlocks). If locks were non-reentrant and threw IllegalStateException on an attempt to re-enter, that would be fine.

One way to get such locks is to check holdsLock() before locking:

if (Thread.holdsLock(lock)) throw new IllegalStateException(); // re-entry attempt
synchronized (lock) {
    ...
}

... but measuring on my i7 Linux PC shows that holdsLock() adds about two-thirds of the overhead of uncontended locking.

If we added an int field to the Node class, it would grow by 4 bytes on a 32-bit JVM, but on a 64-bit JVM it would be packed together with the hash field and the Node footprint would not increase. Current Node sizes on a 32-bit JVM are: Node/ReservationNode 28 bytes, TreeNode 48 bytes, TreeBin 44 bytes, ForwardingNode 32 bytes. Given that plain Nodes are in the majority, this would amount to roughly a 14% increase in heap usage for nodes on 32-bit JVMs, but no increase on 64-bit JVMs.

If this is a no-go, then I'm out of ideas, short of checking whether holdsLock() could be made any faster. Perhaps it just needs to be intrinsified (if it isn't already)? If the Node increase is acceptable, read further...

Since a thread id is a long value, packing it into an int gives us just a kind of hash. But it is a good-enough hash that the additional holdsLock() check would be needed only from a purely theoretical perspective.

Here's to illustrate what I mean:

    http://cr.openjdk.java.net/~plevart/misc/CHM.nonreentrantLocks/webrev.01/

The changes I made are purely mechanical. I inserted node.checkNotEntered() at the start of each synchronized block that doesn't call back into user code:

synchronized (node) {
    node.checkNotEntered();
    // body that doesn't call-back user code
}

... and wrapped the body of each synchronized block that calls back into user code in:

synchronized (node) {
    node.enter();
    try {
        // body that calls-back user code
    } finally {
        node.exit();
    }
}
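
To make this concrete, here is a guess at what those methods could look
like (the actual implementations are in the webrev above; the field and
method bodies here are illustrative assumptions):

class Node {
    volatile int entered;                  // 0 when no thread is inside

    static int currentTag() {              // packed thread id, never 0
        return (int) Thread.currentThread().getId() | Integer.MIN_VALUE;
    }

    void checkNotEntered() {               // call *before* synchronized(node)
        if (entered == currentTag() && Thread.holdsLock(this)) // confirm hash hit
            throw new IllegalStateException("Recursive update");
    }

    void enter() { entered = currentTag(); }

    void exit()  { entered = 0; }
}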


What do you think? Is the space/diagnosability trade-off worth it?

Regards, Peter


Re: ConcurrentHashMap computeIfAbsent

Benjamin Manes
In reply to this post by Viktor Klang
> This is worth considering, but it only detects one form of
> misbehavior when the bin doesn't hold any other nodes.
> Otherwise, the first node need not be a ReservationNode.
 
I'm not comfortable enough with the code to experiment, but reading it I keep wondering whether inserting a node prior to computation would provide detection. The value would be null, to be filled in if the computation succeeds. If a reentrant computation then finds the existing node and observes that the value is unset, we have detected a cycle, since we already hold the necessary lock. This would require skipping those nodes in methods that traverse the entire map, like iterators and size().

Internally, this would make computations feel more like the JCiP-style future-based approach Viktor mentioned, and would allow supporting batch computations. It has the nice property of not wasting more memory on values wrapped in futures unless an asynchronous cache was explicitly requested. That can be built on top of CHM, as was always possible, and is an API option that I plan on providing.

> The remaining mysterious cases are nested computeIfAbsent
> calls for different keys that happen to hash to the same bin.
> These conceptually ought to deadlock. Instead they will
> sometimes happen to "work" but other times loop.

If changes like the above removed the livelock issue, then I'd be much happier with the resulting deadlock remaining unsolved. That's much easier for developers to notice with JDK tooling when debugging server issues, and to recognize as their own fault.

Re: ConcurrentHashMap computeIfAbsent

Peter Levart
In reply to this post by Peter Levart

On 12/21/2014 08:33 PM, Peter Levart wrote:
> Here's to illustrate what I mean:
>
>     http://cr.openjdk.java.net/~plevart/misc/CHM.nonreentrantLocks/webrev.01/
>
> Changes I did are purely mechanical.


Sorry, the changes I made to the code at the last moment were not correct. This is what I had before, which was correct:

    http://cr.openjdk.java.net/~plevart/misc/CHM.nonreentrantLocks/webrev.02/

node.checkNotEntered() must be called before the synchronized block, since it contains a Thread.holdsLock() check. node.enter() does not contain such a check, so checkNotEntered() must also be called before synchronized blocks that call back into user code (where necessary).


Regards, Peter


Re: ConcurrentHashMap computeIfAbsent

Peter Levart
In reply to this post by Benjamin Manes
Hi Benjamin,

Doug is certainly the one to give you the best explanation, but let me try...

On 12/21/2014 08:36 PM, Benjamin Manes wrote:
>> This is worth considering, but it only detects one form of
>> misbehavior when the bin doesn't hold any other nodes.
>> Otherwise, the first node need not be a ReservationNode.
>
> I'm not comfortable enough with the code to experiment, but reading it I keep wondering if inserting a node prior to computation would provide detection. The value would be null, to be filled in if the computation succeeds. If a reentrant computation occurs then when the existing node is found and we observe that the value is unset, then we have detected a cycle since we already hold the necessary lock. This would require skipping those nodes for methods that traverse the entire map, like iterators and size().

...ReservationNode is such a node, but it is only inserted (with CAS) as the first node of a bucket, while the bucket is still empty. This provides the object to use as the lock when changing anything in the bucket. It is replaced with a normal Node when the function's result is obtained, and it is logically skipped on look-ups and traversals. If a node is inserted into a bucket that already contains nodes, no ReservationNode is used.

And it's not only inserting (computeIfAbsent) that calls back into user code. See also the compute() method, which overwrites the old value in an existing node with the new value returned from the function, or even removes a Node from the bucket (if the function returns null). How would you detect that such a process is taking place when you re-enter?

I thought about it for some time but didn't find any existing state combination that could be used for such checks. Perhaps a ReservationNode could temporarily be inserted just after the first node in the bucket, to mark the fact that we entered the synchronized block, and removed before exiting the block. This would most probably work, but it would also mean producing one object of garbage for each such call. Maybe that is better than 4 bytes of state overhead for each node (see my experiment).


> Internally, this would make computations feel more like the JCiP-style future based approach Viktor mentioned and allow for supporting batch computations. It has the nice property of not wasting more memory with values wrapped in futures, unless an asynchronous cache was explicitly requested. That can be built on-top of CHM, as was always possible, and an API option that I plan on providing.

Don't forget that CHM provides lock-free lookup; we would not want to lose this feature. Lazy evaluation (or waiting on a computation) can be added on top of CHM if one needs it.

Regards, Peter



Re: ConcurrentHashMap computeIfAbsent

Benjamin Manes
Thanks Peter,

I don't think compute() changes my suggestion. If the bucket count is larger than one, then I was suggesting inserting a regular Node with a null value that is updated when the mapping function completes successfully. The node is scavenged when the computation fails, but since we know the instance, that extra scan should be cheap. By updating a regular node I had hoped to avoid garbage in the common case.

Experimenting to determine the feasibility of this type of change is too invasive for me to feel comfortable trying, given the interactions with all the other map operations that I may not fully consider at the moment. Since ReservationNode is a special type, it seems likely that there are scenarios I'm not considering.

Re: ConcurrentHashMap computeIfAbsent

Doug Lea
In reply to this post by Peter Levart
On 12/21/2014 02:33 PM, Peter Levart wrote:

> ... but measuring on my i7 Linux PC shows that holdsLock() adds about
> two-thirds of the overhead of uncontended locking.
>
> If we added an int field to Node class...

... then we'd probably do a lot more with it than this :-)

Among the requests/goals for jdk8 CHM was to reduce, not increase,
footprint compared to previous versions. Introducing space overhead
for the sake of small improvements in helping to diagnose uncommon
user errors is not a very defensible tradeoff.

But there may be some ways to more narrowly address the actual
user error scenarios; i.e., for m.computeIfAbsent(k1, f) where
function f calls m.computeIfAbsent(k2, ...),
across three cases: (1) k1 != k2 and they reside in different bins,
or (2) k1 != k2 but they reside in the same bin, or (3) k1 == k2.
(Further mixtures with computeIfPresent, compute, and/or merge
don't seem to change the basic problem.)

We should write off coping with deadlock and infinite recursion. But
we can still add some (cheap) integrity checks that can only be
violated in the remaining user-error scenarios. In particular
the one below, which will only trigger after the fact, but
is still effective. I'll explore others too.

*** ConcurrentHashMap.java.~1.258.~ 2014-12-20 11:16:27.835140276 -0500
--- ConcurrentHashMap.java 2014-12-21 17:23:43.233685637 -0500
***************
*** 1633,1638 ****
--- 1633,1639 ----
                           } finally {
                               setTabAt(tab, i, node);
                           }
+                         if (r.next != null) throw new IllegalStateException();
                       }
                   }
                   if (binCount != 0)


Re: ConcurrentHashMap computeIfAbsent

Peter Levart
In reply to this post by Benjamin Manes

On 12/21/2014 10:20 PM, Benjamin Manes wrote:
> Thanks Peter,
>
> I don't think compute() changes my suggestion. If the bucket count is larger than one, then I was suggesting inserting a regular Node with a null value that is updated when the mapping function completes successfully.

You mean a Node that is *removed* when the computation ends. Yes, as I said, that is a possibility: a "marker" node inserted just after the first node (or before the first node?) in the bucket and later removed. This would mean three additional volatile writes, though (the next field is volatile). Not to mention what to do when the nodes form a tree (yes, that's possible).

> The node is scavenged when the computation fails, but since we know the instance that extra scan should be cheap. By updating a regular node I had hoped one could avoid garbage in the common case.

If the common case were inserting, then perhaps, yes. But currently skipping detection is done by checking the hash field, which is final.



> Experimenting to determine the feasibility of this type of change is too invasive for me to feel comfortable trying given the interactions with all the other map operations that I may not fully consider at the moment. Since ReservationNode is a special type, it seems likely that there are scenarios that I'm not considering.

A change using "marker" nodes would be very invasive. Each operation traverses the node chain or tree itself and either inserts, modifies, or removes a node (possibly the first node of the bucket). Adding insertion of a "marker" node and its later removal would complicate the traversal/insertion/removal logic of each individual operation with user callbacks in its own unique way. I don't know if such a change is warranted just to get better diagnostics.

Regards, Peter



Re: ConcurrentHashMap computeIfAbsent

Benjamin Manes
A "marker" node inserted just after 1st node (or before 1st node?) in bucket and later removed. This would mean three additional volatile writes though (next field is volatile). Not to mention what to do when nodes are forming a tree (yes, that's possible).

Yes, but since the writes are done in a synchronized block, the extra writes could piggyback their visibility on exiting that block.

> But currently the skipping detection is designed by checking the hash field which is final.

Yes, and so this would incur an additional volatile read during writes only if the hash and key checks passed, and a read per value node in full traversal operations. The common case of lock-free per-entry reads would be unaffected.


Re: ConcurrentHashMap computeIfAbsent

Doug Lea
In reply to this post by Doug Lea
On 12/21/2014 05:32 PM, Doug Lea wrote:
> We should write off coping with deadlock and infinite recursion. But
> we can still add some (cheap) integrity checks that can only be
> violated in the remaining user-error scenarios.

I added some very low-overhead checks that improve diagnostic
coverage enough to catch most of these remaining potential user
errors. Given that we cannot in principle catch all of them
(and say so in the specs), this seems to be the best tradeoff.
Adding noticeable time or space overhead would provide only
almost-unnoticeably better diagnosis.

This is committed into jsr166 CVS, and will probably be
integrated as a fix for JDK-8062841.

-Doug

