ConcurrentHashMapV8

classic Classic list List threaded Threaded
77 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

ConcurrentHashMapV8

Doug Lea

A candidate replacement for java.util.concurrent.ConcurrentHashMap
is now available as jsr166e.ConcurrentHashMapV8. This version
is much more amenable to upcoming support for aggregate
parallel operations (including, already, method "computeIfAbsent'
using a stand-in MappingFunction type).

The internal design is interestingly different.
Read the internal documentation for details.
(http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/ConcurrentHashMapV8.java?view=log)

Please give it a try, and let us know about experiences.

Links:
     *  API specs:  http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/
     * jar file: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166e.jar (compiled
using Java7 javac).
     * Browsable CVS sources:
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Viktor Klang


On Sun, Aug 28, 2011 at 9:24 PM, Doug Lea <[hidden email]> wrote:

A candidate replacement for java.util.concurrent.ConcurrentHashMap
is now available as jsr166e.ConcurrentHashMapV8. This version
is much more amenable to upcoming support for aggregate
parallel operations (including, already, method "computeIfAbsent'

This is awesome news, thanks Doug!
 
using a stand-in MappingFunction type).

The internal design is interestingly different.
Read the internal documentation for details.
(http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/ConcurrentHashMapV8.java?view=log)

Please give it a try, and let us know about experiences.

Links:
   *  API specs:  http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/
   * jar file: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166e.jar (compiled using Java7 javac).
   * Browsable CVS sources: http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Ben Manes
In reply to this post by Doug Lea
reply-all:


From: Ben Manes <[hidden email]>
To: [hidden email]
Sent: Sunday, August 28, 2011 2:00 PM
Subject: Re: [concurrency-interest] ConcurrentHashMapV8

If computation is being added, then bulk computation should be implemented as well. This is useful when a computation requires an I/O call, so a batch operation is an important optimization. The simplest approach is to insert placeholder nodes that are populated after the computation has completed. (e.g. see this old decorator that supplied it: http://code.google.com/p/concurrentlinkedhashmap/wiki/SelfPopulatingCache)

It wasn't obvious to me how recursive computations are handled for the same key and appears to livelock. This is an error condition that used to deadlocked MapMaker. I fixed this by failing fast if Thread.holdsLock() was true prior to waiting for the computation to finish.

The exceptional computation case may be worth documenting. CHM only fails the computing thread and allows the waiting threads to retry the computation. In other implementations the waiting threads rethrow the exception and don't recompute.

A recompute (a.k.a. refresh) method may also be useful, though can be adequately supplied by usages. This recomputes the value, but allows readers to continue to use the stale value until it completes. This can be useful for cache invalidation where it is preferred vs. exposing the latency to the user (if naively done by removing first).

-Ben

On Sun Aug 28th, 2011 12:24 PM PDT Doug Lea wrote:

>
>A candidate replacement for java.util.concurrent.ConcurrentHashMap
>is now available as jsr166e.ConcurrentHashMapV8. This version
>is much more amenable to upcoming support for aggregate
>parallel operations (including, already, method "computeIfAbsent'
>using a stand-in MappingFunction type).
>
>The internal design is interestingly different.
>Read the internal documentation for details.
>(http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/ConcurrentHashMapV8.java?view=log)
>
>Please give it a try, and let us know about experiences.
>
>Links:
>    *  API specs:  http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/
>    * jar file: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166e.jar (compiled using Java7 javac).
>    * Browsable CVS sources: http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/
>
>
>_______________________________________________
>Concurrency-interest mailing list
>[hidden email]
>http://cs.oswego.edu/mailman/listinfo/concurrency-interest




_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Joe Bowbeer
In reply to this post by Doug Lea
First impressions:

1. I like the MappingFunction name.  I hope the JDK8 name is as clear *and* specialized.

2. I think the snippet in computeIfAbsent is wrong; should be something like:

if (map.containsKey(key))
  return map.get(key);
value = mappingFunction.map(key);
if (value != null) {
  map.put(key, value);
}
return value;

Given the restrictions placed on computeIfAbsent's function (invoked atomically, short & simple, must not attempt to update any other mappings), I would like to see a more complete example.

3. The javadoc comment on computeVal reads "Implements computeIfAbsent" which indicates to me that "computeIfAbsent" is an interface.  Even though this is private-method doc, I suggest "Implementation of computeIfAbsent". That would be less confusing.

Joe

On Sun, Aug 28, 2011 at 12:24 PM, Doug Lea wrote:

A candidate replacement for java.util.concurrent.ConcurrentHashMap
is now available as jsr166e.ConcurrentHashMapV8. This version
is much more amenable to upcoming support for aggregate
parallel operations (including, already, method "computeIfAbsent'
using a stand-in MappingFunction type).

The internal design is interestingly different.
Read the internal documentation for details.
(http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/ConcurrentHashMapV8.java?view=log)

Please give it a try, and let us know about experiences.

Links:
   *  API specs:  http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/
   * jar file: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166e.jar (compiled using Java7 javac).
   * Browsable CVS sources: http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

studdugie
In reply to this post by Doug Lea
Have you had an opportunity to benchmark this against Cliff Click's concurrent hash map?

Dane


On Sun, Aug 28, 2011 at 3:24 PM, Doug Lea <[hidden email]> wrote:

A candidate replacement for java.util.concurrent.ConcurrentHashMap
is now available as jsr166e.ConcurrentHashMapV8. This version
is much more amenable to upcoming support for aggregate
parallel operations (including, already, method "computeIfAbsent'
using a stand-in MappingFunction type).

The internal design is interestingly different.
Read the internal documentation for details.
(http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/ConcurrentHashMapV8.java?view=log)

Please give it a try, and let us know about experiences.

Links:
   *  API specs:  http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/
   * jar file: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166e.jar (compiled using Java7 javac).
   * Browsable CVS sources: http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Joe Bowbeer
Thanks for all the on- and off- list comments and suggestions!
I committed an update with various improvements.
Some follow-ups:

On 08/28/11 17:42, Joe Bowbeer wrote:
> Given the restrictions placed on computeIfAbsent's function (invoked
> atomically, short & simple, must not attempt to update any other mappings), I
> would like to see a more complete example.

Yes, the javadoc could use an example. Suggestions for a
tiny but informative one would be welcome.

On 08/28/11 17:00, Ben Manes wrote:
> If computation is being added, then bulk computation should be implemented as
> well. This is useful when a computation requires an I/O call, so a batch
> operation is an important optimization. The simplest approach is to insert
> placeholder nodes that are populated after the computation has completed.

Thanks, but the plain version of CHM doesn't itself deal with Futures (aka
placeholders). However, a cache version might do so, possibly based on
a CHM<K, Future<V>>.

> It wasn't obvious to me how recursive computations are handled for the same
> key and appears to livelock. This is an error condition that used to
> deadlocked MapMaker. I fixed this by failing fast if Thread.holdsLock() was
> true prior to waiting for the computation to finish.

I'm not sure I follow. If the user function in turn calls computeIfAbsent
for the same key, then it seems irreparably broken in the same sense as,
for example. a user's equals() function that attempts to invoke map.put.

>
> The exceptional computation case may be worth documenting. CHM only fails

It is -- an exception leaves mapping untouched. The more arbitrary decision
is what to do if the computation returns null. ComputeIfAbsent could be defined
to throw NullPointerException, but this is a fairly heavy response to the
case where it turns out that the key has no mapping, since a null return
provides the same information.

> the computing thread and allows the waiting threads to retry the
> computation. In other implementations the waiting threads rethrow the
> exception and don't recompute.

Yes; I think this is the difference between a plain CHM and one that
uses Futures as values.

>
> A recompute (a.k.a. refresh) method may also be useful, though can be
> adequately supplied by usages. This recomputes the value, but allows readers
>  to continue to use the stale value until it completes. This can be useful
> for cache invalidation where it is preferred vs. exposing the latency to the
> user (if naively done by removing first).
>

Thanks! Yes, this (i.e. compute(k, f) vs computeIfAbsent(k, f))
is easy to add, and provides a symmetical API to the two forms
of put, and has some uses, so I added it.

On 08/28/11 22:52, Dane Foster wrote:
> Have you had an opportunity to benchmark this against Cliff Click's
> concurrent hash map?
>

Yes. Results vary across different kinds of tests, depending on operation
mixes, key types, temporal locality of lookups, etc, so I can't
give a better answer than: each is sometimes faster (and
sometimes by a lot) than the other. But the previous scalability
limits due to use of Segments is now gone, so the performance differences
now mostly reflect other design tradeoffs.

-Doug




_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Bob Lee-8
On Mon, Aug 29, 2011 at 10:37 AM, Doug Lea <[hidden email]> wrote:
It wasn't obvious to me how recursive computations are handled for the same
key and appears to livelock. This is an error condition that used to
deadlocked MapMaker. I fixed this by failing fast if Thread.holdsLock() was
true prior to waiting for the computation to finish.

I'm not sure I follow. If the user function in turn calls computeIfAbsent
for the same key, then it seems irreparably broken in the same sense as,
for example. a user's equals() function that attempts to invoke map.put.

It is irreparably broken, but it's an easy and common mistake to make. It's difficult to identify the problem when the program just deadlocks vs. throwing an explicit exception that identifies the at fault key. 

Bob


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Ben Manes
In reply to this post by Doug Lea
> Thanks, but the plain version of CHM doesn't itself deal with Futures

MapMaker's entry objects are basically light-weight futures and we plan on adding bulk computation eventually. Since computation is done by holding the node's lock and inserting it prior to computation, I don't see why a bulk computation would be much different.

> If the user function in turn calls computeIfAbsent for the same key, then it seems irreparably broken in the same sense as, for example. a user's equals() function that attempts to invoke map.put.

Its broken, but failing fast helps identify the issue and is easier to recover from than livelocks or deadlocks are. Any computation function that updates the map itself is wrong, but will occur in the wild and surprise the developer when it seems to randomly (and silently) break.


From: Doug Lea <[hidden email]>
To: [hidden email]
Sent: Monday, August 29, 2011 10:37 AM
Subject: Re: [concurrency-interest] ConcurrentHashMapV8

Thanks for all the on- and off- list comments and suggestions!
I committed an update with various improvements.
Some follow-ups:

On 08/28/11 17:42, Joe Bowbeer wrote:
> Given the restrictions placed on computeIfAbsent's function (invoked
> atomically, short & simple, must not attempt to update any other mappings), I
> would like to see a more complete example.

Yes, the javadoc could use an example. Suggestions for a
tiny but informative one would be welcome.

On 08/28/11 17:00, Ben Manes wrote:
> If computation is being added, then bulk computation should be implemented as
> well. This is useful when a computation requires an I/O call, so a batch
> operation is an important optimization. The simplest approach is to insert
> placeholder nodes that are populated after the computation has completed.

Thanks, but the plain version of CHM doesn't itself deal with Futures (aka
placeholders). However, a cache version might do so, possibly based on
a CHM<K, Future<V>>.

> It wasn't obvious to me how recursive computations are handled for the same
> key and appears to livelock. This is an error condition that used to
> deadlocked MapMaker. I fixed this by failing fast if Thread.holdsLock() was
> true prior to waiting for the computation to finish.

I'm not sure I follow. If the user function in turn calls computeIfAbsent
for the same key, then it seems irreparably broken in the same sense as,
for example. a user's equals() function that attempts to invoke map.put.

>
> The exceptional computation case may be worth documenting. CHM only fails

It is -- an exception leaves mapping untouched. The more arbitrary decision
is what to do if the computation returns null. ComputeIfAbsent could be defined
to throw NullPointerException, but this is a fairly heavy response to the
case where it turns out that the key has no mapping, since a null return
provides the same information.

> the computing thread and allows the waiting threads to retry the
> computation. In other implementations the waiting threads rethrow the
> exception and don't recompute.

Yes; I think this is the difference between a plain CHM and one that
uses Futures as values.

>
> A recompute (a.k.a. refresh) method may also be useful, though can be
> adequately supplied by usages. This recomputes the value, but allows readers
>  to continue to use the stale value until it completes. This can be useful
> for cache invalidation where it is preferred vs. exposing the latency to the
> user (if naively done by removing first).
>

Thanks! Yes, this (i.e. compute(k, f) vs computeIfAbsent(k, f))
is easy to add, and provides a symmetical API to the two forms
of put, and has some uses, so I added it.

On 08/28/11 22:52, Dane Foster wrote:
> Have you had an opportunity to benchmark this against Cliff Click's
> concurrent hash map?
>

Yes. Results vary across different kinds of tests, depending on operation
mixes, key types, temporal locality of lookups, etc, so I can't
give a better answer than: each is sometimes faster (and
sometimes by a lot) than the other. But the previous scalability
limits due to use of Segments is now gone, so the performance differences
now mostly reflect other design tradeoffs.

-Doug




_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Nathan Reynolds-2
>> If the user function in turn calls computeIfAbsent for the same key, then it seems irreparably broken in the same sense as, for example. a user's equals() function that attempts to invoke map.put.

> Its broken, but failing fast helps identify the issue and is easier to recover from than livelocks or deadlocks are. Any computation function that updates the map itself is wrong, but will occur in the wild and surprise the developer when it seems to randomly (and silently) break.

When working on very large complex application servers, the complex interactions between objects might cause a computation function to inadvertently update the map.  This is due to multiple developers make changes to different pieces of code and creating the loop.  Throwing an exception will be invaluable since the call stack will show the developers how the loop was created and it will allow for the application server to have a chance at recovering.

Nathan Reynolds | Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering | Server Technology

On 8/29/2011 11:37 AM, Ben Manes wrote:
> Thanks, but the plain version of CHM doesn't itself deal with Futures

MapMaker's entry objects are basically light-weight futures and we plan on adding bulk computation eventually. Since computation is done by holding the node's lock and inserting it prior to computation, I don't see why a bulk computation would be much different.

> If the user function in turn calls computeIfAbsent for the same key, then it seems irreparably broken in the same sense as, for example. a user's equals() function that attempts to invoke map.put.

Its broken, but failing fast helps identify the issue and is easier to recover from than livelocks or deadlocks are. Any computation function that updates the map itself is wrong, but will occur in the wild and surprise the developer when it seems to randomly (and silently) break.


From: Doug Lea [hidden email]
To: [hidden email]
Sent: Monday, August 29, 2011 10:37 AM
Subject: Re: [concurrency-interest] ConcurrentHashMapV8

Thanks for all the on- and off- list comments and suggestions!
I committed an update with various improvements.
Some follow-ups:

On 08/28/11 17:42, Joe Bowbeer wrote:
> Given the restrictions placed on computeIfAbsent's function (invoked
> atomically, short & simple, must not attempt to update any other mappings), I
> would like to see a more complete example.

Yes, the javadoc could use an example. Suggestions for a
tiny but informative one would be welcome.

On 08/28/11 17:00, Ben Manes wrote:
> If computation is being added, then bulk computation should be implemented as
> well. This is useful when a computation requires an I/O call, so a batch
> operation is an important optimization. The simplest approach is to insert
> placeholder nodes that are populated after the computation has completed.

Thanks, but the plain version of CHM doesn't itself deal with Futures (aka
placeholders). However, a cache version might do so, possibly based on
a CHM<K, Future<V>>.

> It wasn't obvious to me how recursive computations are handled for the same
> key and appears to livelock. This is an error condition that used to
> deadlocked MapMaker. I fixed this by failing fast if Thread.holdsLock() was
> true prior to waiting for the computation to finish.

I'm not sure I follow. If the user function in turn calls computeIfAbsent
for the same key, then it seems irreparably broken in the same sense as,
for example. a user's equals() function that attempts to invoke map.put.

>
> The exceptional computation case may be worth documenting. CHM only fails

It is -- an exception leaves mapping untouched. The more arbitrary decision
is what to do if the computation returns null. ComputeIfAbsent could be defined
to throw NullPointerException, but this is a fairly heavy response to the
case where it turns out that the key has no mapping, since a null return
provides the same information.

> the computing thread and allows the waiting threads to retry the
> computation. In other implementations the waiting threads rethrow the
> exception and don't recompute.

Yes; I think this is the difference between a plain CHM and one that
uses Futures as values.

>
> A recompute (a.k.a. refresh) method may also be useful, though can be
> adequately supplied by usages. This recomputes the value, but allows readers
>  to continue to use the stale value until it completes. This can be useful
> for cache invalidation where it is preferred vs. exposing the latency to the
> user (if naively done by removing first).
>

Thanks! Yes, this (i.e. compute(k, f) vs computeIfAbsent(k, f))
is easy to add, and provides a symmetical API to the two forms
of put, and has some uses, so I added it.

On 08/28/11 22:52, Dane Foster wrote:
> Have you had an opportunity to benchmark this against Cliff Click's
> concurrent hash map?
>

Yes. Results vary across different kinds of tests, depending on operation
mixes, key types, temporal locality of lookups, etc, so I can't
give a better answer than: each is sometimes faster (and
sometimes by a lot) than the other. But the previous scalability
limits due to use of Segments is now gone, so the performance differences
now mostly reflect other design tradeoffs.

-Doug




_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest




_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Charles Fry-6
In reply to this post by Doug Lea
Hi Doug,

This is an exciting development. MapMaker has definitely proved the value of coupling maps with computation, and it's great to see low level support for that.

There are a few typos in the new compute javadocs:

- "with he given key"
- the pseudocode doesn't return in the first if clause

Also, it would be nice to document what "operations on this map by other threads may be blocked while computation is in progress." For example, I assume that get will never block on computation, but that any writes will.

Charles

On Mon, Aug 29, 2011 at 13:37, Doug Lea <[hidden email]> wrote:
Thanks for all the on- and off- list comments and suggestions!
I committed an update with various improvements.
Some follow-ups:


On 08/28/11 17:42, Joe Bowbeer wrote:
Given the restrictions placed on computeIfAbsent's function (invoked
atomically, short & simple, must not attempt to update any other mappings), I
would like to see a more complete example.

Yes, the javadoc could use an example. Suggestions for a
tiny but informative one would be welcome.


On 08/28/11 17:00, Ben Manes wrote:
If computation is being added, then bulk computation should be implemented as
well. This is useful when a computation requires an I/O call, so a batch
operation is an important optimization. The simplest approach is to insert
placeholder nodes that are populated after the computation has completed.

Thanks, but the plain version of CHM doesn't itself deal with Futures (aka
placeholders). However, a cache version might do so, possibly based on
a CHM<K, Future<V>>.


It wasn't obvious to me how recursive computations are handled for the same
key and appears to livelock. This is an error condition that used to
deadlocked MapMaker. I fixed this by failing fast if Thread.holdsLock() was
true prior to waiting for the computation to finish.

I'm not sure I follow. If the user function in turn calls computeIfAbsent
for the same key, then it seems irreparably broken in the same sense as,
for example. a user's equals() function that attempts to invoke map.put.



The exceptional computation case may be worth documenting. CHM only fails

It is -- an exception leaves mapping untouched. The more arbitrary decision
is what to do if the computation returns null. ComputeIfAbsent could be defined
to throw NullPointerException, but this is a fairly heavy response to the
case where it turns out that the key has no mapping, since a null return
provides the same information.


the computing thread and allows the waiting threads to retry the
computation. In other implementations the waiting threads rethrow the
exception and don't recompute.

Yes; I think this is the difference between a plain CHM and one that
uses Futures as values.



A recompute (a.k.a. refresh) method may also be useful, though can be
adequately supplied by usages. This recomputes the value, but allows readers
 to continue to use the stale value until it completes. This can be useful
for cache invalidation where it is preferred vs. exposing the latency to the
user (if naively done by removing first).


Thanks! Yes, this (i.e. compute(k, f) vs computeIfAbsent(k, f))
is easy to add, and provides a symmetical API to the two forms
of put, and has some uses, so I added it.


On 08/28/11 22:52, Dane Foster wrote:
Have you had an opportunity to benchmark this against Cliff Click's
concurrent hash map?


Yes. Results vary across different kinds of tests, depending on operation
mixes, key types, temporal locality of lookups, etc, so I can't
give a better answer than: each is sometimes faster (and
sometimes by a lot) than the other. But the previous scalability
limits due to use of Segments is now gone, so the performance differences
now mostly reflect other design tradeoffs.

-Doug





_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Zhong Yu
In reply to this post by Doug Lea
This is great, a very much needed feature.

About recursive computeIfAbsent() calls (on the same key) - the
behavior can be modeled the same as nested transactions: the thread
that locks the key can update the entry, and a following read in the
same thread will see that value; however the value is only committed,
i.e. visible to other threads, upon the outermost unlock action. All
threads can read the latest committed value of an entry without
locking.

I have just implemented something similar, which I'll attach later.
The code is very straightforward, given the facility of
lock-update-unlock an entry. Actually, maybe ConcurrentHashMap should
expose this facility; value based locking ( lock(v)..unlock(v) ) has
been desired by some people, but there are no well known
implementations.

If ConcurrentHashMap exposes per-entry lock-update-unlock facility,
computeIfAbsent() impl would be so trivial, it doesn't need to be
provided by JDK, everybody can write it.


Zhong Yu

---------------------------------

// code below is released to public domain by the author

public interface Provider<T>
{
    T get();
}

public class ConcurrentScope extends AtomicHashMap<Object,Object>
{
    public <T> T computeIfAbsent(Object key, Provider<T> provider)
    {
        Object val = super.get(key);
        if(val==null)
        {
            if(provider==null)
                return null;

            try( Entry<Object> entry = super.lock(key) )
            {
                val = entry.getValue();
                if(val==null)
                {
                    val = provider.get();  // alien code. may block,
throw, recurse
                    entry.setValue(val);
                }
            } //auto unlock
        }
        return (T)val;
    }
}

// AtomicHashMap.java ---------------------------

/**
 * A concurrent hash map where an entry can be individually locked to
implement atomic operations.
 * Example usages:
 * <pre>
 * import ohm.infra.util.AtomicHashMap;
 * import ohm.infra.util.AtomicHashMap.Entry;
 *
 *     AtomicHashMap&lt;K,V> map = new AtomicHashMap&lt;>();
 *     Entry&lt;V> entry = map.lock(key)
 *     try
 *     {
 *         v1 = entry.getValue();
 *         ...
 *         entry.setValue(v2);
 *     }
 *     finally
 *     {   entry.unlock();    }
 * </pre>
 * <tt>AtomicHashMap.Entry</tt> is <tt>AutoCloseable</tt>, the
try-with-resource syntax would be simpler to use
 * <pre>
 *
 *     try( Entry&lt;V> entry = map.lock(key) ) // auto unlock
 *     {
 *        v1 = entry.getValue();
 *        ...
 *        entry.setValue(v2);
 *     }
 * </pre>
 * <p>
 *     The only way to get an entry for a <tt>key</tt> is through
{@link #lock lock(key)}.
 *     The entry must be used in the same thread that locked it; once
it's unlocked,
 *     the object representing the entry shall not be reused. To get
the entry again,
 *     {@link #lock lock(key)} must be invoked again. Nested
lock-unlock on the same key are allowed.
 *     Upon the outermost unlock action, the value set by the last
<tt>setValue()</tt> is "committed".
 * </p>
 * <p>
 *     {@link #get get(key) } reads the latest committed value without
locking. It's like
 *     a volatile read; its performance is on par with {@link
java.util.concurrent.ConcurrentHashMap#get}.
 * </p>
 * <p>
 *     Implementation note: an AtomicHashMap is internally backed up
by a {@link java.util.concurrent.ConcurrentHashMap}.
 *     Every entry is backed up by a {@link
java.util.concurrent.locks.ReentrantLock}.
 *     To save space, an entry is evicted from the internal map when
the entry is unlocked
 *     and its committed value is null. If the entry is needed again,
{@link #lock lock(key)} will create a new one.
 *     The "active entries" of the map, as referred to by {@link
#size()} and {@link #keySet()},
 *     are entries that are either being locked or having non-null
committed values.
 * </p>
 *
 * @param <K> type of keys
 * @param <V> type of values
 */
public class AtomicHashMap<K,V>
{
    /**
     * An entry in a AtomicHashMap.
     *
     * @param <V> type of value
     * @see ohm.infra.util.to_lea.AtomicHashMap
     */
    public interface Entry<V> extends AutoCloseable
    {
        V getValue();

        void setValue(V value);

        /**
         * @return number of times the current thread has locked the
entry (minus unlocks).
         */
        int getLockCount();

        /**
         * Unlock the entry; if lock count is 0, commit the value.
         */
        void unlock();

        /**
         * Same as {@link #unlock() }
         */
        void close();
    }

    static class EntryImpl<V> extends ReentrantLock implements Entry<V>
    {
        final Object key;
        final ConcurrentHashMap map;
        EntryImpl(boolean fair, Object key, ConcurrentHashMap map)
        {
            super(fair);
            this.key = key;
            this.map = map;
        }

        public int getLockCount(){ return super.getHoldCount(); }


        V workingValue;
        volatile V committedValue;
        boolean retired;

        // get/set must be called by the owner thread. we have given
plenty of warnings
        // so we skip the check of `super.isHeldByCurrentThread()` here.
        // if client violates that, `workingValue` is messed up; no
harm to our structure.

        public V getValue()
        {
            return workingValue;
        }
        public void setValue(V value)
        {
            workingValue = value;
        }

        // if false, entry is retired
        boolean acquire()
        {
            super.lock();
            if(!retired)
                return true;
            // retired
            super.unlock(); // another sucker may be blocked on acquire() me.
            return false;
        }

        public void unlock()
        {
            if(super.getHoldCount()==1) // outermost unlock
            {
                committedValue = workingValue;

                if(workingValue==null)
                {
                    retired=true;
                    map.remove(key);
                }
            }

            super.unlock();
        }

        public void close()
        {
            unlock();
        }
    }

    final ConcurrentHashMap<K,EntryImpl<V>> map;
    final boolean fair;

    /**
     * With default {@link java.util.concurrent.ConcurrentHashMap}
constructor parameters, and unfair locking.
     */
    public AtomicHashMap()
    {
        this.map = new ConcurrentHashMap<>();
        this.fair = false;
    }

    /**
     * See {@link
java.util.concurrent.ConcurrentHashMap#ConcurrentHashMap(int, float,
int)
     *            ConcurrentHashMap(initialCapacity, loadFactor,
concurrencyLevel)}
     * for meaning of first 3 parameters.
     *
     * @param fair whether locking should be fair
     */
    public AtomicHashMap(int initialCapacity, float loadFactor, int
concurrencyLevel, boolean fair)
    {
        this.map = new ConcurrentHashMap<>(initialCapacity,
loadFactor, concurrencyLevel);
        this.fair = fair;
    }

    /**
     * Exclusively lock the entry of the key.
     */
    public Entry<V> lock(K key)
    {
        while(true)
        {
            EntryImpl<V> entry = map.get(key); // in most cases entry
already exists.
            if(entry==null)
            {
                entry = new EntryImpl<>(fair, key, map);
                EntryImpl<V> prev = map.putIfAbsent(key, entry);
                if(prev!=null)
                    entry = prev;
            }

            if(entry.acquire())
                return entry;
            // else entry retired, re-try.
        }
    }

    /**
     * Get the latest committed value for the key. This call will not block.
     */
    public V get(Object key)
    {
        EntryImpl<V> entry = map.get(key);
        if(entry==null)
            return null;
        if(entry.isHeldByCurrentThread())
            return entry.workingValue;
        // entry could be retired, doesn't matter.
        return entry.committedValue;
    }

    /**
     * Atomically set the <tt>newValue</tt> to the entry of the key,
     * simply, lock the entry, set the value, unlock the entry.
     * If <tt>newValue</tt> is null, the entry may be evicted.
     * This call may block if the entry is being locked by other threads.
     *
     * @return the old value
     */
    public V set(K key, V newValue)
    {
        try( Entry<V> entry = lock(key) )
        {
            V oldValue = entry.getValue();
            entry.setValue(newValue);
            return oldValue;
        }
    }

    /**
     * @return the number of active entries; performs like {@link
java.util.concurrent.ConcurrentHashMap#size()}
     */
    public int size() { return map.size(); }

    /**
     * @return set of keys of active entries, read only.
     */
    public Set<K> keySet()
    {
        Set<K> keySet = map.keySet();
        // that set is mutable, and removing a key there will remove
an entry in map, bypassing our protocol.
        return Collections.unmodifiableSet(keySet); // we must forbid
that; read only to client.
    }

    /**
     * Set all entries' values to null. Every entry may be evicted.
     * This call may block if an entry is being locked by another thread.
     */
    public void clear()
    {
        // we can't simply call map.clear(). must lock entry before removal.
        for(Map.Entry<K,EntryImpl<V>> e : map.entrySet())
        {
            EntryImpl<V> entry = e.getValue(); // can be null due to
concurrent modification.
            if(entry!=null && entry.acquire())
            {
                entry.setValue(null);
                entry.unlock();
            }
        }
    }

}
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Bob Lee-8
On 08/29/11 13:52, Bob Lee wrote:

> On Mon, Aug 29, 2011 at 10:37 AM, Doug Lea <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>         It wasn't obvious to me how recursive computations are handled for the same
>         key and appears to livelock. This is an error condition that used to
>         deadlocked MapMaker. I fixed this by failing fast if Thread.holdsLock() was
>         true prior to waiting for the computation to finish.
>
>
>     I'm not sure I follow. If the user function in turn calls computeIfAbsent
>     for the same key, then it seems irreparably broken in the same sense as,
>     for example. a user's equals() function that attempts to invoke map.put.
>
>
> It is irreparably broken, but it's an easy and common mistake to make. It's
> difficult to identify the problem when the program just deadlocks vs. throwing
> an explicit exception that identifies the at fault key.
>

Well, in CHMV8. it won't deadlock, but instead gets into
an infinite loop (a form livelock), which makes it a more
marginal call whether to place a check that is not cheap
along a common path. But I'm tentatively planning to put it
in to next update anyway.

-Doug
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Charles Fry-6
On 08/29/11 18:02, Charles Fry wrote:
> There are a few typos in the new compute javadocs:

Thanks! Fixed in next update.

> Also, it would be nice to document what "operations on this map by other threads
> may be blocked while computation is in progress." For example, I assume that get
> will never block on computation, but that any writes will.

Read-only operations (get, iterators, etc) never block at all.
Exactly which update operations block is a random phenomenon,
so left fuzzy. The number of entries covered by any given
node serving as a lock is, assuming lack of too many identical
hashCode values by different keys, approximately  Poisson
distributed with, on average at steady state, a parameter
of about 0.5 under 0.75 loadFactor. On average, about 75%
of the locks cover exactly one entry, the rest cover two
or more. And roughly, the probability that
holding a given lock will block an unrelated update
is around 1/8n (n the number of elements). On the other hand,
holding the lock will always prevent an ongoing resize operation
from completing, which may cause bins to overpopulate and
overall throughput to degrade. So people should avoid
long computations that are performed within these locks.

Further digressing: Despite disadvantages, using locks
for statistically very short lists turns out to be much more
stable and efficient than alternative non-blocking strategies
I tried that also meet the design goal of supporting efficient
partitioning and traversability, which is needed for upcoming
parallel operations on maps.

-Doug

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Nathan Reynolds-2
> Further digressing: Despite disadvantages, using locks for statistically very short lists turns out to be much more stable and efficient than alternative non-blocking strategies I tried

I realize this is a digression, but it is interesting to me.  Are you saying that compareAndSet() took more time than locks?

Nathan Reynolds | Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering | Server Technology

On 8/29/2011 5:20 PM, Doug Lea wrote:
On 08/29/11 18:02, Charles Fry wrote:
There are a few typos in the new compute javadocs:

Thanks! Fixed in next update.

Also, it would be nice to document what "operations on this map by other threads
may be blocked while computation is in progress." For example, I assume that get
will never block on computation, but that any writes will.

Read-only operations (get, iterators, etc) never block at all.
Exactly which update operations block is a random phenomenon,
so left fuzzy. The number of entries covered by any given
node serving as a lock is, assuming lack of too many identical
hashCode values by different keys, approximately  Poisson
distributed with, on average at steady state, a parameter
of about 0.5 under 0.75 loadFactor. On average, about 75%
of the locks cover exactly one entry, the rest cover two
or more. And roughly, the probability that
holding a given lock will block an unrelated update
is around 1/8n (n the number of elements). On the other hand,
holding the lock will always prevent an ongoing resize operation
from completing, which may cause bins to overpopulate and
overall throughput to degrade. So people should avoid
long computations that are performed within these locks.

Further digressing: Despite disadvantages, using locks
for statistically very short lists turns out to be much more
stable and efficient than alternative non-blocking strategies
I tried that also meet the design goal of supporting efficient
partitioning and traversability, which is needed for upcoming
parallel operations on maps.

-Doug

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Zhong Yu
In reply to this post by Doug Lea
On Mon, Aug 29, 2011 at 6:43 PM, Doug Lea <[hidden email]> wrote:

> Well, in CHMV8. it won't deadlock, but instead gets into
> an infinite loop (a form livelock), which makes it a more
> marginal call whether to place a check that is not cheap
> along a common path. But I'm tentatively planning to put it
> in to next update anyway.

If the caller has to worry about these things, it makes
computerIfAbsent(key, func) much less useful.

Usually this method is needed when the computation is expensive or has
side effects. (In many code structures the caller can't be sure about
the details of `func.map()`, therefore the worst is assumed.)

Requiring that the computation is transparent to the caller, must be
inexpensive and must not write to the map, will turn away majority of
potential users. This makes the method much less valuable than it
first appears.

In case the requirement can be met, `func.map()` can be cheaply and
safely invoked anyway, the saving by `computeIfAbsent()` is very
little, except in some extreme use cases (and these people can
probably afford their own customized implementations).

The requirement that the computation must not write to entries of
different keys seems especially hard to cope with. For example, if a
map is used as a cache; when computing value for `k1`, it might be
reasonable to disallow recursion on `k1`; however, it will be very odd
and inconvenient to disallow `cache.get( k2, #{ load(k2) } )` when
computing value for `k1`.

Zhong Yu
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Nathan Reynolds-2
On 08/29/11 20:42, Nathan Reynolds wrote:
>  > Further digressing: Despite disadvantages, using locks for statistically very
> short lists turns out to be much more stable and efficient than alternative
> non-blocking strategies I tried
>
> I realize this is a digression, but it is interesting to me. Are you saying that
> compareAndSet() took more time than locks?

No. The issue is that CHM needs to manage all five of:
insert/remove bin head, insert/remove internal node, and
split the list into upper and lower parts during
resize and replace with forwarding node; all the while maintaining
concurrent readability. There are non-blocking techniques for each
of these in isolation (mainly: deletion via dummy nodes that
I use in ConcurrentSkipLists, and bit-reversed node ordering
for splitting) but together, they introduce a lot of overhead
and contention storms. (For similar reasons, Herlihy & Shavit
recommend lazy-list-locking. which is an upside-down version
of the technique here.) for most list operations rather than
non-blocking techniques. Even though these require more atomic
operation overhead, in the vastly most common case they are
uncontended and thus cheap on recent processors, and more than make
up for the reduction  in other per-node bookkeeping that you'd
need to track dummy nodes etc. On contention, they block threads
out rather than allowing a lot of pointwise CAS misses and retries.

Plus, CHMV8 does still use a single CAS for the most common
update operation, adding a new node into an empty bin.

-Doug


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Zhong Yu
On 08/30/11 00:42, Zhong Yu wrote:
> The requirement that the computation must not write to entries of
> different keys seems especially hard to cope with. For example, if a
> map is used as a cache; when computing value for `k1`, it might be
> reasonable to disallow recursion on `k1`; however, it will be very odd
> and inconvenient to disallow `cache.get( k2, #{ load(k2) } )` when
> computing value for `k1`.

Yes. As I mentioned in other replies, a version suitable for
arbitrary caches should be based on an internal form of Futures
that can deal with reservations/place-holders. But this
(as well as accommodating weak references etc) would
add a lot of overhead and code bulk to operations for all
other CHM usages. So stay tuned for a version supporting such things.

> Requiring that the computation is transparent to the caller, must be
> inexpensive and must not write to the map, will turn away majority of
> potential users.

That's all for the best. The main problem computeIfAbsent
addresses is people often getting wrong the alternative
(and still usually best) idiom of:
if (map.get(k) == null)
    map.putIfAbsent(k, new V(k));
Even though it introduces potential contention (no free lunch),
with lambda syntax support, computeIfAbsent will be less
error-prone. As in, something like:
   map.computeIfAbsent(k, #{k -> new V(k)} };

In other words, the use of locks in CHMV8 is just an internal
implementation detail that may change at any time (although
as indicated in other postings, is not too likely to change).
Even without them, any support for computeIfAbsent introduces
contention etc, so users should keep the functions short.

-Doug

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Jason T. Greene
In reply to this post by Doug Lea
On 8/28/11 2:24 PM, Doug Lea wrote:

>
> A candidate replacement for java.util.concurrent.ConcurrentHashMap
> is now available as jsr166e.ConcurrentHashMapV8. This version
> is much more amenable to upcoming support for aggregate
> parallel operations (including, already, method "computeIfAbsent'
> using a stand-in MappingFunction type).
>
> The internal design is interestingly different.
> Read the internal documentation for details.
> (http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/ConcurrentHashMapV8.java?view=log)

Wow! This implementation has a very nice set of attributes, in
particular the memory efficiency. I look forward to giving it a spin.

--
Jason T. Greene
JBoss, a division of Red Hat
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Zhong Yu
In reply to this post by Doug Lea
Thanks for clarifying the intended usage for me. Still, given the
heavy restriction on the function, I can't grasp the usefulness of the
method without a realistic use case.

In the following idiom

    if(map.get(k)==null)
        v  = calc(k);
        map.putIfAbsent( k, v );

the 1st line is an optimization, since both calc() and putIfAbsent()
are relatively more expensive than get(), if the key is likely
contained in the map, it's cheaper to try get() first.

The two concerns still exist for computeIfAbsent().

In the lambda expression example,

    computeIfAbsent( k, #{ k -> calc(k) } );

a new object (the function) is created every time, which is quite
expensive relative to get(). So it pays to

    if(map.get(k)==null)
        computeIfAbsent( k, #{ k -> calc(k) } );

If the function depends solely on `k`, we can eliminate the object creation by

    static final Function func = new ...;

    computeIfAbsent( k, func );

with loss of readability provided by lambda expression.

The 2nd concern is that computeIfAbsent() is more expensive* than
get(). So it pays to

    if(map.get(k)==null)
        computeIfAbsent( k, func );

(* based on the latest implementation of computeIfAbsent() - maybe it
can be improved so the idiom above becomes unnecessary )

Finally, comparing the performance of

    v = func.map(k);
    putIfAbsent( k, v );
vs
    computeIfAbsent( k, func );

The two versions perform almost the same, both in uncontended and
contended case, regardless of the complexity of func.map().

Zhong Yu

(I'm sorry if my comments are all negative in nature.)

On Tue, Aug 30, 2011 at 6:13 AM, Doug Lea <[hidden email]> wrote:

> On 08/30/11 00:42, Zhong Yu wrote:
>>
>> The requirement that the computation must not write to entries of
>> different keys seems especially hard to cope with. For example, if a
>> map is used as a cache; when computing value for `k1`, it might be
>> reasonable to disallow recursion on `k1`; however, it will be very odd
>> and inconvenient to disallow `cache.get( k2, #{ load(k2) } )` when
>> computing value for `k1`.
>
> Yes. As I mentioned in other replies, a version suitable for
> arbitrary caches should be based on an internal form of Futures
> that can deal with reservations/place-holders. But this
> (as well as accommodating weak references etc) would
> add a lot of overhead and code bulk to operations for all
> other CHM usages. So stay tuned for a version supporting such things.
>
>> Requiring that the computation is transparent to the caller, must be
>> inexpensive and must not write to the map, will turn away majority of
>> potential users.
>
> That's all for the best. The main problem computeIfAbsent
> addresses is people often getting wrong the alternative
> (and still usually best) idiom of:
> if (map.get(k) == null)
>   map.putIfAbsent(k, new V(k));
> Even though it introduces potential contention (no free lunch),
> with lambda syntax support, computeIfAbsent will be less
> error-prone. As in, something like:
>  map.computeIfAbsent(k, #{k -> new V(k)} };
>
> In other words, the use of locks in CHMV8 is just an internal
> implementation detail that may change at any time (although
> as indicated in other postings, is not too likely to change).
> Even without them, any support for computeIfAbsent introduces
> contention etc, so users should keep the functions short.
>
> -Doug
>
>

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
On 08/30/11 16:52, Zhong Yu wrote:
> Thanks for clarifying the intended usage for me. Still, given the
> heavy restriction on the function, I can't grasp the usefulness of the
> method without a realistic use case.

We didn't support computeIfAbsent before, for the
kinds of reasons you list. But the compelling reason
for including it anyway is error avoidance. An upcoming paper
in OOPSLA 2011 examined usages of ConcurrentHashMap in real-world
code, and found that by far, the most common error was
incorrect use of putIfAbsent in cases covered by computeIfAbsent.
Adding computeIfAbsent now may be too little, too late, but even this
is only made easily usable with lambda syntax. We do expect
to see some usages that don't follow advice and generate
occasional performance anomalies, which is a lot better than
people occasionally getting it completely wrong.

-Doug

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
1234