ConcurrentHashMapV8

classic Classic list List threaded Threaded
77 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Bob Lee-8
On Tue, Aug 30, 2011 at 4:22 PM, Doug Lea <[hidden email]> wrote:
An upcoming paper
in OOPSLA 2011 examined usages of ConcurrentHashMap in real-world
code, and found that by far, the most common error was
incorrect use of putIfAbsent in cases covered by computeIfAbsent

Hey... remember when I did that for the JDK and Google's codebase back in '09? :-)

Bob 

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Nathan Reynolds-2
In reply to this post by Doug Lea
Doug,

I am wondering if you have data which shows how many cycles get(), put(), remove() and putIfAbsent() take on ConcurrentHashMapV8, ConcurrentHashMapV7, synchronized HashMap and unsynchronized HashMap?

Nathan Reynolds | Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering | Server Technology


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Bob Lee-8
On 08/30/11 19:36, Bob Lee wrote:

> On Tue, Aug 30, 2011 at 4:22 PM, Doug Lea <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     An upcoming paper
>     in OOPSLA 2011 examined usages of ConcurrentHashMap in real-world
>     code, and found that by far, the most common error was
>     incorrect use of putIfAbsent in cases covered by computeIfAbsent
>
>
> Hey... remember when I did that for the JDK and Google's codebase back in '09? :-)

Sorry, I should have added that you and others have been asking
for this for years too!

-Doug
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Nathan Reynolds-2
On 08/30/11 19:45, Nathan Reynolds wrote:
> Doug,
>
> I am wondering if you have data which shows how many cycles get(), put(),
> remove() and putIfAbsent() take on ConcurrentHashMapV8, ConcurrentHashMapV7,
> synchronized HashMap and unsynchronized HashMap?
>

No cycle counts but...

There was a bunch of mail on measuring expected hash map performance
for typical loads on the core-libs openjdk list around July 2010(?).

For single-threaded performance, people seem to believe the results of
running our MapMicroBenchmark.
(See http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/loops/)
For which on most machines, CHMV8 is about 3% faster than CHMV7,
and 10-25% slower (depending on #elements) than java.util.Hashmap.
I don't have results for synchronizedMap(HashMap) handy but
java.util.Hashtable (which is about the same) is generally
slower than either version of CHM, even with biased locking
turned on.

For multi-threaded tests, there's less consensus about the
best micro-benchmark. We have a few (including MapLoops),
and there are others out there -- including one by Cliff Click.
In general CHMV8 is the best of all the concurrent hash map
implementations I've tested on tests where there is some
temporal locality of access (which is typical) but open-address
versions such as Cliff Clicks's are faster for get/put/remove-only
tests with randomized key use, mainly because of fewer cache misses.

-Doug
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

mike.duigou
In reply to this post by Doug Lea
On Aug 28 2011, at 19:53 , [hidden email] wrote:
>
>  4. Re: ConcurrentHashMapV8 (Joe Bowbeer)

> From: Joe Bowbeer <[hidden email]>
> Subject: Re: [concurrency-interest] ConcurrentHashMapV8
> First impressions:
>
> 1. I like the MappingFunction name.  I hope the JDK8 name is as clear *and*
> specialized.


The current proposed name for Java SE 8 is 'Mapper' and it has the same signature as Doug's placeholder.

http://hg.openjdk.java.net/lambda/lambda/jdk/file/38969e64b60e/src/share/classes/java/util/functions/Mapper.java

One key difference is that the proposed Java SE 8 Mapper does not say anything about null handling and it is anticipated that Mapper will be used in places where null must be accommodated. The CHM may choose to just use the Mapper and document it's handling of null or could alternately create a specialized sub-interface to document the extended semantics.

If there are any concerns with the current proposed Mapper or the other basic functional interfaces proposed for Java SE 8 we'd be very interested to hear them on the [hidden email] list.

Thanks!

Mike
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Kasper Nielsen-3
On 01-09-2011 21:07, Mike Duigou wrote:

> On Aug 28 2011, at 19:53 , [hidden email] wrote:
>>
>>   4. Re: ConcurrentHashMapV8 (Joe Bowbeer)
>
>> From: Joe Bowbeer<[hidden email]>
>> Subject: Re: [concurrency-interest] ConcurrentHashMapV8
>> First impressions:
>>
>> 1. I like the MappingFunction name.  I hope the JDK8 name is as clear *and*
>> specialized.
>
>
> The current proposed name for Java SE 8 is 'Mapper' and it has the same signature as Doug's placeholder.
>
> http://hg.openjdk.java.net/lambda/lambda/jdk/file/38969e64b60e/src/share/classes/java/util/functions/Mapper.java
>
> One key difference is that the proposed Java SE 8 Mapper does not say anything about null handling and it is anticipated that Mapper will be used in places where null must be accommodated. The CHM may choose to just use the Mapper and document it's handling of null or could alternately create a specialized sub-interface to document the extended semantics.
>
> If there are any concerns with the current proposed Mapper or the other basic functional interfaces proposed for Java SE 8 we'd be very interested to hear them on the [hidden email] list.
>
> Thanks!
>
> Mike

Thanks for bringing this up. I must admit I really don't like the name
Block. I might be damage by working to much with concurrency and IO but
I really thought it had something to do with blocking/non-blocking.

I really like the name Procedure as used in Doug's ParallelArray classes
much better.
(http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/extra166y/Ops.java?revision=1.3&view=markup)

cheers
   Kasper


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Kasper Nielsen-3
In reply to this post by mike.duigou
On 01-09-2011 21:07, Mike Duigou wrote:
> On Aug 28 2011, at 19:53 , [hidden email] wrote:
> One key difference is that the proposed Java SE 8 Mapper does not say anything about null handling and it is anticipated that Mapper will be used in places where null must be accommodated. The CHM may choose to just use the Mapper and document it's handling of null or could alternately create a specialized sub-interface to document the extended semantics.
No need to create specialized sub-interface for this.
Just document on the Mapper interface that some consumers of this
interface may place restrictions on the type of output they accept from
a mapper. Like ju.Map does it. HashMap accepts null keys and values.
ConcurrentHashMap doesn't.

Cheers
   Kasper
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Dr Heinz M. Kabutz
In reply to this post by Doug Lea
Hi Doug,

we had a look at the CHM yesterday and were wondering why you did not
explicitly check for MOVED instead of the more obscure

            else if (e.hash < 0)
                tab = (Node[])e.key;

If would be clearer if that had been:

            else if (e.hash == MOVED)
                tab = (Node[])e.key;

Magic constants are usually a bad idea, even with < 0.

Regards

Heinz
--
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java(tm) Specialists' Newsletter"
Sun Java Champion
IEEE Certified Software Development Professional
http://www.javaspecialists.eu
Tel: +30 69 72 850 460
Skype: kabutz



On 8/28/11 10:24 PM, Doug Lea wrote:

>
> A candidate replacement for java.util.concurrent.ConcurrentHashMap
> is now available as jsr166e.ConcurrentHashMapV8. This version
> is much more amenable to upcoming support for aggregate
> parallel operations (including, already, method "computeIfAbsent'
> using a stand-in MappingFunction type).
>
> The internal design is interestingly different.
> Read the internal documentation for details.
> (http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/ConcurrentHashMapV8.java?view=log)
>
>
> Please give it a try, and let us know about experiences.
>
> Links:
>     *  API specs:  http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/
>     * jar file: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166e.jar 
> (compiled using Java7 javac).
>     * Browsable CVS sources:
> http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/
>
>
> _______________________________________________
> Concurrency-interest mailing list
> [hidden email]
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
On 09/02/11 09:51, Dr Heinz M. Kabutz wrote:
> else if (e.hash < 0)
> tab = (Node[])e.key;
>
> If would be clearer if that had been:
>
> else if (e.hash == MOVED)
> tab = (Node[])e.key;
>
> Magic constants are usually a bad idea, even with < 0.

Thanks. Changed to ...
     else if (eh < 0) { // sign bit set -- bin was moved during resize
... (plus some other internal documentation improvements),
because in the next update, forwarding nodes will have other
information encoded in hash fields to support range traversals
needed for parallel aggregate operations. The only distinguishing
property is the sign bit, which is most sensibly expressed as
a check of  < 0.

-Doug
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Brian Goetz-3
In reply to this post by Zhong Yu
> In the lambda expression example,
>
>      computeIfAbsent( k, #{ k ->  calc(k) } );
>
> a new object (the function) is created every time, which is quite
> expensive relative to get(). So it pays to

No, this is incorrect for a significant fraction of the cases.  For
lambdas that don't capture values from the enclosing scope (which is not
all of them, but a lot of them), no object will be created (after the
first one.)

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Doug Lea

Among some further ongoing improvements and extensions (mainly
to support parallel aggregate operations), I'm contemplating
(nearly) ignoring "loadFactor" constructor arguments for
ConcurrentHashMap (in "V8" form for now, but ultimately
targeting revised j.u.c version). The main motivation
is that in concurrent hash maps  (of any form) collisions
and contention are interdependent, so anyone using a
higher-than-default load factor because they think it will
save space is also unexpectedly getting poorer multithreaded
performance. Given how hard it was to give clear answers to
questions on this list about exactly what blocks when based on
probability distributions with a parameter based on load factor,
I'm now thinking that is best to internally always use
the default, and use the optional constructor parameter
only as an initial sizing hint (larger loadFactor ->
smaller initial capacity).  This will enable simpler
characterization of expected performance.

Scanning for uses of ConcurrentHashMap constructors
on google code search, it seems that hardly anyone ever
overrides the default anyway, so this probably doesn't
matter much either way. But if you have ever done so,
and have a good reason, could you let me know? Thanks!

-Doug


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Alex Miller-10
In reply to this post by Doug Lea
> Doug Lea said:

> Scanning for uses of ConcurrentHashMap constructors
> on google code search, it seems that hardly anyone ever
> overrides the default anyway, so this probably doesn't
> matter much either way. But if you have ever done so,
> and have a good reason, could you let me know? Thanks!

I think the only time I have ever seen someone use an alternate load
factor is in the case where they are creating a Map that has a known
maximum fixed size and they want to ensure that the map will never
need to rehash.  It might be helpful to specify in Javadoc how to
achieve that goal.

Alex
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Dhanji R. Prasanna
In reply to this post by Doug Lea
I think this is a reasonable position for loadFactors > default. But perhaps not lower.

Dhanji.

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Alex Miller-10
Thanks as always for the on- and off-list feedback.
It seems worthwhile to simplify use and documentation of
load factors, so the current update does so.


On 09/08/11 12:10, Alex Miller wrote:
> I think the only time I have ever seen someone use an alternate load
> factor is in the case where they are creating a Map that has a known
> maximum fixed size and they want to ensure that the map will never
> need to rehash.  It might be helpful to specify in Javadoc how to
> achieve that goal.
>

Yes, thanks. The answer should be utterly obvious:
Use the constructor with this fixed size as the initialCapacity.

Unfortunately, the existing j.u.c specs and javadocs made this
confusing; and worse, not absolutely guaranteed to be true.
Some of the confusion was due to the unintended
implication that the default constructor is sized to hold 16
elements, when instead it is just 16 period.
In practice, most of the slop doesn't matter at all.
With power of two tables, there are only 30 possible
table sizes, so calculations based on misunderstandings
about the meaning of constructor parameters usually end up
to be what people expect anyway. But still it should be,
and now is, straightened out.

> On 09/08/11 18:16, [hidden email] wrote:
>> I think this is a reasonable position for loadFactors > default. But perhaps not
>> lower.
>>

The adjustment works both ways, but only for initial
sizing, which is the only aspect that is fully controllable
in a concurrent table. So if you'd like an extremely sparsely
populated table for say 1000 elements, you can use
   new ConcurrentHashMap(1000, 0.25f);
The only difference from existing j.u.c version is that if you
don't have a good size estimate, and want it sparsely
populated anyway, we can't help you. But we could never exactly
promise this anyway; among other reasons because sizes may fluctuate
while resizing is in progress.

So little actual control is lost, and we gain a more understandable
characterization in the revised constructor and class-level javadocs.
The class-level now includes just barely enough background on
hashing to clearly explain what's up without making people plod through
Too Much Information. I hope. See:
http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166edocs/jsr166e/ConcurrentHashMapV8.html

-Doug






_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Jason T. Greene
In reply to this post by Doug Lea
On 9/8/11 6:45 AM, Doug Lea wrote:

>
> Among some further ongoing improvements and extensions (mainly
> to support parallel aggregate operations), I'm contemplating
> (nearly) ignoring "loadFactor" constructor arguments for
> ConcurrentHashMap (in "V8" form for now, but ultimately
> targeting revised j.u.c version). The main motivation
> is that in concurrent hash maps (of any form) collisions
> and contention are interdependent, so anyone using a
> higher-than-default load factor because they think it will
> save space is also unexpectedly getting poorer multithreaded
> performance. Given how hard it was to give clear answers to
> questions on this list about exactly what blocks when based on
> probability distributions with a parameter based on load factor,
> I'm now thinking that is best to internally always use
> the default, and use the optional constructor parameter
> only as an initial sizing hint (larger loadFactor ->
> smaller initial capacity). This will enable simpler
> characterization of expected performance.
>
> Scanning for uses of ConcurrentHashMap constructors
> on google code search, it seems that hardly anyone ever
> overrides the default anyway, so this probably doesn't
> matter much either way. But if you have ever done so,
> and have a good reason, could you let me know? Thanks!

What about just limiting the upper range? I could see someone dropping
the default load factor to get marginally better throughput.

Although, IMO the only must-have scenario for exposing a load factor is
if you have an open addressed map and the factor heavily affects access
time.

--
Jason T. Greene
JBoss AS Lead / EAP Platform Architect
JBoss, a division of Red Hat
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea

An update to jsr166e.ConcurrentHashMapV8 includes two
suggested changes that can affect usage:

* The "compute" method now supports read-modify-write
by using a function that provides the previous value.

* Both computeIfAbsent and compute now throw NullPointerException
(leaving old mapping or lack thereof intact) if the function returns
null.

Plus some other improvements.

While I'm at it...

On 09/30/11 12:58, Jason T. Greene wrote:
>>
>> [... overriding default load factor  ... ]
>
> What about just limiting the upper range? I could see someone dropping the
> default load factor to get marginally better throughput.

Although even in that case, the overrides were previously
not handled consistently, and by allowing overridden load
factor to affect initial sizing, we still support it about
as well as before, but can now clearly specify the effects.
The lack of clarity arose from a series of inconsistent
decisions (some of them mine) to more-or-less mimic
the already-inconsistent policies of the original HashTable
class. Unclear enough that if someone wanted to backport
the CHMV8 code but not the clarified specs to previous JDKs,
I don't think any user could tell the difference in the
effects of overriding load factor.

-Doug
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Viktor Klang
Hi Doug,

I'd love to be able to specify a Comparator in the constructor so that conditional remove doesn't rely on reference-equality or .equals (since oftentimes I cannot overload it)

Thoughts?

Cheers,


On Mon, Oct 3, 2011 at 12:29 AM, Doug Lea <[hidden email]> wrote:

An update to jsr166e.ConcurrentHashMapV8 includes two
suggested changes that can affect usage:

* The "compute" method now supports read-modify-write
by using a function that provides the previous value.

* Both computeIfAbsent and compute now throw NullPointerException
(leaving old mapping or lack thereof intact) if the function returns
null.

Plus some other improvements.

While I'm at it...

On 09/30/11 12:58, Jason T. Greene wrote:

[... overriding default load factor  ... ]

What about just limiting the upper range? I could see someone dropping the
default load factor to get marginally better throughput.

Although even in that case, the overrides were previously
not handled consistently, and by allowing overridden load
factor to affect initial sizing, we still support it about
as well as before, but can now clearly specify the effects.
The lack of clarity arose from a series of inconsistent
decisions (some of them mine) to more-or-less mimic
the already-inconsistent policies of the original HashTable
class. Unclear enough that if someone wanted to backport
the CHMV8 code but not the clarified specs to previous JDKs,
I don't think any user could tell the difference in the
effects of overriding load factor.

-Doug

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
On 10/03/11 09:34, √iktor Ҡlang wrote:

> I'd love to be able to specify a Comparator in the constructor so that
> conditional remove doesn't rely on reference-equality or .equals (since
> oftentimes I cannot overload it)

One reasonable way to do this interacts with a decision
I keep flip-flopping on because people make good cases for
all three of the possibilities: What should the map do when
either a mapper or remapper function returns null?
The possibilities are:
   1. treat it as a user error, so throw NullPointerException
   2. treat it as a (re)mapping failure, so ignore the outcome
   3. treat it as a removal

I originally did (2), but arguments that (1) was a safer
option led me to change to it in latest update. But yours
is another of several use cases that make (3) more
attractive -- on the plus side, it allows atomic removal
(that you could use in the above case).
But on the minus side it might lead to unintentional
atomic removal.

Further arguments welcome.

-Doug



_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Viktor Klang


2011/10/3 Doug Lea <[hidden email]>
On 10/03/11 09:34, √iktor Ҡlang wrote:

I'd love to be able to specify a Comparator in the constructor so that
conditional remove doesn't rely on reference-equality or .equals (since
oftentimes I cannot overload it)

One reasonable way to do this interacts with a decision
I keep flip-flopping on because people make good cases for
all three of the possibilities: What should the map do when
either a mapper or remapper function returns null?

TBH I think they need to be symmetrical to put(key, null), no?
 
The possibilities are:
 1. treat it as a user error, so throw NullPointerException
 2. treat it as a (re)mapping failure, so ignore the outcome
 3. treat it as a removal

I originally did (2), but arguments that (1) was a safer
option led me to change to it in latest update. But yours
is another of several use cases that make (3) more
attractive -- on the plus side, it allows atomic removal
(that you could use in the above case).
But on the minus side it might lead to unintentional
atomic removal.

Further arguments welcome.

-Doug





--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
On 10/03/11 10:07, √iktor Ҡlang wrote:
> TBH I think they need to be symmetrical to put(key, null), no?

In a library that was not constrained by 15 years of history
and conventions, map.put(key, null) would also most likely mean
to remove mapping. We can't do that though and still obey
java.util.Map spec. But the details of new methods are still
up for grabs.

>
>     The possibilities are:
>       1. treat it as a user error, so throw NullPointerException
>       2. treat it as a (re)mapping failure, so ignore the outcome
>       3. treat it as a removal
>
>     I originally did (2), but arguments that (1) was a safer
>     option led me to change to it in latest update. But yours
>     is another of several use cases that make (3) more
>     attractive -- on the plus side, it allows atomic removal
>     (that you could use in the above case).
>     But on the minus side it might lead to unintentional
>     atomic removal.


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
1234