ConcurrentHashMapV8

classic Classic list List threaded Threaded
77 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Mark Thornton-3
On 03/10/11 14:34, √iktor Ҡlang wrote:

> Hi Doug,
>
> I'd love to be able to specify a Comparator in the constructor so that
> conditional remove doesn't rely on reference-equality or .equals
> (since oftentimes I cannot overload it)
>
> Thoughts?
>
> Cheers,
> √
A Comparator wouldn't be appropriate for a HashMap. What you need is
something like:

interface Equivalence<T> {
     int hash(T value);
     boolean isEqual(T a, T b);
}

Mark

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Viktor Klang


On Mon, Oct 3, 2011 at 4:22 PM, Mark Thornton <[hidden email]> wrote:
On 03/10/11 14:34, √iktor Ҡlang wrote:
Hi Doug,

I'd love to be able to specify a Comparator in the constructor so that conditional remove doesn't rely on reference-equality or .equals (since oftentimes I cannot overload it)

Thoughts?

Cheers,

A Comparator wouldn't be appropriate for a HashMap.

Appropriate? You mean just like doing a DNS lookup in the equals of java.net.URL?
 
What you need is something like:

interface Equivalence<T> {
   int hash(T value);
   boolean isEqual(T a, T b);
}

What does hashes have to do with equivalence? Is that appropriate?

interface Hashable<T> {
  int computeHashFor(T value);
}

interface Equivalence<T> {
  boolean isEqual(T a, T b);
}

and then require something that implements both Hashable<K> and Equivalence<V>

Cheers,

 

Mark


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Mark Thornton-3
On 03/10/11 15:59, √iktor Ҡlang wrote:


A Comparator wouldn't be appropriate for a HashMap.


Appropriate? You mean just like doing a DNS lookup in the equals of java.net.URL?

A Comparator requires a domain with an ordering whereas HashMap's can be used on unordered domains.
 
What you need is something like:

interface Equivalence<T> {
   int hash(T value);
   boolean isEqual(T a, T b);
}

What does hashes have to do with equivalence? Is that appropriate?

interface Hashable<T> {
  int computeHashFor(T value);
}

interface Equivalence<T> {
  boolean isEqual(T a, T b);
}

and then require something that implements both Hashable<K> and Equivalence<V>


I think Hashable might extend Equivalence as this sort of hash is tied up with equivalence.

Mark


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Viktor Klang


2011/10/3 Mark Thornton <[hidden email]>
On 03/10/11 15:59, √iktor Ҡlang wrote:


A Comparator wouldn't be appropriate for a HashMap.


Appropriate? You mean just like doing a DNS lookup in the equals of java.net.URL?

A Comparator requires a domain with an ordering whereas HashMap's can be used on unordered domains.

While I agree with your stance on this in general, the contract of Comparator does not require the parameterized type to have any total ordering. For the subset of functionality needed in the Comparator (only 0 results are interesting), I'd prefer that over introducing yet another Noun that is not backported into the main API. (Closable is still not implemented everywhere)

Now, what we'd really want here is a function from T => T => Boolean, or uncurried (T,T) => Boolean so we didn't have to come up with new names all the time. A man can dream, I suppose.
 

 
What you need is something like:

interface Equivalence<T> {
   int hash(T value);
   boolean isEqual(T a, T b);
}

What does hashes have to do with equivalence? Is that appropriate?

interface Hashable<T> {
  int computeHashFor(T value);
}

interface Equivalence<T> {
  boolean isEqual(T a, T b);
}

and then require something that implements both Hashable<K> and Equivalence<V>


I think Hashable might extend Equivalence as this sort of hash is tied up with equivalence.

The tie is the parameterized type signature of ? extends Hashable<K> & Equivalence<V>,
smashing them together would really limit the usefulness and doesn't add any value.

Cheers,

 

Mark




--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Kasper Nielsen-3
On 03-10-2011 17:18, √iktor Ҡlang wrote:
> The tie is the parameterized type signature of ? extends Hashable<K> &
> Equivalence<V>,
> smashing them together would really limit the usefulness and doesn't add
> any value.

I really do not see practically reason for separating these into 2
different interfaces. Only more clutter.

First of all, 90 % of all usage will be either Equivalence.Equals or
Equivalence.Identity. With probably Equivalence.ignoreStringCase as a
trailing third. These can all be predefined so users do not need to
implement anything.

Second, I think almost all usage will be in situations where you
will also need the custom hash, for example, for keys in a hash map.

See also
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/extra166y/CustomConcurrentHashMap.java 
which supports user-defined equivalence comparisons.

Guava has an identical interface com.google.common.base.Equivalence.

cheers
   Kasper
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Viktor Klang


On Mon, Oct 3, 2011 at 5:47 PM, Kasper Nielsen <[hidden email]> wrote:
On 03-10-2011 17:18, √iktor Ҡlang wrote:
The tie is the parameterized type signature of ? extends Hashable<K> &
Equivalence<V>,
smashing them together would really limit the usefulness and doesn't add
any value.

I really do not see practically reason for separating these into 2 different interfaces. Only more clutter.

You're a firm disbeliever is the Single Responsibility Principle?

The computation of the hashCode of the Key is distinctly something else from comparing equality of Values.
 

First of all, 90 % of all usage will be either Equivalence.Equals or Equivalence.Identity. With probably Equivalence.ignoreStringCase as a trailing third. These can all be predefined so users do not need to implement anything.

Second, I think almost all usage will be in situations where you
will also need the custom hash, for example, for keys in a hash map.

What? As I said, how does the equivalence-checking of the Values interact with the hash of the Keys?
 

See also
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/extra166y/CustomConcurrentHashMap.java which supports user-defined equivalence comparisons.

That is about the equivalence of the Keys, not of the Values.
 

Guava has an identical interface com.google.common.base.Equivalence.

I'd be surprised if there aren't thousands of similar implementations all over the Java ecosystem due to the lack of it in the standard library.

Cheers,

 

cheers
 Kasper

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Kasper Nielsen-3
On 03-10-2011 18:04, √iktor Ҡlang wrote:
>
>     I really do not see practically reason for separating these into 2
>     different interfaces. Only more clutter.
>
>
> You're a firm disbeliever is the Single Responsibility Principle?

No, I am a firm believer in pragmatic design and keeping things simple.
Adding 1 interface is better then adding 2.

Look, you probably have a pretty specific use case for why you cannot
use the equals contract when you remove an element. And you have no need
for a custom hash code.

But I think for most other people there is no need to separate the
responsibility of calculating a hash code and answering whether two
object are considered equivalent. Because you need both most of the time.

One of the first things you learn when working with Java is always
override hashCode() if you override equals(). I think it is only natural
that any interface defining custom policies with regards to equals() or
hashCode() supports that notion.

I mean forcing you and a handful other people to also implement
Equivalence.hashCode() method is not going to ruin anyone's day.

>     See also
>     http://gee.cs.oswego.edu/cgi-__bin/viewcvs.cgi/jsr166/src/__extra166y/__CustomConcurrentHashMap.java
>     which supports user-defined equivalence comparisons.
> That is about the equivalence of the Keys, not of the Values.

If you look closer it also support value equivalence.

cheers
   Kasper
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Kasper Nielsen-3
In reply to this post by Doug Lea
On 03-10-2011 15:52, Doug Lea wrote:

> On 10/03/11 09:34, √iktor Ҡlang wrote:
>
>> I'd love to be able to specify a Comparator in the constructor so that
>> conditional remove doesn't rely on reference-equality or .equals (since
>> oftentimes I cannot overload it)
>
> One reasonable way to do this interacts with a decision
> I keep flip-flopping on because people make good cases for
> all three of the possibilities: What should the map do when
> either a mapper or remapper function returns null?
> The possibilities are:
> 1. treat it as a user error, so throw NullPointerException
> 2. treat it as a (re)mapping failure, so ignore the outcome
> 3. treat it as a removal

I had a similar discussion with myself a while ago for a library I was
designing. I ended up with 2. (ignoring the outcome) for a couple of reasons

1)
I allow users to recompute all mappings for a map. A.la
computeAll(BinaryMapper<? super K, ? super V, ? extends V> remapper)
which will run through any entry in the map and optionally compute a new
value. Since I rarely need to update all values at the same time it is
common I do not want to do anything with the value.

Although volatile writes are not that expensive. Having to update every
mapping is still some task for large maps.

Of course, one way to do avoid this would be to see if the existing
value is identical to the recomputed value. In which case the mapping
does not need to be updated.

2)
In many situations there is a difference between a no-op and updating an
entry in place.

If you working with something such as the event handlers defined in the
upcoming JSR 107. You only want to notify these event handlers in case
a value changes.

Doing a simple check on identify by a map/cache as I suggested in 1). Is
often not enough since users might not create a new value in the compute
method, but instead update the value in place. For example, changing
some bits in a blob.

Likewise, if you are working in a distributed setting you only want to
send new values to other peers if they actually change.

3)
Although it is difficult to generalize. In case a Remapper returned a
null as the result of a user error. I had the feeling a no-op is
preferred instead of deleting the entry from the map.

4)
I wanted both computeIfPresent and computeIfAbsent to behave identical
with respect to null being returned by a mapper/remapper. So either it
was a user error (1) or it was a no-op (2). Which left 3. out.

Even if there are only few use case for it, returning null from
computeIfAbsent will result in no mapping being established.

5)
Since Remapper/Mapper code is invoked within atomicity control, it is
usually short and simple code. So few errors arise. At least I have yet
to come across one resulting from returning null as a mistake.

6)
In general, I don't like null representing anything but a mistake. But
in my case I had some use cases where the benefits of 2. (ignoring the
outcome) outweighed my reluctance towards using nulls.

cheers
   Kasper
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Viktor Klang
In reply to this post by Kasper Nielsen-3


On Mon, Oct 3, 2011 at 6:36 PM, Kasper Nielsen <[hidden email]> wrote:
On 03-10-2011 18:04, √iktor Ҡlang wrote:

   I really do not see practically reason for separating these into 2
   different interfaces. Only more clutter.


You're a firm disbeliever is the Single Responsibility Principle?

No, I am a firm believer in pragmatic design and keeping things simple.

I'm also a fan of pragmatic design, but not when it comes to standard libraries, simply because you cannot revert later. Pragmatism thrives in dynamic environments, a standard lib is usually anything but dynamic.

java.util.Date was also a rather pragmatic approach.
 
Adding 1 interface is better then adding 2.

According to what definition of "better"? To be honest, if we'd have more fine grained interfaces in Java, we would probably not be in this situation we are in today, where the Collection interface is so broad you oftentimes have to choose between performance or add UnsupportedOperationExceptions.
If we'd have more fine-grained interfaces (or typeclasses) we could be more selective in what data gets exposed.
 

Look, you probably have a pretty specific use case for why you cannot use the equals contract when you remove an element.

In my world equality is contextual, having equals on java.lang.Object removes the possibility of contexts. Universal equality is one of those things where there jury is still out.
 
And you have no need for a custom hash code.

And so I should not have to tinker with that. So the simplest 2 solutions are:

1) Separate the things into Hashable and Equivalence, if it is deemed that euqality is always interesting when dealing with hashes, then Hashable should extend Equivalence. All is fine.

2) Atleast make it so that it doesn't create pointless boilerplate and make Equivalence an abstract class (due to the lack of mixins in Java) and have the default implementation of hash return a.hashCode()



But I think for most other people there is no need to separate the responsibility of calculating a hash code and answering whether two object are considered equivalent.

I'd be more than surprised to see hashCode being called at all places where equality is checked. Atleast in Java. The other way around is probably true tho, that when you need Hashability, you need Equality, this ties into what I've said previously in this email.
 
Because you need both most of the time.

I do not believe that. Equality as a concept I believe is far more used than hashability. If you do not believe me we can have a small contest to see who finds most usage of either equality alone or equality and hashability in a randomly chosen Java project.
 

One of the first things you learn when working with Java is always override hashCode() if you override equals(). I think it is only natural that any interface defining custom policies with regards to equals() or hashCode() supports that notion.

And since the majority of Java classes have mutable state, the majority of them is broken.
 

I mean forcing you and a handful other people to also implement Equivalence.hashCode() method is not going to ruin anyone's day.

Boilerplate ruins my day, I think it's rude to the users.
 

   See also
   http://gee.cs.oswego.edu/cgi-__bin/viewcvs.cgi/jsr166/src/__extra166y/__CustomConcurrentHashMap.java

   which supports user-defined equivalence comparisons.
That is about the equivalence of the Keys, not of the Values.

If you look closer it also support value equivalence.

Indeed! See my answers above.

Cheers,




cheers
 Kasper
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Kasper Nielsen-3
On 03-10-2011 19:20, √iktor Ҡlang wrote:

>     Adding 1 interface is better then adding 2.
>
>
> According to what definition of "better"? To be honest, if we'd have
> more fine grained interfaces in Java, we would probably not be in this
> situation we are in today, where the Collection interface is so broad
> you oftentimes have to choose between performance or add
> UnsupportedOperationExceptions.
> If we'd have more fine-grained interfaces (or typeclasses) we could be
> more selective in what data gets exposed.

According to the design goals of the Java collection framework.
http://download.oracle.com/javase/1.4.2/docs/guide/collections/overview.html,
1 interface is "better" than 2.

I know things are probably different in Scala land. But i do think the
designers of the Java collection API made the right choices with regards
to this. Is it perfect? no. But I would rather have 10 interfaces with
optional operations not capturing distinctions such as modifiability
than 20 - 30 interfaces that did. Others might disagree, but I really do
not think it was a mistake as you are implying.

Do it places a burden on implementors, yes. But for every time there is
one person implementing the Collection interface there are probably
thousands that are merely users of that interface.

I really think too many API designers think too much on taxonomy and to
little of the "conceptual weight" of the APIs they design.

Hey! there are users that need to figure out how that APIs work. Whether
there are 10 interfaces or 20 - 30 interfaces matters to them.

Anyway, this is probably not the right forum do discuss this. And we are
probably not going to come to any kind of agreement on this anyway.

cheers
   Kasper
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Viktor Klang


On Mon, Oct 3, 2011 at 8:07 PM, Kasper Nielsen <[hidden email]> wrote:
On 03-10-2011 19:20, √iktor Ҡlang wrote:
   Adding 1 interface is better then adding 2.


According to what definition of "better"? To be honest, if we'd have
more fine grained interfaces in Java, we would probably not be in this
situation we are in today, where the Collection interface is so broad
you oftentimes have to choose between performance or add
UnsupportedOperationExceptions.
If we'd have more fine-grained interfaces (or typeclasses) we could be
more selective in what data gets exposed.


Mate, you've completely sidetracked here, would have much appreciated if we could stay on topic, I believe I made several points that you chose to skip in your reply.
 
According to the design goals of the Java collection framework.
http://download.oracle.com/javase/1.4.2/docs/guide/collections/overview.html, 1 interface is "better" than 2.

I know things are probably different in Scala land. But i do think the designers of the Java collection API made the right choices with regards to this. Is it perfect? no. But I would rather have 10 interfaces with optional operations not capturing distinctions such as modifiability than 20 - 30 interfaces that did. Others might disagree, but I really do not think it was a mistake as you are implying.

II used to have the exact same opinion, then I found something better. We should never be afraid of questioning the status quo out of fear of change.
 

Do it places a burden on implementors, yes. But for every time there is one person implementing the Collection interface there are probably thousands that are merely users of that interface.

I think it's beneficial to let the compiler enforce contracts than having to surrender to runtime exceptions.
 

I really think too many API designers think too much on taxonomy and to little of the "conceptual weight" of the APIs they design.

Those two things are completely orthogonal so there's no reason not to strive for both.
 

Hey! there are users that need to figure out how that APIs work. Whether there are 10 interfaces or 20 - 30 interfaces matters to them.

The number of interfaces isn't nearly as interesting as the number of methods. I'd rather have a 1000 interfaces with 10 methods than 10000 methods in 1 interface.
 

Anyway, this is probably not the right forum do discuss this. And we are probably not going to come to any kind of agreement on this anyway.

I think that's a bit of a shame, since I opened the door by considering to add hash as an optional override in Equivalence.

Cheers,

 


cheers
 Kasper
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by Doug Lea
I'm still contemplating changes in handling null-returns
in the new compute methods. But in the mean time, here
is another tiny corner-case issue that is handled differently
in ConcurrentHashMapV8 than in the existing j.u.c version.
I consider it just a misfeature-fix, but Joe Bowbeer remains
nervous about it, so urged me to post and solicit reaction.

This takes a fair amount of context to set up:

The Set<Map.Entry> you get from map.entrySet() has always
had some problematic specs, mainly because they are
based on those of Map.Entry. When you have a Map.Entry,
you don't know whether it is mutable vs immutable
because setValue is an optional operation. And in some
cases, you don't whether setValue actually updates
the map you got the entry from or just updates an
unbound Map.Entry object.

While it would be nice to better specify some of this,
the current Map-related specs are phrased in such a way
that implementations can normally do what users expect.
The main intent of setValue is to support only
one use case: iterating through entries one-by-one
and changing some of the mapped values in the absence
of any modifications by other threads. ConcurrentHashMap
(both the existing and V8 versions) supports this use case
in the expected way -- which of course can have surprising
effects if other threads ARE modifying the map while
you are doing this. But we don't go so far as
to disable the method because of this.

However, the write-through nature of setValue
(i.e., to actually change the mapped value in
the map) conflicts with the intent of the
entrySet.toArray() method. Even though the
Collection.toArray() spec doesn't explicitly
say so, users expect it to return an independent
copy of the elements, in array form. For example,
you would be surprised and unhappy if you did
   Object[] a = myArrayList.toArray();
   a[0] = 17;
and then found out that this caused myArrayList.get(0)
to also now return 17.

In the case of ConcurrentHashMap, this means that the
kind of Entry you get for the elements of entrySet.toArray
should NOT write back to the map you got them from.
In other words:
   Object[] a = myMap.entrySet().toArray();
   (Entry[])a[0].setValue(17);
should not change myMap.get(keyFor0) to return 17.

However, the existing j.u.c version does so. This was
changed to the safer non-write-through form for V8.
The original/existing behavior was mainly just due to
oversight -- it would have been different to begin with
if anyone had noticed this issue of establishing a
persistent aliasing path (unlike the transient path
available in one-by-one iteration).

In general the interactions of loose specs for both
Map.Entry and Collection.toArray mean that you can't
count on very much here, and it turns out that different
Maps do different things. For ConcurrentHashMap, we'd
like it to do the most sensible thing, at least in
future versions. However, because the specs are loose,
we cannot claim that this is a bug-fix; it is just
a mis-feature fix.

To me, fixing the mis-feature is more important
than retaining pure compatibility for a usage
that probably didn't do what people expected anyway.
But if you've ever done anything relying on the old j.u.c
behavior, could you let us know? Thanks!

-Doug
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

studdugie
+1 Fix it

Dane


On Wed, Oct 5, 2011 at 9:11 AM, Doug Lea <[hidden email]> wrote:
I'm still contemplating changes in handling null-returns
in the new compute methods. But in the mean time, here
is another tiny corner-case issue that is handled differently
in ConcurrentHashMapV8 than in the existing j.u.c version.
I consider it just a misfeature-fix, but Joe Bowbeer remains
nervous about it, so urged me to post and solicit reaction.

This takes a fair amount of context to set up:

The Set<Map.Entry> you get from map.entrySet() has always
had some problematic specs, mainly because they are
based on those of Map.Entry. When you have a Map.Entry,
you don't know whether it is mutable vs immutable
because setValue is an optional operation. And in some
cases, you don't whether setValue actually updates
the map you got the entry from or just updates an
unbound Map.Entry object.

While it would be nice to better specify some of this,
the current Map-related specs are phrased in such a way
that implementations can normally do what users expect.
The main intent of setValue is to support only
one use case: iterating through entries one-by-one
and changing some of the mapped values in the absence
of any modifications by other threads. ConcurrentHashMap
(both the existing and V8 versions) supports this use case
in the expected way -- which of course can have surprising
effects if other threads ARE modifying the map while
you are doing this. But we don't go so far as
to disable the method because of this.

However, the write-through nature of setValue
(i.e., to actually change the mapped value in
the map) conflicts with the intent of the
entrySet.toArray() method. Even though the
Collection.toArray() spec doesn't explicitly
say so, users expect it to return an independent
copy of the elements, in array form. For example,
you would be surprised and unhappy if you did
 Object[] a = myArrayList.toArray();
 a[0] = 17;
and then found out that this caused myArrayList.get(0)
to also now return 17.

In the case of ConcurrentHashMap, this means that the
kind of Entry you get for the elements of entrySet.toArray
should NOT write back to the map you got them from.
In other words:
 Object[] a = myMap.entrySet().toArray();
 (Entry[])a[0].setValue(17);
should not change myMap.get(keyFor0) to return 17.

However, the existing j.u.c version does so. This was
changed to the safer non-write-through form for V8.
The original/existing behavior was mainly just due to
oversight -- it would have been different to begin with
if anyone had noticed this issue of establishing a
persistent aliasing path (unlike the transient path
available in one-by-one iteration).

In general the interactions of loose specs for both
Map.Entry and Collection.toArray mean that you can't
count on very much here, and it turns out that different
Maps do different things. For ConcurrentHashMap, we'd
like it to do the most sensible thing, at least in
future versions. However, because the specs are loose,
we cannot claim that this is a bug-fix; it is just
a mis-feature fix.

To me, fixing the mis-feature is more important
than retaining pure compatibility for a usage
that probably didn't do what people expected anyway.
But if you've ever done anything relying on the old j.u.c
behavior, could you let us know? Thanks!

-Doug
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

David M. Lloyd-3
In reply to this post by Doug Lea
On 10/5/11 6:11 AM, Doug Lea wrote:

> I'm still contemplating changes in handling null-returns
> in the new compute methods. But in the mean time, here
> is another tiny corner-case issue that is handled differently
> in ConcurrentHashMapV8 than in the existing j.u.c version.
> I consider it just a misfeature-fix, but Joe Bowbeer remains
> nervous about it, so urged me to post and solicit reaction.
>
> This takes a fair amount of context to set up:
> [...snip...]
> For example,
> you would be surprised and unhappy if you did
> Object[] a = myArrayList.toArray();
> a[0] = 17;
> and then found out that this caused myArrayList.get(0)
> to also now return 17.
>
> In the case of ConcurrentHashMap, this means that the
> kind of Entry you get for the elements of entrySet.toArray
> should NOT write back to the map you got them from.
> In other words:
> Object[] a = myMap.entrySet().toArray();
> (Entry[])a[0].setValue(17);
> should not change myMap.get(keyFor0) to return 17.

I disagree with this conclusion; these scenarios aren't really the same.
  The general contract of toArray() is to return an array consisting of
all the values in the collection - I think interpreting it such that it
can return copies of the values is a stretch.  I can't think of any case
where we'd actually copy the values in normal collections; I don't see
why an entry set should be any different, magical write-through
notwithstanding.

For example if I did:

Object[] a = myMap.entrySet().toArray();

I would expect the same result as if I did:

Object[] a = new ArrayList<Entry<Blah,Blah>>(myMap.entrySet()).toArray();

I don't think toArray should be special here.  I think this is the "most
correct" interpretation of this requirement, and in any case is the one
I've adhered to whenever developing collection implementations.  I think
that there should be no difference between the values in a toArray()
result and the values you'd get back from a set iterator.
--
- DML
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Joe Bowbeer
On Wed, Oct 5, 2011 at 3:31 PM, David M. Lloyd wrote:
On 10/5/11 6:11 AM, Doug Lea wrote:
I'm still contemplating changes in handling null-returns
in the new compute methods. But in the mean time, here
is another tiny corner-case issue that is handled differently
in ConcurrentHashMapV8 than in the existing j.u.c version.
I consider it just a misfeature-fix, but Joe Bowbeer remains
nervous about it, so urged me to post and solicit reaction.

This takes a fair amount of context to set up:
[...snip...]

For example,
you would be surprised and unhappy if you did
Object[] a = myArrayList.toArray();
a[0] = 17;
and then found out that this caused myArrayList.get(0)
to also now return 17.

In the case of ConcurrentHashMap, this means that the
kind of Entry you get for the elements of entrySet.toArray
should NOT write back to the map you got them from.
In other words:
Object[] a = myMap.entrySet().toArray();
(Entry[])a[0].setValue(17);
should not change myMap.get(keyFor0) to return 17.

I disagree with this conclusion; these scenarios aren't really the same.  The general contract of toArray() is to return an array consisting of all the values in the collection - I think interpreting it such that it can return copies of the values is a stretch.  I can't think of any case where we'd actually copy the values in normal collections; I don't see why an entry set should be any different, magical write-through notwithstanding.

For example if I did:

Object[] a = myMap.entrySet().toArray();

I would expect the same result as if I did:

Object[] a = new ArrayList<Entry<Blah,Blah>>(myMap.entrySet()).toArray();

I don't think toArray should be special here.  I think this is the "most correct" interpretation of this requirement, and in any case is the one I've adhered to whenever developing collection implementations.  I think that there should be no difference between the values in a toArray() result and the values you'd get back from a set iterator.
--
- DML

I agree with your interpretation and expectation regarding toArray.  But the Map.Entry documentation (below) limits their validity to the duration of the iteration.  How do you reconcile the two?


"The *only* way to obtain a reference to a map entry is from the iterator of this [entry-set] collection-view. These Map.Entry objects are valid *only* for the duration of the iteration; more formally, the behavior of a map entry is undefined if the backing map has been modified after the entry was returned by the iterator, except through the setValue operation on the map entry."

--Joe

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

David M. Lloyd-3
On 10/5/11 4:12 PM, Joe Bowbeer wrote:

> On Wed, Oct 5, 2011 at 3:31 PM, David M. Lloyd wrote:
>
>     On 10/5/11 6:11 AM, Doug Lea wrote:
>
>         I'm still contemplating changes in handling null-returns
>         in the new compute methods. But in the mean time, here
>         is another tiny corner-case issue that is handled differently
>         in ConcurrentHashMapV8 than in the existing j.u.c version.
>         I consider it just a misfeature-fix, but Joe Bowbeer remains
>         nervous about it, so urged me to post and solicit reaction.
>
>         This takes a fair amount of context to set up:
>         [...snip...]
>
>         For example,
>         you would be surprised and unhappy if you did
>         Object[] a = myArrayList.toArray();
>         a[0] = 17;
>         and then found out that this caused myArrayList.get(0)
>         to also now return 17.
>
>         In the case of ConcurrentHashMap, this means that the
>         kind of Entry you get for the elements of entrySet.toArray
>         should NOT write back to the map you got them from.
>         In other words:
>         Object[] a = myMap.entrySet().toArray();
>         (Entry[])a[0].setValue(17);
>         should not change myMap.get(keyFor0) to return 17.
>
>
>     I disagree with this conclusion; these scenarios aren't really the
>     same.  The general contract of toArray() is to return an array
>     consisting of all the values in the collection - I think
>     interpreting it such that it can return copies of the values is a
>     stretch.  I can't think of any case where we'd actually copy the
>     values in normal collections; I don't see why an entry set should be
>     any different, magical write-through notwithstanding.
>
>     For example if I did:
>
>     Object[] a = myMap.entrySet().toArray();
>
>     I would expect the same result as if I did:
>
>     Object[] a = new
>     ArrayList<Entry<Blah,Blah>>(__myMap.entrySet()).toArray();
>
>     I don't think toArray should be special here.  I think this is the
>     "most correct" interpretation of this requirement, and in any case
>     is the one I've adhered to whenever developing collection
>     implementations.  I think that there should be no difference between
>     the values in a toArray() result and the values you'd get back from
>     a set iterator.
>     --
>     - DML
>
>
> I agree with your interpretation and expectation regarding toArray.  But
> the Map.Entry documentation (below) limits their validity to the
> duration of the iteration.  How do you reconcile the two?
>
> http://download.oracle.com/javase/6/docs/api/java/util/Map.Entry.html
>
> "The *only* way to obtain a reference to a map entry is from the
> iterator of this [entry-set] collection-view. These Map.Entry objects
> are valid *only* for the duration of the iteration; more formally, the
> behavior of a map entry is undefined if the backing map has been
> modified after the entry was returned by the iterator, except through
> the setValue operation on the map entry."

Well be cautious how you define "the duration".  I think that the real
definition as given by the "more formally" section relates to modifying
the map after an iterator is created.  And I really interpret this as
meaning that the entry *must* be valid before such a modification, and
*may* be valid afterward.  For an implementer this means you are allowed
to keep them valid for as long as is reasonable or convenient.  From a
user perspective it means that you'd better not rely on it unless you
know what implementation you're dealing with and what guarantees it
makes, above and beyond what is specified.

But for ConcurrentMap in particular, this restriction needs a rethink
given that the whole _point_ of the thing is to provide concurrent
access; entry iterators become a lot less useful if you must freeze
write access to the whole map while you iterate.  I think it might be a
good idea to override entrySet() for the purpose of adding documentation
to clarify this.

In any case, I think you can say that "during iteration" is equal to the
time in which you have any active references to the items returned by
the iterator, because no other definition makes sense.  So if toArray()
is defined in terms of iteration then there is no conflict - using the
array is the same as using the values returned by the iterator as they
are returned; the iteration is seen to be in progress until the last
entry is "discarded".  I don't see anything in the docs that contradict
this.


--
- DML
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Bob Lee-8
In reply to this post by David M. Lloyd-3
One more thing to consider: using entrySet.toArray() involves creating an array of a generic type (or actually casting an Entry[] to Entry<K, V>[]). That's probably not something we want to encourage or go out of our way to enable.

Bob


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

David Holmes-6
In reply to this post by David M. Lloyd-3
Quoting "David M. Lloyd" <[hidden email]>:
> In any case, I think you can say that "during iteration" is equal to
> the time in which you have any active references to the items returned
> by the iterator, because no other definition makes sense.

On the contrary, "during the iteration" means to me that the iterator  
is still active and has not yet reached the end of the collection.  
Once we reach the end the iteration is over, and so an array created  
by iteration contains invalid map.entry objects the instant the  
filling of the array completes.

Though what exactly it means to have an "invalid" map.entry object is  
itself undefined.

The spec for entrySet is poorly written and the interaction with  
toArray is not clearly defined. Given that we can argue this either way.

But because we can argue it either way I'm inclined to say leave the  
existing code as-is. Neither the existing code nor the new CHMV8 code  
is "wrong" per-se.

David Holmes
------------



> So if
> toArray() is defined in terms of iteration then there is no conflict -
> using the array is the same as using the values returned by the
> iterator as they are returned; the iteration is seen to be in progress
> until the last entry is "discarded".  I don't see anything in the docs
> that contradict this.
>
>
> --
> - DML
> _______________________________________________
> Concurrency-interest mailing list
> [hidden email]
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest




_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

Doug Lea
In reply to this post by David M. Lloyd-3
On 10/05/11 18:31, David M. Lloyd wrote:

>> In the case of ConcurrentHashMap, this means that the
>> kind of Entry you get for the elements of entrySet.toArray
>> should NOT write back to the map you got them from.
>> In other words:
>> Object[] a = myMap.entrySet().toArray();
>> (Entry[])a[0].setValue(17);
>> should not change myMap.get(keyFor0) to return 17.
>
> I disagree with this conclusion; these scenarios aren't really the same. The
> general contract of toArray() is to return an array consisting of all the values
> in the collection - I think interpreting it such that it can return copies of
> the values is a stretch. I can't think of any case where we'd actually copy the
> values in normal collections;

Never deep copies. But always shallow copies. For example

Object[] a = myMap.values().toArray();
a[0] = 17;
also does not change myMap.get(keyFor0) to return 17 in
any JDK Map implementation.

Besides not wanting to set up side-channels for modifications,
the various toArray methods must have snapshot-like semantics
because they cannot possibly track additions and removals
occurring after (or concurrently with) the array creation.
This is the basis for the the lifetime disclaimers in the
Map.Entry specs.

Several people seem to have the feeling that entrySet().toArray()
should be somehow different about these policies because they know
that HashMap in particular returns internal mutable Entry objects.
Which is arguably also a bad idea. But most other Maps do not
do this. They instead return distinct Entry objects
that snapshot current mappings while usually still supporting
setValue for one-by-one traversal in the same way that CHM does,
invoking map.put. Which is a little bit crazy -- we add some
time/space overhead to entrySet operations just to imperfectly
reinforce the questionable intuition that methods operate
directly on internal Entry objects that don't actually exist
(and need not actually exist wrt any other aspects of Map specs).

-Doug
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: ConcurrentHashMapV8

David M. Lloyd-3
On 10/06/2011 06:30 AM, Doug Lea wrote:

> On 10/05/11 18:31, David M. Lloyd wrote:
>>> In the case of ConcurrentHashMap, this means that the
>>> kind of Entry you get for the elements of entrySet.toArray
>>> should NOT write back to the map you got them from.
>>> In other words:
>>> Object[] a = myMap.entrySet().toArray();
>>> (Entry[])a[0].setValue(17);
>>> should not change myMap.get(keyFor0) to return 17.
>>
>> I disagree with this conclusion; these scenarios aren't really the
>> same. The
>> general contract of toArray() is to return an array consisting of all
>> the values
>> in the collection - I think interpreting it such that it can return
>> copies of
>> the values is a stretch. I can't think of any case where we'd actually
>> copy the
>> values in normal collections;
>
> Never deep copies. But always shallow copies. For example
>
> Object[] a = myMap.values().toArray();
> a[0] = 17;
> also does not change myMap.get(keyFor0) to return 17 in
> any JDK Map implementation.

Yes, but this:

Object[] a = myMap.entrySet().toArray();
a[0] = "Foo";

doesn't change the mapping that happens to be at that element either,
which is much more analogous to your example in my opinion.

> Besides not wanting to set up side-channels for modifications,
> the various toArray methods must have snapshot-like semantics
> because they cannot possibly track additions and removals
> occurring after (or concurrently with) the array creation.
> This is the basis for the the lifetime disclaimers in the
> Map.Entry specs.

> Several people seem to have the feeling that entrySet().toArray()
> should be somehow different about these policies because they know
> that HashMap in particular returns internal mutable Entry objects.
> Which is arguably also a bad idea. But most other Maps do not
> do this. They instead return distinct Entry objects
> that snapshot current mappings while usually still supporting
> setValue for one-by-one traversal in the same way that CHM does,
> invoking map.put. Which is a little bit crazy -- we add some
> time/space overhead to entrySet operations just to imperfectly
> reinforce the questionable intuition that methods operate
> directly on internal Entry objects that don't actually exist
> (and need not actually exist wrt any other aspects of Map specs).

But if the Entry actually *did* reflect the actual physical entry, then
the contract for the Entry interface makes a lot more sense.  You can
retrieve the key always, and the value if it hasn't been removed (if it
has you could give IllegalStateException as is allowed by spec).  The
setValue() method translates more or less directly to an atomic
getAndSet() or get+CAS (this method also may throw ISE in the event of
removal).  You could even use the Entry objects for a primitive sort of
polling of map state, and it's easy to define their behavior for an
arbitrary lifespan without any hinkiness (i.e. no need to tie the
lifespan to the iterator from whence it came, which doesn't make sense
to me anyway, least of all in a concurrent setting).  I think it's
perfectly reasonable to provide an Entry which is more forgiving than
the spec requires, especially if you can give it really clear semantics.

In one of my scratchpads I have a lockless* ConcurrentMap which does
exactly this (have its entries implement Entry, that is) implemented in
terms of an atomic ref array of (immutable) arrays of entries.  I hope
to have time to run it through some more testing in the next couple
weeks, to see how it compares with what's on the table.  So it's at
least possible... we'll see about practical. :)

The other extreme is the copy-on-write type of map where an iterator
would reflect a snapshot and Entries would be immutable.  But this is
also easy to make well-defined for an arbitrary lifespan.

* well, okay, it does lock to block writers during resize, so it's not
100% lockless.
--
- DML
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
1234