Quantcast

Does JDK 9 String.hashCode() have a bug?

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Does JDK 9 String.hashCode() have a bug?

Peter Levart

Hi,

I would like to discuss the implementation of String.hashCode() method in the light of JMM. Here it is (the recent one from JDK 9 which includes implementation of compact strings):


    public int hashCode() {
        if (hash == 0 && value.length > 0) {
            hash = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
        }
        return hash;
    }


Here, 'hash' is a plain instance field in String. Suppose we have a freshly constructed non-zero length String object, whose reference 's' is accessible to two threads and each of them concurrently invokes hashCode() on it:


Thread 1:

int h1 = s.hashCode();


Thread 2:

int h2 = s.hashCode();


Suppose also that the isLatin1() ? StringLatin1.hashCode(value) : StringUTF16.hashCode(value) computation returns a non-zero value.


Is it possible that the outcome (h1, h2) is either (0, hc) or (hc, 0) where the hc is the non-zero result of above mentioned computation?

According to JMM, I think it is. Since there is no synchronization between a write w(hash, hc) by one thread and two reads r1(value):v1 and r2(value):v2 by the other thread that correspond to program lines:

    if (hash == 0 ....

and:

    return hash;


...the reads may or may not see the write independently. So there is a possibility that v1 == hc and v2 == 0.


Am I right and should this be fixed?


Note that JDK 8 code is different. It does not have this problem:


public int hashCode() {
        int h = hash;
        if (h == 0) {
            for (char v : value) {
                h = 31 * h + v;
            }
            if (h != 0) {
                hash = h;
            }
        }
        return h;
    }


Regards, Peter



_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Does JDK 9 String.hashCode() have a bug?

Aleksey Shipilev-3
Hi Peter,

On 09/28/2016 12:33 PM, Peter Levart wrote:
>     public int hashCode() {
>         if (hash == 0 && value.length > 0) {
>             hash = isLatin1() ? StringLatin1.hashCode(value)
>                               : StringUTF16.hashCode(value);
>         }
>         return hash;
>     }

Yes, this is a non-benign data race: should not do the second racy read
at "return hash". Embarrassing that we've missed this during code
reviews. Submit a bug/patch?

Thanks,
-Aleksey

P.S. Your explanation is similar to a more detailed of mine here:

https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#wishful-benign-is-resilient


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Does JDK 9 String.hashCode() have a bug?

Jonas Konrad
Hey Aleksey,

On 09/28/2016 12:38 PM, Aleksey Shipilev wrote:
> P.S. Your explanation is similar to a more detailed of mine here:
>
> https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#wishful-benign-is-resilient

Wouldn't, according to that section in your blog, the old hashcode (and
the suggested fix) also be unsafe? From what I can tell the hashcode in
Java 8 already assumed (which it can because it's JDK code) that the VM
inserts a memory barrier in the String constructor because of the final
value field.

- Jonas
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Does JDK 9 String.hashCode() have a bug?

David Holmes-6
In reply to this post by Peter Levart

I thought that one of the properties of the JMM is that a second read of a variable can not return an earlier write to that variable. Certainly it is a property I recall being discussed in JSR-133.

 

David

 

From: Concurrency-interest [mailto:[hidden email]] On Behalf Of Peter Levart
Sent: Wednesday, September 28, 2016 8:33 PM
To: concurrency-interest <[hidden email]>
Subject: [concurrency-interest] Does JDK 9 String.hashCode() have a bug?

 

Hi,

I would like to discuss the implementation of String.hashCode() method in the light of JMM. Here it is (the recent one from JDK 9 which includes implementation of compact strings):

 

    public int hashCode() {
        if (hash == 0 && value.length > 0) {
            hash = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
        }
        return hash;
    }

 

Here, 'hash' is a plain instance field in String. Suppose we have a freshly constructed non-zero length String object, whose reference 's' is accessible to two threads and each of them concurrently invokes hashCode() on it:

 

Thread 1:

int h1 = s.hashCode();

 

Thread 2:

int h2 = s.hashCode();

 

Suppose also that the isLatin1() ? StringLatin1.hashCode(value) : StringUTF16.hashCode(value) computation returns a non-zero value.

 

Is it possible that the outcome (h1, h2) is either (0, hc) or (hc, 0) where the hc is the non-zero result of above mentioned computation?

According to JMM, I think it is. Since there is no synchronization between a write w(hash, hc) by one thread and two reads r1(value):v1 and r2(value):v2 by the other thread that correspond to program lines:

    if (hash == 0 ....

and:

    return hash;

 

...the reads may or may not see the write independently. So there is a possibility that v1 == hc and v2 == 0.

 

Am I right and should this be fixed?

 

Note that JDK 8 code is different. It does not have this problem:

 

public int hashCode() {
        int h = hash;
        if (h == 0) {
            for (char v : value) {
                h = 31 * h + v;
            }
            if (h != 0) {
                hash = h;
            }
        }
        return h;
    }

 

Regards, Peter

 


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Does JDK 9 String.hashCode() have a bug?

Aleksey Shipilev-3
In reply to this post by Jonas Konrad
On 09/28/2016 01:28 PM, Jonas Konrad wrote:
> On 09/28/2016 12:38 PM, Aleksey Shipilev wrote:
>> P.S. Your explanation is similar to a more detailed of mine here:
>>
>> https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#wishful-benign-is-resilient
>
> Wouldn't, according to that section in your blog, the old hashcode (and
> the suggested fix) also be unsafe?

Why unsafe? JDK 8 String.hashCode is a classic example of the benign
data race. It's benignity comes, as with other benign data races, from
two conditions:

 a) The object being published is safely constructed. In this particular
case, we publish the primitive value, and it is safe by definition.

 b) We read the field only once, which is exactly what JDK 8
String.hashCode does, but JDK 9 String.hashCode lacks.

JDK 8:

  public int hashCode() {
    int h = hash;  // <--- read once
    if (h == 0) {  // <--- not 0? proceed to return
      for (char v : value) {
        h = 31 * h + v;
      }
      if (h != 0) {
        hash = h;
      }
    }
    return h;  // <--- return the verified value, DO NOT racy read again
  }

> From what I can tell the hashcode in Java 8 already assumed (which it
> can because it's JDK code) that the VM inserts a memory barrier in
> the String constructor because of the final value field.

Note this is *within* the String instance already, no "final value field
init in String" applies here.

Thanks,
-Aleksey





_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Does JDK 9 String.hashCode() have a bug?

Jonas Konrad
Take this example:

class Race {
     String s = "a";
     void writer() { s = "b"; }

     void reader() {
         String tmp = s;
         assert s.hashCode() == s.hashCode();
     }
}

class String {
     final char[] value; int hash;

     int hashCode() {
         int h = hash;
         if (h == 0) {
             hash = h = computeHashCode(value);
         }
         return h;
     }
}

In the JMM, can't that assertion fail? Since the 'hash' field is not
final and the String is not safely published, The reader could observe
the 'old' hashCode of "a" on the first read, not do any computation
because h != 0, and then see the 'new' hashCode with h == 0 in the
second read.

I see how the two reads are a good example of your other
"NonBenignRace2" though.

Thanks,
- Jonas

On 09/28/2016 01:37 PM, Aleksey Shipilev wrote:

> On 09/28/2016 01:28 PM, Jonas Konrad wrote:
>> On 09/28/2016 12:38 PM, Aleksey Shipilev wrote:
>>> P.S. Your explanation is similar to a more detailed of mine here:
>>>
>>> https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#wishful-benign-is-resilient
>>
>> Wouldn't, according to that section in your blog, the old hashcode (and
>> the suggested fix) also be unsafe?
>
> Why unsafe? JDK 8 String.hashCode is a classic example of the benign
> data race. It's benignity comes, as with other benign data races, from
> two conditions:
>
>  a) The object being published is safely constructed. In this particular
> case, we publish the primitive value, and it is safe by definition.
>
>  b) We read the field only once, which is exactly what JDK 8
> String.hashCode does, but JDK 9 String.hashCode lacks.
>
> JDK 8:
>
>   public int hashCode() {
>     int h = hash;  // <--- read once
>     if (h == 0) {  // <--- not 0? proceed to return
>       for (char v : value) {
>         h = 31 * h + v;
>       }
>       if (h != 0) {
>         hash = h;
>       }
>     }
>     return h;  // <--- return the verified value, DO NOT racy read again
>   }
>
>> From what I can tell the hashcode in Java 8 already assumed (which it
>> can because it's JDK code) that the VM inserts a memory barrier in
>> the String constructor because of the final value field.
>
> Note this is *within* the String instance already, no "final value field
> init in String" applies here.
>
> Thanks,
> -Aleksey
>
>
>
>
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Does JDK 9 String.hashCode() have a bug?

Aleksey Shipilev-3
On 09/28/2016 01:55 PM, Jonas Konrad wrote:

> Take this example:
>
> class Race {
>     String s = "a";
>     void writer() { s = "b"; }
>
>     void reader() {
>         String tmp = s;
>         assert s.hashCode() == s.hashCode();
>     }
> }
>
> class String {
>     final char[] value; int hash;
>
>     int hashCode() {
>         int h = hash;
>         if (h == 0) {
>             hash = h = computeHashCode(value);
>         }
>         return h;
>     }
> }
>
> In the JMM, can't that assertion fail?
Well, it can fail, because you have the prior race on Race.s, and you
might be comparing the hashcodes from two different Strings to begin
with. Obviously, this cannot be recovered in String.hashCode:

     void reader() {
         String tmp = s;
         assert s.hashCode() == s.hashCode();
     }

...but if you were to write:

     void reader() {
         String tmp = s;
         assert tmp.hashCode() == tmp.hashCode();
     }

...then it cannot fail, because String.hashCode() as stated cannot
return 0. (This is also one of many reasons why compilers cannot
generally replace local variables with memory reads, exposing users to
races). String.hashCode above will take a corrective action it if
happens to read (hash == 0), and return the recomputed one.

Thanks,
-Aleksey



_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

signature.asc (836 bytes) Download Attachment
Loading...