Suspecting a problem in recent jdk-9 builds

Antoine Tissier
Hi,

We have been running benchmarks for our in-memory analytics software ActivePivot on an M6.32 machine (Solaris SPARC, 8 TB RAM, 2304 logical cores (288 physical cores)).
Our benchmarks involve high parallelism, with many queries divided into a large number of tasks (CountedCompleters) in the ForkJoinPool. With build 145 of jdk-9, some tasks are not executed, causing larger completion problems. However, with the earlier build 111, the problem does not occur.

On a smaller Linux machine (Linux amd64, 64 logical cores (32 physical cores), 512 GB RAM) with a similar setup, the problem was not reproduced.

The problem seems to arise when a large number of completers (more than 20,000) are involved: forking tasks works well, but when submitting tasks to a new pool, it seems that their compute method is sometimes not called.
We log every call to ForkJoinPool.submit, as well as every time a completer enters its compute method, and clearly see that once in a while a task is never computed after having been submitted. We let the system run for an additional hour, and there was no more progress even though the system was idle. Thread dumps did not show any suspect activity (all worker threads were idle).
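
A minimal sketch of this kind of submit/compute audit, for readers who want to try something similar. It is not the ActivePivot code: the class and method names (SubmitAuditSketch, AuditedTask, countMissing) are hypothetical, the tasks are trivial, and it assumes a dedicated pool that is shut down after submission. On a healthy JDK the number of missing tasks should be zero.

```java
import java.util.concurrent.CountedCompleter;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical audit harness; none of these names come from the application.
class SubmitAuditSketch {

    // A trivial completer that records that compute() was entered.
    static final class AuditedTask extends CountedCompleter<Void> {
        private final AtomicLong computed;

        AuditedTask(AtomicLong computed) {
            this.computed = computed;
        }

        @Override
        public void compute() {
            computed.incrementAndGet();
            tryComplete();
        }
    }

    // Submits `tasks` completers to a fresh pool and returns how many
    // were submitted but never entered compute().
    static long countMissing(int tasks, int parallelism) {
        AtomicLong submitted = new AtomicLong();
        AtomicLong computed = new AtomicLong();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        for (int i = 0; i < tasks; i++) {
            pool.submit(new AuditedTask(computed));
            submitted.incrementAndGet();
        }
        pool.shutdown(); // previously submitted tasks are still executed
        try {
            // If this times out, tasks still pending are counted as missing.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return submitted.get() - computed.get();
    }

    public static void main(String[] args) {
        System.out.println("missing tasks: " + countMissing(20_000, 8));
    }
}
```

A real audit would log task identities rather than counts, so that the specific lost submission can be matched against thread dumps.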

We tried to reproduce the problem with a similar but simpler test, but without success.

Are you aware of any concurrency/task completion problems in the more recent builds of jdk-9?
Are there any additional tests that we could run in order to diagnose this issue?

Best regards,
Antoine



--
Antoine Tissier
Junior Software Engineer
ActiveViam
46 rue de l'Arbre Sec, 75001 Paris, France
Mobile: +33(0) 6.26.33.35.62
Skype: antoine_tissier
Website: www.activeviam.com

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Re: Suspecting a problem in recent jdk-9 builds

Andrew Haley
On 28/12/16 09:30, Antoine Tissier wrote:
> We tried to reproduce the problem with a similar but more simple test, but
> it was not successful.

Did you run jcstress?  If not, please do that first.

Andrew.

Re: Suspecting a problem in recent jdk-9 builds

Doug Lea
In reply to this post by Antoine Tissier
On 12/28/2016 04:30 AM, Antoine Tissier wrote:

> Are you aware of any concurrency/task completion problems in the more
> recent builds of jdk-9 ?

The only changes in any relevant j.u.c classes were to incorporate
VarHandles in June. I believe these were tested on Sparcs, but not
by me.

> Are there any additional tests that we could run in order to diagnose
> this issue ?

It's not easy to diagnose a problem that seems to be specific
to a machine and program we don't have.

Some initial checks would be to try different VM and GC settings,
especially -XX:+UseParallelGC (vs. the default UseG1GC) and
-XX:-UseBiasedLocking. Also, if using commonPool, try
    -Djava.util.concurrent.ForkJoinPool.common.parallelism=n
for different values of n.

These would help rule out some kinds of problems.
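
When trying settings like these, it can help to confirm from inside the JVM that they actually took effect. The following is a small sketch under that assumption; the class name VmSettingsReport and its methods are hypothetical, and it only uses the standard java.lang.management and ForkJoinPool APIs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper that reports the settings discussed above.
class VmSettingsReport {

    // Effective parallelism of the common pool, as influenced by
    // -Djava.util.concurrent.ForkJoinPool.common.parallelism=n
    static int commonParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    // Names of the active collectors; they differ between G1 and ParallelGC.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Flags passed on the command line, e.g. -XX:-UseBiasedLocking.
    static List<String> vmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: " + commonParallelism());
        System.out.println("collectors: " + collectorNames());
        System.out.println("VM arguments: " + vmArguments());
    }
}
```

Logging this once at startup of each benchmark run makes it easier to relate a hang to the exact configuration that produced it.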

-Doug


Re: Suspecting a problem in recent jdk-9 builds

Martin Buchholz-3
In reply to this post by Antoine Tissier
Experience suggests that such problems are *usually* in the application code, but of course there are undiscovered bugs in java.util.concurrent.

Since only you can reproduce the problem, only you can narrow down the possible root causes.  You could build your own openjdk9, bisect to the exact commit that is causing problems, but it would be a lot of work, and it might in the end be a change to hotspot gc, with root cause still unknown...

Re: Suspecting a problem in recent jdk-9 builds

Antoine Tissier
Thank you for all your answers.

We tried to run jcstress and all the tests passed without any error.

Our application uses several TBs of RAM, so we need to use G1GC. The problem still occurs when adding -XX:-UseBiasedLocking to the VM args. We are going to try building the application with the commit corresponding to the addition of the VarHandles and the one just before to see if we can narrow the problem down to this change.

Note that we are a partner of Oracle and we are running our application on the Oracle network, so it is accessible by the JVM engineers. Let us know if you need to arrange access to the application.

Best,
Antoine

Re: Suspecting a problem in recent jdk-9 builds

Antoine Tissier
Further to my previous message, I am trying to check out the revisions of the jdk9 project corresponding to the addition of the VarHandles and its parent revision, and then build the respective JVMs. This is the first time I have tried to do this and I am not sure of the right way to go about it.

So far I have identified the SHA-1 hashes of the respective changesets in the jdk child project (955eab36f5da and f3af17da360b), but I do not know how to relate them to the changesets of the root jdk9 project (which I need in order to check out the right revisions of the whole project and build the corresponding JVMs). What should I do to find the right changesets? Do I need the forest extension in Mercurial in order to apply them properly?

Many thanks,
Antoine

Re: Suspecting a problem in recent jdk-9 builds

Andrew Haley
On 03/01/17 16:01, Antoine Tissier wrote:

>
> So far I have identified the sha-1 of the respective changesets in
> the jdk child project (955eab36f5da
> <http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/955eab36f5da> and
> f3af17da360b
> <http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/f3af17da360b>), but I
> do not know how to relate them to the changesets of the root jdk9
> project (which I need in order to checkout the right revisions of
> the whole project and build the corresponding JVMs). What should I
> do to find the right changesets? Do I need the forest extension in
> Mercurial in order to apply them properly?

Unfortunately they are not synchronized.  If you're going to bisect
your checkouts then you will probably need to do it based on a single
project (such as jdk) or check out all of the subtrees based on a
particular commit time, e.g. midnight on a particular day.

Andrew.

Re: Suspecting a problem in recent jdk-9 builds

Doug Lea
In reply to this post by Antoine Tissier
On 01/03/2017 11:01 AM, Antoine Tissier wrote:
> Further to my previous message I am trying to checkout the revisions of
> the jdk9 project <http://hg.openjdk.java.net/jdk9/jdk9/> corresponding
> to the addition of the VarHandles and its parent revision, and then
> build the respective JVMs. This is the first time I am trying to do this
> and I am not sure of the right way to go.

I was about to suggest that you first try to isolate across
weekly builds, but I no longer see a link to "previous" builds at
   https://jdk9.java.net/download/
If someone knows, please post.

Also, if it comes to it, ask (off-list) for a VarHandle-less
version of jsr166.jar that you could run.

-Doug

Re: Suspecting a problem in recent jdk-9 builds

Paul Sandoz
In reply to this post by Doug Lea

> On 28 Dec 2016, at 04:30, Doug Lea <[hidden email]> wrote:
>
> On 12/28/2016 04:30 AM, Antoine Tissier wrote:
>> Are you aware of any concurrency/task completion problems in the more
>> recent builds of jdk-9 ?
>
> The only changes in any relevant j.u.c classes were to incorporate
> VarHandles in June. I believe these were tested on Sparcs, but not
> by me.
>
Yes, the SPARC platform would be included in our battery of tests (which should include the newly added jcstress tests).

VarHandle method invocations wire up to the equivalent Unsafe methods, and those used by Fork/Join should all be intrinsified on SPARC (those intrinsics should not have changed; we just added more, but I will go back and eyeball the code).
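
For readers unfamiliar with the change, here is a minimal sketch of what a VarHandle-based field access looks like where Unsafe.compareAndSwapInt would previously have been used. It is an illustration only, not the ForkJoinPool code: the class VarHandleCasSketch and its field and method names are hypothetical. It requires JDK 9 or later.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class; not taken from java.util.concurrent.
class VarHandleCasSketch {
    private volatile int state;

    // Looked up once, much like an Unsafe field offset used to be.
    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(VarHandleCasSketch.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Plays the role of Unsafe.compareAndSwapInt(this, offset, 0, 1).
    boolean tryAcquire() {
        return STATE.compareAndSet(this, 0, 1);
    }

    // Release-mode store, comparable to Unsafe.putOrderedInt.
    void release() {
        STATE.setRelease(this, 0);
    }

    int current() {
        return (int) STATE.getVolatile(this);
    }

    public static void main(String[] args) {
        VarHandleCasSketch s = new VarHandleCasSketch();
        System.out.println("first acquire:  " + s.tryAcquire());
        System.out.println("second acquire: " + s.tryAcquire());
        s.release();
        System.out.println("state after release: " + s.current());
    }
}
```

The access mode is chosen per call site (compareAndSet, setRelease, getVolatile), which is why a conversion from Unsafe has to pick the matching mode at every use.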

My first thought was that some inlining limit was reached, but since Antoine points out that certain tasks are failing to complete, I wonder if it’s a combination of VarHandle use and the restructuring of the implementation.

Paul.

Re: Suspecting a problem in recent jdk-9 builds

Antoine Tissier
As Doug suggested, I first tried to bisect across weekly builds, using Gilles' method of editing the URLs on the download page to point at the right build number.

It appears that the problem is reproduced with build ea+129 but not with build ea+128. The latter was released before changeset 955eab36f5da of the jdk, which was committed later the same day and includes the replacement of Unsafe with VarHandles in j.u.c classes. 
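
The bisection itself can be sketched as a binary search over build numbers. This is only an illustration: the class BuildBisectSketch and the predicate are hypothetical, and it assumes that every build from the first bad one onward reproduces the problem. Because the failure is intermittent, a build should only be judged good after several clean runs.

```java
import java.util.function.IntPredicate;

// Hypothetical helper; reproduces.test(build) stands for
// "the benchmark hangs on this early-access build".
class BuildBisectSketch {

    // Returns the first bad build in (knownGood, knownBad], assuming
    // knownGood is good, knownBad is bad, and badness is monotonic.
    static int firstBadBuild(int knownGood, int knownBad, IntPredicate reproduces) {
        int lo = knownGood; // invariant: lo is good
        int hi = knownBad;  // invariant: hi is bad
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (reproduces.test(mid)) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return hi;
    }

    public static void main(String[] args) {
        // Stand-in predicate matching the result reported in this thread.
        int first = firstBadBuild(111, 145, build -> build >= 129);
        System.out.println("first bad build: ea+" + first);
    }
}
```

Starting from builds 111 and 145, this needs five benchmark runs instead of testing every weekly build in between.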

If needed it is possible to arrange access to our system in order to investigate the problem.

Antoine
Re: Suspecting a problem in recent jdk-9 builds

Doug Lea
On 01/05/2017 09:24 AM, Antoine Tissier wrote:

> It appears that the problem is reproduced with build ea+129 but not with
> build ea+128. The latter was released before changeset 955eab36f5da
> <http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/955eab36f5da> of the jdk,
> which was committed later the same day and includes the replacement of
> Unsafe with VarHandles in j.u.c classes.
>
> If needed it is possible to arrange access to our system in order to
> investigate the problem.
>

OK, let's arrange this off-list.

-Doug


Re: Suspecting a problem in recent jdk-9 builds

Martin Buchholz-3
In reply to this post by Antoine Tissier
For future reference, one can list the changesets between two tagged builds with Mercurial, like:
(cd ~/ws/jdk9/jdk && hg log -r "'jdk-9+128'::'jdk-9+129'")

If I narrow down, I see:
 $ (cd ~/ws/jdk9/jdk && hg log -u dl -r "'jdk-9+128'::'jdk-9+129'")
changeset:   15085:d04ea07c1629
parent:      15083:9446c534f022
user:        dl
date:        Fri Jul 15 13:51:43 2016 -0700
summary:     8159924: Various improvements to StampedLock code

changeset:   15086:fd4819ec5afd
user:        dl
date:        Fri Jul 15 13:55:51 2016 -0700
summary:     8157523: Various improvements to ForkJoin/SubmissionPublisher code

changeset:   15087:f3af17da360b
user:        dl
date:        Fri Jul 15 13:59:58 2016 -0700
summary:     8157522: Performance improvements to CompletableFuture

changeset:   15088:955eab36f5da
user:        dl
date:        Fri Jul 15 14:04:09 2016 -0700
summary:     8080603: Replace Unsafe with VarHandle in java.util.concurrent classes

changeset:   15140:c659d2cdc7ba
user:        dl
date:        Tue Jul 26 09:49:25 2016 -0700
summary:     8162396: j.u.c java.lang.LinkageError

changeset:   15141:fe3146f5e7b1
user:        dl
date:        Tue Jul 26 09:53:38 2016 -0700
summary:     8160402: Garbage retention with CompletableFuture.anyOf

changeset:   15142:fe0d3813e6c3
user:        dl
date:        Tue Jul 26 09:57:51 2016 -0700
summary:     8160751: Optimize ConcurrentHashMap.keySet().removeAll

changeset:   15143:e2c8961887a2
user:        dl
date:        Tue Jul 26 10:02:05 2016 -0700
summary:     8161608: StampedLock should use storeStoreFence when acquiring write lock

changeset:   15144:47699aa2e69e
tag:         jdk-9+129
user:        dl
date:        Tue Jul 26 10:06:19 2016 -0700
summary:     8161591: Miscellaneous changes imported from jsr166 CVS 2016-07

