Should old ForkJoinWorkerThread die if starting a new thread fails?

Jarkko Miettinen
Hi,

This does seem like something that would've been discussed before here,
but I could not find anything in the archives or a bug report.

In any case, currently if starting a new thread in
ForkJoinPool#createWorker fails with an exception (OutOfMemoryError
being the most common), the thread that tries to start that new thread
dies too. In specific situations this can lead to all threads in the
ForkJoinPool dying out, which seems strictly worse than running just
the existing threads and not spawning new ones.

I think OutOfMemoryError is generally considered something that should
not be recovered from. But we might make a different choice here, as
Thread#start can throw an OOME when it runs into process limits that
prevent starting new threads (why, oh why). This can also happen in a
very tightly controlled situation where we might want to just continue
working on the tasks, at least if Thread#start has not been overridden.
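
(The thread-limit case is easy to provoke, by the way. Below is a
minimal sketch, for illustration only, that keeps starting threads
which never exit; run it with a low process thread limit, e.g. via
ulimit -u, and Thread#start eventually throws exactly this OOME. It
will of course destabilize the JVM it runs in.)

import java.util.concurrent.CountDownLatch;

class ThreadLimitRepro {
    public static void main(String[] args) {
        CountDownLatch never = new CountDownLatch(1); // never counted down
        for (int i = 0; ; i++) {
            try {
                // Each thread parks forever, pinning one native thread.
                Thread t = new Thread(() -> {
                    try { never.await(); } catch (InterruptedException e) { }
                });
                t.start();
            } catch (OutOfMemoryError e) {
                // Typically "unable to create new native thread".
                System.out.println("Thread#start failed at i=" + i + ": " + e);
                return;
            }
        }
    }
}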

As the code in ForkJoinPool is a bit dense, I am not quite sure what
the exact required conditions are. I just know that there must be
tasks in the pool and still room for additional threads.

The problem will then manifest in stack traces such as this (Oracle JDK
1.8.0_92):

Exception in thread "ForkJoinPool-3983-worker-33" java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486)
        at java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1517)
        at java.util.concurrent.ForkJoinPool.signalWork(ForkJoinPool.java:1634)
        at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1733)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1691)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

From the little I looked at the latest jsr166 version in CVS, the
situation seems to be the same, even though the methods have changed
quite a bit.

My question is: is there any way to prevent this, and would such
prevention be beneficial in some or all cases?

At least naively, it would seem that if Thread#start fails with an
OOME, we could just return false and let the existing thread continue.
But this is probably not something that's always wanted, and it can
mask other, more serious OOMEs.
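
To sketch the idea (this is not the actual JDK source, just a
simplified outline of a JDK-8-like createWorker; as far as I can tell,
in the real code the cleanup, and the rethrow that kills the caller,
happen in deregisterWorker):

private boolean createWorker() {
    ForkJoinWorkerThreadFactory fac = factory;
    ForkJoinWorkerThread wt = null;
    try {
        if (fac != null && (wt = fac.newThread(this)) != null) {
            wt.start();
            return true;
        }
    } catch (OutOfMemoryError e) {
        // Hypothetical change: treat "cannot start a worker" as a soft
        // failure, clean up the half-registered worker (elided here),
        // and report failure instead of letting the Error propagate and
        // kill the signalling thread. The downside, as said above, is
        // that heap-exhaustion OOMEs would be masked as well.
    }
    return false; // pool keeps running with its existing workers
}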

-Jarkko


Re: Should old ForkJoinWorkerThread die if starting a new thread fails?

Nathan & Ila Reynolds
It seems that we have semantic overload here. There are many factors
which could prevent a new thread from being created. One such factor
is that there is no address space left in the process for the thread's
stack. Another is that the process has too many threads. It would be
great if different exceptions could be thrown based on the actual
condition. This would make it easier to diagnose the problem, and it
would also allow code to catch OutOfThreadHandlesException and simply
run with the existing threads in the pool.

I realize that this is going to be tricky since each OS has its own
set of thread-creation problems. Mapping the disparate sets of
problems into similar meaningful exceptions is going to take a lot of
thought. Perhaps someone can collect the various reasons why thread
creation can fail on each OS, and then a group can figure out how to
map them to exceptions.
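
To make it concrete, a sketch of what such a hierarchy and its use
could look like; all of these type names are made up, nothing like
them exists in the JDK today:

// Hypothetical types; nothing like these exists in the JDK.
class ThreadCreationError extends OutOfMemoryError {
    ThreadCreationError(String msg) { super(msg); }
}
// The process ran out of thread handles (e.g. hit ulimit -u or pid_max).
class OutOfThreadHandlesException extends ThreadCreationError {
    OutOfThreadHandlesException(String msg) { super(msg); }
}
// No address space left for the new thread's stack.
class OutOfStackSpaceException extends ThreadCreationError {
    OutOfStackSpaceException(String msg) { super(msg); }
}

class Worker {
    // Sketch: keep the caller alive only in the "too many threads" case;
    // OutOfStackSpaceException and plain OOMEs still propagate.
    static boolean tryStart(Thread t) {
        try {
            t.start();
            return true;
        } catch (OutOfThreadHandlesException e) {
            return false; // run with the existing threads in the pool
        }
    }
}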

-Nathan

On 6/6/2017 9:32 AM, Jarkko Miettinen wrote:

> [full quote of Jarkko's message trimmed]

--
-Nathan


Re: Should old ForkJoinWorkerThread die if starting a new thread fails?

Brian S O'Neill
Complicating things further is the Linux OOM killer. Creating and
destroying threads at a high rate can increase the likelihood that the
process gets abruptly killed. So catching the proposed
OutOfThreadHandlesException and proceeding might make the process more
unstable, quite the opposite of the intended outcome.

On 2017-06-06 08:42 AM, Nathan and Ila Reynolds wrote:

> [full quote of Nathan's message and nested quotes trimmed]

Re: Should old ForkJoinWorkerThread die if starting a new thread fails?

Nathan & Ila Reynolds
By the time OutOfThreadHandlesException is thrown, the likelihood that
the process gets killed by the OOM killer is already at its maximum as
far as thread count is concerned. Catching and dealing with
OutOfThreadHandlesException is therefore too late to influence the OOM
killer's decision. However, if one catches OutOfThreadHandlesException
and then has several threads terminate, the process might escape the
OOM killer.

-Nathan

On 6/6/2017 9:48 AM, Brian S O'Neill wrote:

> [full quote of Brian's message and nested quotes trimmed]

--
-Nathan


Re: Should old ForkJoinWorkerThread die if starting a new thread fails?

Alex Otenko
In reply to this post by Nathan & Ila Reynolds
What is the failure model here?

1. Provisioned a limit of N threads
2. Consumed N threads
3. Creation of thread N+1 fails

What is the limit of N threads based on? Should the thread pools be sized instead to not exceed N? Is the target software creating threads outside pools?

I think there may be a few good engineering practices to take into account first. OOME on thread creation means someone did not heed those practices: did not size the environment, did not size pools, spawned non-pooled threads.
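
For example (numbers invented for illustration; the point is that the
limit is decided when the pool is built, not discovered via OOME at
runtime):

import java.util.concurrent.ForkJoinPool;

class PoolSizing {
    public static void main(String[] args) {
        // Assume the environment is provisioned for at most 256 threads
        // for this process, of which 64 are reserved for IO and other
        // non-pooled threads. Both numbers are made up.
        int budget = 256, reservedElsewhere = 64;
        int parallelism = Math.min(
                Runtime.getRuntime().availableProcessors(),
                budget - reservedElsewhere);
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        System.out.println("parallelism = " + pool.getParallelism());
    }
}

Bear in mind that a ForkJoinPool can transiently exceed its
parallelism when workers block (it creates compensation threads), so
the headroom between the parallelism and the budget is not optional.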

Alex

> On 6 Jun 2017, at 16:42, Nathan and Ila Reynolds <[hidden email]> wrote:
>
> [full quote of Nathan's message trimmed]


Re: Should old ForkJoinWorkerThread die if starting a new thread fails?

Doug Lea
In reply to this post by Jarkko Miettinen
On 06/06/2017 11:32 AM, Jarkko Miettinen wrote:

> In any case, currently if starting a new thread in
> ForkJoinPool#createWorker fails with an exception (OutOfMemoryError
> being the most common),  the thread that tries to start that new thread
> dies too.

You can catch the exception, and try to cope.
One possibility is to try to help execute tasks that might
not otherwise be run. As in:

try {
   task.fork(); // or
   pool.submit(task); // or similar
} catch (OutOfMemoryError ex) { // an Error, so catch (Exception) would miss it
   ForkJoinTask.helpQuiesce(); // or
   pool.awaitQuiescence(1, TimeUnit.SECONDS); // help run tasks for at most 1 sec
}

(You could then encapsulate this as a helper method.)
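
For example (a sketch; submitOrHelp is a made-up name, and whether the
task was already enqueued before worker creation failed is
implementation-dependent):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;

final class FallbackSubmit {
    static <T> ForkJoinTask<T> submitOrHelp(ForkJoinPool pool,
                                            ForkJoinTask<T> task,
                                            long timeout, TimeUnit unit) {
        try {
            return pool.submit(task);
        } catch (OutOfMemoryError e) {
            // Worker creation failed; the task is usually already queued,
            // so help run queued tasks in the caller for a bounded time.
            pool.awaitQuiescence(timeout, unit);
            return task;
        }
    }
}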

If the failure is a memory-based OOME, then this might also fail, and
in any case it won't execute tasks asynchronously using the same
scheduling. But it does provide a best-effort fallback.

> I think OutOfMemoryError is generally considered something that
> should not be recovered from.

As implied in other replies, one reason for not retrying here
(also ThreadPoolExecutor) is to help avoid infinitely cascading
(and sometimes silent) failures, for which we (in j.u.c) have no
recourse.

-Doug

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest