ThreadPoolExecutor API

ThreadPoolExecutor API

JSR166 Concurrency mailing list
I've seen several ThreadPoolExecutor usages that create a ThreadPoolExecutor with

new ThreadPoolExecutor(SMALL_NO, LARGER_NO, ..., ..., new LinkedBlockingQueue<Runnable>(ABOUT_100), ...);

I'm not 100% sure, but I think the intent is to create a TPE with a small number of core threads; only if those are all busy, temporarily create a few more threads to accommodate bursts; and if that doesn't suffice, enqueue tasks. My conclusion so far:

1) This is a reasonable intent. And if you didn't read the ThreadPoolExecutor documentation very carefully, this seems like a plausible way to express it.

2) This is NOT what the code actually does. It instead aggressively creates a small number of threads; when those are busy, it enqueues further requests. When the queue fills up too, it potentially creates an additional LARGER_NO - SMALL_NO threads, as needed. In fact, having unequal core and max counts with something other than a SynchronousQueue seems odd. (And the current TPE interface makes a lot of sense, especially if you view it as a low-level building block.)

3) It's possible to get something like the intended effect by chaining two TPEs together through the first one's RejectedExecutionHandler, which seems a bit obscure.

Questions:  Does this assessment look right? Is there a better way to get the intended effect? If not, should there be?
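
For concreteness, a rough sketch of the chaining approach in 3), with invented sizes and an untested configuration for the overflow pool, might look like this:

import java.util.concurrent.*;

public class ChainedPools {
    public static void main(String[] args) {
        final int SMALL_NO = 2, LARGER_NO = 8, ABOUT_100 = 100;   // hypothetical sizes

        // Overflow pool: creates up to LARGER_NO - SMALL_NO temporary threads before
        // queueing (core == max, with idle timeout), then buffers up to ABOUT_100 tasks.
        ThreadPoolExecutor overflow = new ThreadPoolExecutor(
                LARGER_NO - SMALL_NO, LARGER_NO - SMALL_NO,
                30L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(ABOUT_100));
        overflow.allowCoreThreadTimeOut(true);

        // Primary pool: SMALL_NO threads and no buffering; when all of them are busy
        // the handoff fails and the RejectedExecutionHandler forwards to the overflow pool.
        ThreadPoolExecutor primary = new ThreadPoolExecutor(
                SMALL_NO, SMALL_NO,
                30L, TimeUnit.SECONDS,
                new SynchronousQueue<>(),
                (task, pool) -> overflow.execute(task));

        for (int i = 0; i < 20; i++) {
            final int id = i;
            primary.execute(() ->
                    System.out.println("task " + id + " on " + Thread.currentThread().getName()));
        }
        primary.shutdown();
        overflow.shutdown();
    }
}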

Re: ThreadPoolExecutor API

JSR166 Concurrency mailing list
I've wanted for many years to fix

ThreadPoolExecutor should prefer reusing idle threads

but never finished any of many starts.
---
The distinction between core pool size and max pool size is confusing - this API didn't work out well.
More generally, we're not happy with all the configuration knobs, which fail to make it easy for users to get what they want.
---
(Nevertheless, the world runs on ThreadPoolExecutor!)

Re: ThreadPoolExecutor API

JSR166 Concurrency mailing list
Hi Hans,

> From: Concurrency-interest <[hidden email]> On Behalf Of Hans Boehm via Concurrency-interest
> Sent: Thursday, March 21, 2019 6:15 AM
> To: [hidden email]
> Subject: [concurrency-interest] ThreadPoolExecutor API
>
> I've seen several ThreadPoolExecutor usages that create a ThreadPoolExecutor with
>
> new ThreadPoolExecutor(SMALL_NO, LARGER_NO, ..., ..., new LinkedBlockingQueue<Runnable>(ABOUT_100), ...);
>
> I'm not 100% sure, but I think the intent is to create a TPE with a small number of core threads; only if those are all busy, temporarily create a few
> more threads to accommodate bursts; and if that doesn't suffice, enqueue tasks.

Why do you assume that is the intent? You are assuming that people don’t know what they are doing and don’t read the documentation. I have explained the basic operating mode of TPE a number of times over the years as follows:

“The basic threading strategy is based around expected service times and throughput requirements. If you characterise your task workload, identify the arrival rate and determine what your throughput/response-time requirements are, then you can determine the necessary number of threads to handle your steady-state workload. The queue is then used to buffer requests when you get transient overloads. By bounding the queue you set a second overload threshold at which new threads are brought in (up to max) to try and service the overload and get the system back to the expected steady-state.”

I disagree with Martin that this API didn't work out well; it's a powerful and flexible API, but that also brings complexity, which seems to overwhelm many users. But then that is why we have the factory methods to produce commonly useful TPE forms.
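
As a concrete illustration of that strategy (the numbers are invented, not a recommendation for any particular workload), the construction would look something like:

import java.util.concurrent.*;

class SteadyStatePool {
    // Hypothetical sizing: ~4 threads handle the measured steady-state arrival rate,
    // a queue of 100 buffers transient overloads, and only once that queue fills does
    // the pool grow toward 16 threads (the second overload threshold) until load subsides.
    static final ThreadPoolExecutor POOL = new ThreadPoolExecutor(
            4,                                          // core: steady-state workers
            16,                                         // max: overload workers
            30L, TimeUnit.SECONDS,                      // overload workers retire after 30s idle
            new LinkedBlockingQueue<>(100),             // bounded buffer for bursts
            new ThreadPoolExecutor.CallerRunsPolicy()); // saturation policy beyond that
}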

> My conclusion so far:

> 1) This is a reasonable intent. And if you didn't read the ThreadPoolExecutor documentation very carefully, this seems like a plausible way to express it.

I agree it is not an unreasonable expectation if you don't actually know how it works. I don't think people are generally silly enough to guess what the semantics are instead of actually reading the documentation.

> 2) This is NOT what the code actually does. It instead aggressively creates a small number of threads; when those are busy, it enqueues further requests. When the
> queue fills up too, it potentially creates an additional LARGER_NO - SMALL_NO threads, as needed. In fact, having unequal core and max counts with something
> other than a SynchronousQueue seems odd. (And the current TPE interface makes a lot of sense, especially if you view it as a low-level building block.)

Having unequal core and max with other than a SynchronousQueue is not odd at all.

> 3) It's possible to get something like the intended effect by chaining two TPEs together through the first one's RejectedExecutionHandler, which seems a bit obscure.

Yes it's possible and yes somewhat obscure.

> Questions:  Does this assessment look right? Is there a better way to get the intended effect? If not, should there be?

If there were demand for this ...

Cheers,
David

Re: ThreadPoolExecutor API

JSR166 Concurrency mailing list
On Wed, Mar 20, 2019 at 1:48 PM David Holmes <[hidden email]> wrote:

>
> Hi Hans,
>
> > From: Concurrency-interest <[hidden email]> On Behalf Of Hans Boehm via Concurrency-interest
> > Sent: Thursday, March 21, 2019 6:15 AM
> > To: [hidden email]
> > Subject: [concurrency-interest] ThreadPoolExecutor API
> >
> > I've seen several ThreadPoolExecutor usages that create a ThreadPoolExecutor with
> >
> > new ThreadPoolExecutor(SMALL_NO, LARGER_NO, ..., ..., new LinkedBlockingQueue<Runnable>(ABOUT_100), ...);
> >
> > I'm not 100% sure, but I think the intent is to create a TPE with a small number of core threads; only if those are all busy, temporarily create a few
> > more threads to accommodate bursts; and if that doesn't suffice, enqueue tasks.
>
> Why do you assume that is the intent? You are assuming that people don’t know what they are doing and don’t read the documentation. I have explained the basic operating mode of TPE a number of times over the years as follows:
>
> “The basic threading strategy is based around expected service times and throughput requirements. If you characterise your task workload, identify the arrival rate and determine what your throughput/response-time requirements are, then you can determine the necessary number of threads to handle your steady-state workload. The queue is then used to buffer requests when you get transient overloads. By bounding the queue you set a second overload threshold at which new threads are brought in (up to max) to try and service the overload and get the system back to the expected steady-state.”
David -

That's a good way to view it, and I agree that this is not always a disastrous thing to do. But in most cases, especially in a latency-sensitive environment, this is suboptimal. Why wait before launching the extra threads, if there are tasks piling up in the queue? By starting the extra threads earlier, you end up with better latency. Admittedly, it may force you to start some extra temporary threads that you could have avoided starting. But, at least in our case, that very rarely seems to be the right trade-off.

This seems particularly dubious for more general-purpose shared thread pools, when the tasks may block on IO for extended periods. If I try to execute SMALL_NO long-blocking tasks, and then 100 short compute-bound ones, everything needlessly blocks until the IO completes, potentially causing a serious system hiccup. The reason for allowing thread pools to expand is usually to avoid such hiccups.

So while I agree the intent is not 100% obvious, it seems unlikely that this behavior was intended in our case. And some users and readers of the code in question expressed surprise when I explained how it worked.
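
One rough approximation of the intended behavior (offered only as a sketch; note the cost, raised later in the thread, that threads are created per submission even when others are idle) is to set core equal to max and let idle core threads time out:

import java.util.concurrent.*;

class EagerPoolSketch {
    // Hypothetical helper: grows to largerNo threads before anything is queued,
    // then buffers up to queueCap tasks; idle threads (core ones included) exit
    // after 10s, so the pool shrinks back after a burst.
    static ThreadPoolExecutor newEagerPool(int largerNo, int queueCap) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                largerNo, largerNo,
                10L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueCap));
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}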

>
> I disagree with Martin that this API didn't work out well; it's a powerful and flexible API, but that also brings complexity, which seems to overwhelm many users. But then that is why we have the factory methods to produce commonly useful TPE forms.
>
> > My conclusion so far:
>
> > 1) This is a reasonable intent. And if you didn't read the ThreadPoolExecutor documentation very carefully, this seems like a plausible way to express it.
>
> I agree it is not an unreasonable expectation if you don't actually know how it works. I don't think people are generally silly enough to guess what the semantics are instead of actually reading the documentation.

I think people who read the code are often misled. The authors should indeed read the documentation, though I'm not sure they always do.

Hans

Re: ThreadPoolExecutor API

JSR166 Concurrency mailing list

Hi Hans,

 

Obviously you can construct scenarios that are advantaged or disadvantaged by any given policy. If your system has sufficient capacity then adding more threads rather than buffering requests will reduce latency for those extra requests. But if you don’t have sufficient capacity adding more threads may cause resource contention that increases average latency. But if you have the capacity then why not just increase the number of core threads to start with?

 

Cheers,

David

 

Re: ThreadPoolExecutor API

JSR166 Concurrency mailing list


On Wed, Mar 20, 2019 at 4:30 PM David Holmes via Concurrency-interest <[hidden email]> wrote:


Obviously you can construct scenarios that are advantaged or disadvantaged by any given policy. If your system has sufficient capacity then adding more threads rather than buffering requests will reduce latency for those extra requests. But if you don’t have sufficient capacity adding more threads may cause resource contention that increases average latency. But if you have the capacity then why not just increase the number of core threads to start with?


A problem is efficient use of threads. Because new core threads are currently spun up at submit time even when other core threads are idle, you will likely end up with corePoolSize threads even when a much smaller number is good enough most of the time.
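
A small demonstration of that documented behavior (a self-contained sketch):

import java.util.concurrent.*;

public class IdleCoreDemo {
    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // Submit four trivial tasks strictly one at a time, so a single thread
        // would always suffice and any previously started worker is idle.
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> { }).get();
        }
        // Prints 4: a new core thread is started for each submission below
        // corePoolSize, even though an idle worker was already available.
        System.out.println("pool size = " + pool.getPoolSize());
        pool.shutdown();
    }
}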

Re: ThreadPoolExecutor API

JSR166 Concurrency mailing list
So, each task’s latency is: enqueue wait (contention for queue capacity) + dequeue wait (contention for executor threads) + execution time (pure CPU) + resource wait time (contention between running tasks).

Choose SMALL_NO for optimal latency. Choose LARGE_NO to optimise for throughput. (Probably something like execution time == resource wait time * LARGE_NO)

Choose queue size to switch between latency and throughput. It is not clear why it should switch based on the queue reaching its capacity.


(But the reasoning doesn’t quite make sense. “arrival rate” is a parameter of the workload; the “resource” must be provisioned to service that “arrival rate”.)

Alex

Re: ThreadPoolExecutor API

JSR166 Concurrency mailing list

The queue size is not being used to switch between latency and throughput. The initial set of workers is deemed sufficient to process the expected arrival rate of tasks, given the workload they represent, to achieve the desired latency (service time). The queue exists to buffer temporary higher arrival rates of tasks, which results in longer latencies without having to reject the extra tasks. But if the queue gets too big then latencies can rise beyond acceptable levels and so that is where you throw more workers at the problem.

 

It's the kind of thing you used to observe in banks all the time (probably before the mid 80’s 😊 ). If all the tellers are busy people start to queue; if the queue gets too big they open more tellers to service the extra requests and ensure the service times are acceptable.

 

David

Re: ThreadPoolExecutor API

JSR166 Concurrency mailing list
It isn’t being used directly, but the size of the queue affects queueing behaviour, which is what triggers the resizing above SMALL_NO: when queue.offer fails, the pool spawns more workers. My thought was that if queue.offer fails, it's already a little too late.

Alex
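
For reference, a simplified model of the decision order that the ThreadPoolExecutor javadoc describes for execute(); this is only a sketch, not the real implementation, which also rechecks state and handles shutdown races:

import java.util.concurrent.*;

class TpeDecisionSketch {
    final int corePoolSize, maximumPoolSize;
    final BlockingQueue<Runnable> workQueue;
    int poolSize;                             // current worker count (illustrative only)

    TpeDecisionSketch(int core, int max, BlockingQueue<Runnable> queue) {
        this.corePoolSize = core;
        this.maximumPoolSize = max;
        this.workQueue = queue;
    }

    void execute(Runnable task) {
        if (poolSize < corePoolSize) {
            startWorker(task);                // below core: always start a new thread
        } else if (workQueue.offer(task)) {
            // at or above core: prefer queueing; no thread is started here, so growth
            // toward max only begins once offer() fails because the queue is full
        } else if (poolSize < maximumPoolSize) {
            startWorker(task);                // queue full: only now grow toward max
        } else {
            throw new RejectedExecutionException("queue full and pool at maximum");
        }
    }

    void startWorker(Runnable first) {
        poolSize++;                           // not thread-safe; fine for a sketch
        new Thread(first).start();            // real workers then loop taking from workQueue
    }
}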
