CompletableFuture improvements


CompletableFuture improvements

Chris Purcell
Hi everyone,

I have a few suggestions for changes that could be made to CompletableFuture, and wondered what everyone thought about them. To prove they work, I have coded them up into an alternative CompletionStage implementation, which you can find (along with a longer discussion) at https://github.com/cjp39/j8stages

To summarize:

(1) Internally, CompletableFuture's fluent API constructs a work queue of callbacks to execute, to avoid StackOverflowErrors caused by callback recursion, but there is no way to add custom tasks to it. If we create a ThreadLocal work queue instead, we can transform recursive listener callbacks into iteration even if arbitrary user code intervenes. This lets the user interoperate with other future implementations (like ListenableFutures), and use complete()/completeExceptionally() in callbacks, without risking StackOverflowErrors. It also simplifies CompletableFuture, and removes the requirement that other CompletionStages implement toCompletableFuture(). Since ThreadLocal is tied into the JVM, it's pretty efficient, so performance shouldn't be an issue. (For implementation details, see the GitHub repo.)
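
To make this concrete, here is a minimal sketch of the thread-local trampoline idea; the names are illustrative only, not the actual j8stages code:

    import java.util.ArrayDeque;

    final class Trampoline {
        // Per-thread queue of pending callbacks, plus a flag marking whether
        // the outermost frame on this thread is already draining the queue.
        private static final ThreadLocal<ArrayDeque<Runnable>> QUEUE =
                ThreadLocal.withInitial(ArrayDeque::new);
        private static final ThreadLocal<Boolean> DRAINING =
                ThreadLocal.withInitial(() -> Boolean.FALSE);

        static void execute(Runnable callback) {
            ArrayDeque<Runnable> queue = QUEUE.get();
            queue.add(callback);
            if (DRAINING.get()) {
                return;               // an outer frame will run it iteratively
            }
            DRAINING.set(Boolean.TRUE);
            try {
                Runnable next;
                while ((next = queue.poll()) != null) {
                    next.run();       // may enqueue more work; never recurses
                }
            } finally {
                DRAINING.set(Boolean.FALSE);
            }
        }
    }

The future implementation routes every listener invocation through something like Trampoline.execute, so a complete() call made inside a listener just enqueues work instead of growing the stack, no matter whose code sits in between.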

What do people think of this idea?

(2) CompletionStage.whenComplete discards exceptions if both input stage and action fail. It would be less surprising if, like try/finally, it added the action exception to the input exception as a suppressed exception. This can be done safely by cloning the input exception (all Throwables are Serializable). I don't think performance should be a blocker, as this is a rare edge case, and we are providing a very useful service for the cost.
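
As a sketch of the combining rule I mean (illustrative only; the cloning step is elided here, and in practice the input exception would be copied before mutation so a shared instance isn't modified):

    // If only one side failed, that exception wins. If both failed, the input
    // exception wins (as today), but the action's exception is preserved as a
    // suppressed exception instead of being discarded, mirroring try/finally.
    static Throwable mergeFailures(Throwable inputException, Throwable actionException) {
        if (inputException == null) {
            return actionException;
        }
        if (actionException != null && actionException != inputException) {
            inputException.addSuppressed(actionException);
        }
        return inputException;
    }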

Are there any other issues with doing this that I haven't thought about?

(3) CompletableFuture uses concurrent assistance to execute its callback queue. I suspect this is to give a parallelism speedup. However, I don't think this particular problem will scale. If we drop the assistance algorithm, we can drop the extra queue field and store callbacks in the main result field (currently set to null before the future completes), giving a simpler implementation, and avoiding overhead from cacheline conflicts. (Again, for implementation details, see the GitHub repo.)
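
As a rough illustration of what I mean by the single-field layout (again illustrative names, not the actual j8stages code), one field holds either the pending callback stack or the final result:

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.Consumer;

    final class OneFieldFuture<T> {
        private static final class Node<T> {
            final Consumer<? super T> callback;
            final Node<T> next;
            Node(Consumer<? super T> callback, Node<T> next) { this.callback = callback; this.next = next; }
        }
        private static final class Result<T> {
            final T value;
            Result(T value) { this.value = value; }
        }

        // Before completion: a Node<T> stack of callbacks (possibly null).
        // After completion: a Result<T>. One CAS loop resolves the race
        // between adding a listener and completing.
        private final AtomicReference<Object> state = new AtomicReference<>();

        @SuppressWarnings("unchecked")
        void addListener(Consumer<? super T> callback) {
            for (;;) {
                Object s = state.get();
                if (s instanceof Result) {
                    callback.accept(((Result<T>) s).value);  // already complete
                    return;
                }
                if (state.compareAndSet(s, new Node<T>(callback, (Node<T>) s))) {
                    return;
                }
            }
        }

        @SuppressWarnings("unchecked")
        boolean complete(T value) {
            for (;;) {
                Object s = state.get();
                if (s instanceof Result) {
                    return false;                            // already completed
                }
                if (state.compareAndSet(s, new Result<T>(value))) {
                    for (Node<T> n = (Node<T>) s; n != null; n = n.next) {
                        n.callback.accept(value);            // run pending callbacks
                    }
                    return true;
                }
            }
        }
    }

(Blocked waiters, cancellation, and exceptional results are all omitted from this sketch.)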

My assumption may be way off-base here, though. Are there other reasons for the two-field approach of CompletableFuture?

Cheers,
Chris



Re: CompletableFuture improvements

Doug Lea
On 08/14/2015 06:48 AM, Chris Purcell wrote:
>
> I have a few suggestions for changes that could be made to CompletableFuture,

Thanks! Answering out of order...

> (2) CompletionStage.whenComplete discards exceptions if both input stage and
> action fail. It would be less surprising if, like try/finally, it added the
> action exception to the input exception as a suppressed exception.

This seems like a good idea, that we would have done if it had
crossed anyone's mind!  And it can be done now without any
loss of compatibility. Pending any near-term objections, I'll
try incorporating this.

>
> (1) Internally, CompletableFuture's fluent API constructs a work queue of
> callbacks to execute, to avoid StackOverflowErrors caused by callback
> recursion, but there is no way to add custom tasks to it.

I think I'm missing the underlying functionality requirement here.
Can you sketch out some use cases, and how they should be handled?

>
> (3) CompletableFuture uses concurrent assistance to execute its callback
> queue.

Which some users do exploit by creating multiple independent continuation
tasks. (In fact CompletableFuture is computationally complete as a
concurrency primitive -- you can in principle write any concurrent
program using only it.) We can't remove this functionality. But
it would be fine to have other CompletionStage implementation classes
that don't support branching parallelism, especially one that helps
bridge to Guava's ListenableFuture.
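
Roughly, such a bridge amounts to completing a CompletableFuture from an addListener callback -- a sketch only, assuming Guava's ListenableFuture.addListener and MoreExecutors.directExecutor:

    import com.google.common.util.concurrent.ListenableFuture;
    import com.google.common.util.concurrent.MoreExecutors;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutionException;

    final class GuavaBridge {
        // Adapt a ListenableFuture into a CompletableFuture by completing the
        // latter from an addListener callback. The listener runs only after
        // the ListenableFuture is done, so get() below does not block.
        static <T> CompletableFuture<T> toCompletableFuture(ListenableFuture<T> lf) {
            CompletableFuture<T> cf = new CompletableFuture<>();
            lf.addListener(() -> {
                try {
                    cf.complete(lf.get());
                } catch (ExecutionException e) {
                    cf.completeExceptionally(e.getCause());
                } catch (Throwable t) {
                    cf.completeExceptionally(t);   // cancellation, interruption
                }
            }, MoreExecutors.directExecutor());
            return cf;
        }
    }

Note that the bridge completes the CompletableFuture from inside a callback, which is the pattern at issue in (1).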

>
> My assumption may be way off-base here, though. Are there other reasons for
> the two-field approach of CompletableFuture?

Notice that Signallers for threads blocked waiting for completion are
also held in the linked stack. There may be some good alternative.

-Doug


Re: CompletableFuture improvements

Martin Buchholz
In reply to this post by Chris Purcell
A lot of effort has been put into writing tests for CompletableFuture in the jsr166 CVS.
I suggest copying your version of CompletableFuture into that and trying "ant tck".
(Some configuration is required; see "ant -p".)



Re: CompletableFuture improvements

Chris Purcell
In reply to this post by Doug Lea
Thanks, Doug!

> This seems like a good idea, that we would have done if it had
> crossed anyone's mind!  And it can be done now without any
> loss of compatibility. Pending any near-term objections, I'll
> try incorporating this.

Fantastic, thanks!
 
>> (1) Internally, CompletableFuture's fluent API constructs a work queue of
>> callbacks to execute, to avoid StackOverflowErrors caused by callback
>> recursion, but there is no way to add custom tasks to it.
>
> I think I'm missing the underlying functionality requirement here.
> Can you sketch out some use cases, and how they should be handled?

I may be way off-base on why CompletableFuture has a work queue of callbacks. I assumed it was because, if you have a long chain (a couple of thousand) of dependent futures, and you call dependent callbacks recursively within the upstream future's completion handler, then the call stack will overflow. You can easily demonstrate this with CompletableFutures:

    @Test
    public void recursive_callbacks_overflow() {
        CompletableFuture<Integer> head = new CompletableFuture<>();
        CompletableFuture<Integer> tail = head;
        for (int i = 0; i < 2_000; i++) {
            CompletableFuture<Integer> newTail = new CompletableFuture<>();
            tail.thenAccept(v -> newTail.complete(v + 1));
            tail = newTail;
        }
        head.complete(7);
        // This should return 2_007, but the stack overflows during
        // the recursion, the error is silently discarded, and the
        // downstream futures remain unset.
        assertEquals(-1, (int) tail.getNow(-1));
    }

However, thanks to the way CompletableFuture uses a work stack internally, if you use the fluent API, this is not an issue even for hundreds of thousands of dependent stages:

    @Test
    public void a_hundred_thousand_dependent_completion_stages() {
        CompletableFuture<Integer> head = new CompletableFuture<>();
        CompletableFuture<Integer> tail = head;
        for (int i = 0; i < 100_000; i++) {
            tail = tail.thenApply(v -> v + 1);
        }
        head.complete(7);
        assertEquals(100_007, (int) tail.getNow(-1));
    }

Is it merely fortuitous that the fluent API has this feature? If so, my apologies for coming out of left field :) I assumed it was intentional, then figured out how to get the first example to work without overflows:

    @Test
    public void a_hundred_thousand_recursive_style_callbacks_with_MyFuture() {
        MyFuture<Integer> head = new MyFuture<>();
        MyFuture<Integer> tail = head;
        for (int i = 0; i < 100_000; i++) {
            MyFuture<Integer> newTail = new MyFuture<>();
            tail.thenAccept(v -> newTail.complete(v + 1));
            tail = newTail;
        }
        head.complete(7);
        assertEquals(100_007, (int) tail.getNow(-1));
    }

Aside from silly examples like this, calling complete from callbacks is necessary for things like:
  1. wrapping non-CompletableFuture futures, like arbitrary CompletionStages or ListenableFutures
  2. implementing thenCombine to short-circuit when either input fails (sketched below)
It would be great if we could get this functionality in CompletableFuture.
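
For example, a short-circuiting variant of thenCombine (a hypothetical helper, not an existing API) has to call complete/completeExceptionally from inside callbacks on its inputs:

    import java.util.concurrent.CompletableFuture;
    import java.util.function.BiFunction;

    final class Combining {
        // Like thenCombine, but fails as soon as *either* input fails rather
        // than waiting for both. Note the complete/completeExceptionally calls
        // made from inside callbacks -- exactly the pattern that risks deep
        // recursion without a trampoline.
        static <A, B, R> CompletableFuture<R> combineShortCircuit(
                CompletableFuture<A> fa,
                CompletableFuture<B> fb,
                BiFunction<? super A, ? super B, ? extends R> fn) {
            CompletableFuture<R> result = new CompletableFuture<>();
            fa.whenComplete((a, ex) -> {
                if (ex != null) { result.completeExceptionally(ex); }  // fail fast
            });
            fb.whenComplete((b, ex) -> {
                if (ex != null) { result.completeExceptionally(ex); }  // fail fast
            });
            fa.thenCombine(fb, fn).whenComplete((r, ex) -> {
                if (ex != null) { result.completeExceptionally(ex); }
                else { result.complete(r); }
            });
            return result;
        }
    }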

>> (3) CompletableFuture uses concurrent assistance to execute its callback
>> queue.
>
> Which some users do exploit by creating multiple independent continuation
> tasks. We can't remove this functionality.

I'm a bit surprised this isn't viewed as the job of the executor, but as you say, it can't be removed now.

Thanks!
Chris 


Re: CompletableFuture improvements

Doug Lea
On 08/17/2015 02:27 PM, Chris Purcell wrote:

> I may be way off-base on /why/ CompletableFuture has a work queue of callbacks.
> I assumed it was because, if you have a long chain (a couple of thousand) of
> dependent futures, and you call dependent callbacks recursively within the
> upstream future's completion handler, then the call stack will overflow. You can
> easily demonstrate this with CompletableFutures:

CompletableFuture uses a Treiber-stack-like completion list for
several related reasons: it is non-blocking, atomically resolves
races (like when adding to a CF that is in the midst of completing),
allows helping across threads, and helps avoid stack overflow
(especially with the 8u update to do so even across threads).
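
(For anyone not familiar with the structure, a Treiber stack is just a CAS loop on a head pointer, along these general lines -- a generic sketch, not the CompletableFuture source:)

    import java.util.concurrent.atomic.AtomicReference;

    final class TreiberStack<E> {
        private static final class Node<E> {
            final E item;
            Node<E> next;
            Node(E item) { this.item = item; }
        }

        private final AtomicReference<Node<E>> head = new AtomicReference<>();

        // Lock-free push: retry the CAS until our node becomes the head.
        void push(E item) {
            Node<E> node = new Node<>(item);
            do {
                node.next = head.get();
            } while (!head.compareAndSet(node.next, node));
        }

        // Lock-free pop: returns null if the stack is empty.
        E pop() {
            Node<E> h;
            do {
                h = head.get();
                if (h == null) {
                    return null;
                }
            } while (!head.compareAndSet(h, h.next));
            return h.item;
        }
    }

CompletableFuture's stack of pending Completion nodes (and, as noted below, the Signallers for blocked waiters) has this general shape, with the extra property that completing threads can pop and run each other's nodes.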

As discussed last year, we'd like to catch as many of these
cases as possible, because StackOverflowError is otherwise
so problematic on JVMs -- some are never caught or handled.
This is not special to completion designs but more common with them.
(Aside: it is an odd state of affairs that you need to emulate
stacks on heaps, with greater peak total memory overhead,
to avoid stack overflow.) Without some JVM-level tail-recursion
support (and maybe even with it), the options are limited in cases
where the recursion occurs in the bodies of lambdas where we
cannot see it. That's the main difference in your two examples:

               CompletableFuture<Integer> newTail = new CompletableFuture<>();
               tail.thenAccept(v -> newTail.complete(v + 1));
               tail = newTail;
vs

               tail = tail.thenApply(v -> v + 1);

We can "trampoline" thenAccept and thenApply, but we don't
even see the newTail.complete(v + 1) because it is inside a
lambda. On the other hand, if the first usage were async, the
cross-thread mechanisms kick in to prevent SOE:

     public void async_recursive_callbacks() {
         CompletableFuture<Integer> head = new CompletableFuture<>();
         CompletableFuture<Integer> tail = head;
         for (int i = 0; i < 2_000; i++) {
             CompletableFuture<Integer> newTail = new CompletableFuture<>();
             tail.thenAcceptAsync(v -> newTail.complete(v + 1));
             tail = newTail;
         }
         head.complete(7);
         assertEquals(2007, (int) tail.join());
     }

It is possible that we could do better here. I'll investigate.
Your thread-local back-channel approach handles some cases,
but I don't think it can coexist with the other support.

-Doug

Re: CompletableFuture improvements

Chris Purcell
Thanks for the context, Doug. It seems like cross-thread helping is the only thing my design loses (it's still non-blocking, atomic, and avoids stack overflow), which is nice to confirm in case I do decide my library is worth releasing properly.

I'm pretty confident the thread-local back-channel approach can be adapted to "pull up" work to the thread's current CompletableFuture handler without throwing away the trampoline/assistance feature. I look forward to hearing if it works out for you!

Cheers,
Chris


P.S. java.util.concurrent has been a great example of cutting-edge concurrent algorithms in the real world since my PhD, a horrifyingly long time ago. Thanks for all your inspiring work!


Re: CompletableFuture improvements

Martin Buchholz
In reply to this post by Chris Purcell


On Fri, Aug 14, 2015 at 3:48 AM, Chris Purcell <[hidden email]> wrote:

> (2) CompletionStage.whenComplete discards exceptions if both input stage and action fail. It would be less surprising if, like try/finally, it added the action exception to the input exception as a suppressed exception. This can be done safely by cloning the input exception (all Throwables are Serializable). I don't think performance should be a blocker, as this is a rare edge case, and we are providing a very useful service for the cost.
>
> Are there any other issues with doing this that I haven't thought about?

Very belatedly, we now add the action exception to the input exception as a suppressed exception in jsr166 CVS, and this micro-feature should appear later in jdk9.  Thanks!

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest