A puzzle about recursive functions in CompletableFuture

A puzzle about recursive functions in CompletableFuture

Millies, Sebastian
Hello there,

I have thought of a way of doing tail-recursion elimination with CompletableFuture. It works, but I do not understand why. There must be some inter-thread communication going on about which I have not been able to find documentation. Perhaps someone here can enlighten me.

This particular way of doing TRE is not in itself especially worthwhile, because it is almost trivial to transform a tail-recursive function into a loop manually. However, I think it’s still interesting. So, as an example let's use this tail-recursive definition of the Fibonacci series:

public static BigInteger fibTailRecursive(int n) {
    return n <= 2 ? ONE : fibTailRecursiveAux(n, ZERO, ONE);
}

private static BigInteger fibTailRecursiveAux(int n, BigInteger a, BigInteger b) {
    return n <= 0 ? a : fibTailRecursiveAux(n - 1, b, a.add(b));
}

And here is the "optimized" version with TRE, in which the elementary case and recursive case are wrapped in methods that return a CompletableFuture:

public static BigInteger fibCF(int n) {
    return n <= 2 ? ONE : fibCF(n, ZERO, ONE).join();
}

private static CompletableFuture<BigInteger> fibCF(int n, BigInteger a, BigInteger b) {
    return n <= 0 ? terminate(a) : tailcall(() -> fibCF(n - 1, b, a.add(b)));
}

This is similar to Pierre-Yves Saumont's approach (cf. http://www.javacodegeeks.com/2015/10/stack-safe-recursion-in-java.html). However, his implementation uses custom classes and an explicit while-loop. With TRE I can compute e.g. fibCF(5000), although the recursive version normally encounters a StackOverflowError at around n = 1550 on my machine.
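For comparison, the custom-class trampoline mentioned above can be sketched like this (a minimal sketch; the names are illustrative, not Saumont's actual API): a TailCall either holds a finished result or a suspended next step, and resolve() iterates instead of recursing.

```java
import java.util.function.Supplier;

// Minimal trampoline sketch: a TailCall is either a finished value or a
// suspended next step; resolve() replaces recursion with an explicit loop.
interface TailCall<T> {
    boolean isDone();
    T value();
    TailCall<T> next();

    static <T> TailCall<T> done(T value) {
        return new TailCall<T>() {
            public boolean isDone() { return true; }
            public T value() { return value; }
            public TailCall<T> next() { throw new IllegalStateException("already done"); }
        };
    }

    static <T> TailCall<T> call(Supplier<TailCall<T>> step) {
        return new TailCall<T>() {
            public boolean isDone() { return false; }
            public T value() { throw new IllegalStateException("not done"); }
            public TailCall<T> next() { return step.get(); }
        };
    }

    // The explicit while-loop: each step returns before the next one runs,
    // so stack depth stays constant regardless of the recursion depth.
    default T resolve() {
        TailCall<T> current = this;
        while (!current.isDone()) {
            current = current.next();
        }
        return current.value();
    }
}
```

A tail-recursive function then returns done(...) in the base case and call(() -> ...) in the recursive case, mirroring the terminate()/tailcall() pair used in fibCF, except that the driving loop is explicit rather than hidden in an executor.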

Finally, here are my problematic implementations of terminate() and tailcall():

public static <T> CompletableFuture<T> terminate(T t) {
    return completedFuture(t);
}

public static <T> CompletableFuture<T> tailcall(Supplier<CompletableFuture<T>> s) {
    return CF.thenComposeAsync(_x -> s.get(), THREAD);
}

private static final CompletableFuture<?> CF = completedFuture(null);
private static final Executor THREAD = Executors.newSingleThreadExecutor(new ThreadFactoryBuilder().setDaemon(true).build());

(The ThreadFactoryBuilder is from Guava.)
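For convenience, here is how the pieces fit together as one self-contained program (a sketch: it substitutes a plain ThreadFactory lambda for Guava's ThreadFactoryBuilder; the logic is otherwise as given above):

```java
import static java.math.BigInteger.ONE;
import static java.math.BigInteger.ZERO;
import static java.util.concurrent.CompletableFuture.completedFuture;

import java.math.BigInteger;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class FibCF {
    // "Dummy" already-completed future to hang each next step on.
    private static final CompletableFuture<?> CF = completedFuture(null);

    // Dedicated daemon thread; a plain ThreadFactory replaces Guava's builder.
    private static final Executor THREAD = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    public static <T> CompletableFuture<T> terminate(T t) {
        return completedFuture(t);
    }

    public static <T> CompletableFuture<T> tailcall(Supplier<CompletableFuture<T>> s) {
        return CF.thenComposeAsync(x -> s.get(), THREAD);
    }

    public static BigInteger fibCF(int n) {
        return n <= 2 ? ONE : fibCF(n, ZERO, ONE).join();
    }

    private static CompletableFuture<BigInteger> fibCF(int n, BigInteger a, BigInteger b) {
        return n <= 0 ? terminate(a) : tailcall(() -> fibCF(n - 1, b, a.add(b)));
    }

    public static void main(String[] args) {
        System.out.println(fibCF(10)); // 55
        System.out.println(fibCF(5000).bitLength()); // completes without SOE
    }
}
```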

The interesting thing is that the recursive call is inside the lambda, so it cannot be seen by the completion handler. It is essential that I use thenComposeAsync() and a dedicated thread: using either thenCompose() or a current-thread executor (Executor e = Runnable::run) leads to the dreaded SOE. So there must be some implicit interaction between the current thread and the dedicated thread that prevents it.

I suspect it may have to do with the fact that the "dummy" future CF is already completed, so the downstream task would by default be completed in the client thread, but is then somehow handed off to the explicitly requested dedicated thread, with the "recursive" call landing in the client thread again. But how would this prevent the SOE?

Could anyone please explain to me how my code works?

-- Sebastian Millies




Software AG – Sitz/Registered office: Uhlandstraße 12, 64297 Darmstadt, Germany – Registergericht/Commercial register: Darmstadt HRB 1562 - Vorstand/Management Board: Karl-Heinz Streibich (Vorsitzender/Chairman), Eric Duffaut, Dr. Wolfram Jost, Arnd Zinnhardt; - Aufsichtsratsvorsitzender/Chairman of the Supervisory Board: Dr. Andreas Bereczky - http://www.softwareag.com


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Re: A puzzle about recursive functions in CompletableFuture

Viktor Klang
Isn't this "simply" trampolining on the submission queue of the Executor?

(I informally call it async tailcalls)

--
Cheers,

Re: A puzzle about recursive functions in CompletableFuture

Millies, Sebastian

If that were all there is to it, should it not work also when using the current-thread executor? What would then be different about the trampolining?

-- Sebastian
Re: A puzzle about recursive functions in CompletableFuture

Viktor Klang
Sadly, the JVM (still) does not support tail-call optimization, so it has to grow the stack instead.

Re: A puzzle about recursive functions in CompletableFuture

Aaron Grunthal
In reply to this post by Millies, Sebastian
That depends on how the current-thread executor is implemented. If its execute method always runs the submitted tasks down-stack, then it will lead to an SOE. If it uses an internal queue, because it knows that there is a loop polling tasks from the queue somewhere up-stack, then you effectively have trampolining implemented in the executor.

On 02.12.2015 12:18, Millies, Sebastian wrote:
> if that were all there is to it, should it not work also when using the
> current thread executor? What would then be different about the
> trampolining?
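The distinction can be made concrete with a sketch of the second kind of executor (a hypothetical class, not from any library): execute() only runs the drain loop at the outermost call, so re-submissions from inside a running task are queued rather than executed down-stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executor;

// A same-thread executor that trampolines. Only the outermost execute()
// call runs the drain loop; nested submissions are merely enqueued, so
// the stack does not grow with the number of chained tasks.
class TrampolineExecutor implements Executor {
    private final Deque<Runnable> queue = new ArrayDeque<>();
    private boolean running = false; // single-threaded use assumed

    @Override
    public void execute(Runnable task) {
        queue.add(task);
        if (running) {
            return; // the drain loop somewhere up-stack will run it
        }
        running = true;
        try {
            Runnable next;
            while ((next = queue.poll()) != null) {
                next.run(); // may call execute() again; that just enqueues
            }
        } finally {
            running = false;
        }
    }
}
```

A Runnable that re-submits itself 100,000 times through this executor completes at constant stack depth, whereas with Runnable::run each re-submission would add a stack frame.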

Re: A puzzle about recursive functions in CompletableFuture

Doug Lea
In reply to this post by Millies, Sebastian
On 12/02/2015 05:34 AM, Millies, Sebastian wrote:

> I have thought of a way of doing tail-recursion elimination with CompletableFuture. It works, but I do not understand why. There must be some inter-thread communication going on about which I have not been able to find documentation. Perhaps someone here can enlighten me.
>
> public static BigInteger fibTailRecursive(int n) {
>      return n <= 2 ? ONE : fibTailRecursiveAux(n, ZERO, ONE);
> }
>
> private static BigInteger fibTailRecursiveAux(int n, BigInteger a, BigInteger b) {
>      return n <= 0 ? a : fibTailRecursiveAux(n - 1, b, a.add(b));
> }
>
> And here is the "optimized" version with TRE,
>
> public static BigInteger fibCF(int n) {
>          return n <= 2 ? ONE : fibCF(n, ZERO, ONE).join();
> }
>
> private static CompletableFuture<BigInteger> fibCF(int n, BigInteger a, BigInteger b) {
>          return n <= 0 ? terminate(a) : tailcall(() -> fibCF(n - 1, b, a.add(b)));
> }
>
> Finally, here are my problematic implementations of terminate() and tailcall():
>
> public static <T> CompletableFuture<T> terminate(T t) {
>          return completedFuture(t);
> }
>
> public static <T> CompletableFuture<T> tailcall(Supplier<CompletableFuture<T>> s) {
>          return CF.thenComposeAsync(_x -> s.get(), THREAD);
> }

(Using default Async would work at least as well.
Aside: using direct executors (run in the same thread as caller)
is almost never a good idea with CompletableFutures.)

>
> private static final CompletableFuture<?> CF = completedFuture(null);
> private static final Executor THREAD = Executors.newSingleThreadExecutor(new ThreadFactoryBuilder().setDaemon(true).build());
>

This is a variation of the issues brought up by Chris Purcell a few months
ago. In async mode, CF generally avoids StackOverflowError (SOE),
because threads help clear out dependent tasks, which they do anyway to
maximize parallelism.

In non-async mode, you can still get SOE because Java compilers do not
loopify tail recursion. It would be possible to better avoid this
kind of non-async SOE by thread-local trampolining, but at the expense
of reducing parallelism, so we don't want to do it.

It's still an open question whether we can find some scheme that
combines both advantages.

In the meantime, there is some sentiment that default thread stack
sizes on 64-bit JVMs should be expanded to keep up with the times. On
32-bit systems the 1MB limit is completely defensible. But expanding
to, say, 64MB on 64-bit systems would reduce practical encounters with
SOE in these kinds of constructions by a factor of 64 or so.

-Doug


Default Stack Size on 64-bit JVMs

Nathan Reynolds-2
> In the mean time, there is some sentiment that default thread stack sizes on 64bit JVMs should be expanded keep up with the times. On 32bit systems the 1MB limit is completely defensible. But expanding to say 64MB on 64bit systems would reduce practical encounters with SOE in these kinds of constructions by a factor of 64 or so.

I've had this debate on a C++ application as well.  The debate started because a lot of people were working on stack overflow issues.  Each time, I would tell them to first fix the infinite recursion issue.  They would try to take the easy route and set the stack size to something very large, hoping the problem would go away.  It didn't.  After the recursion bugs were eliminated, we got to the point where the application really did need to do that much recursion, and the recursion was acceptably finite.  We then went with larger stack sizes and chose a sane default size.  After changing the default size, we bumped into a couple of problems.  Nothing says we shouldn't increase the stack size, but maybe it should be documented.

First, the larger stack size eats up address space.  As far as resources go, it is very cheap at this point in time on a 64-bit machine.  25 years ago we said the same thing about virtual address space on 32-bit machines.  We've got about 60 years before we will start exhausting the 64-bit address space.  So, let's ignore that for now.  Today's problem comes when doing sizing calculations.  People will see the address space size and figure that is what is needed for the process in terms of RAM.  Then the entire sizing calculation goes haywire and machines end up oversized.  As a cloud provider, I don't mind people spending more money on RAM they aren't using.  :)

Second, it can take a while to overflow the 1 MB stack.  I've had a couple of situations where it took several minutes in Java to cause a stack overflow.  Obviously, the program isn't just doing recursive calls.  There is a computation or GC happening all the way down the stack.  A 64 MB stack would take a couple of hours.  This means the thread is busy for a long time and other operations will *hopefully* timeout waiting for it.

I think a smart approach would be to examine code (open source?) or production deployments and see how often the stack sizing parameter is used and what value it is set to.  This will give us a good idea if the default stack size needs to be adjusted and what is a good stack size to use.
-Nathan
Re: Default Stack Size on 64-bit JVMs

Vitaly Davidovich
On Linux, thread stacks count toward the commit charge, even if some or most of the VA space is unused.  If you disable overcommit and bump the stack size significantly, you may start to hit OOM issues even on 64-bit machines if you create enough threads.

C++ stack usage is expected to be higher than Java's, since C++ apps stack-allocate a lot more.

On Wed, Dec 2, 2015 at 12:40 PM, Nathan Reynolds <[hidden email]> wrote:
> In the mean time, there is some sentiment that default thread stack sizes on 64bit JVMs should be expanded keep up with the times. On 32bit systems the 1MB limit is completely defensible. But expanding to say 64MB on 64bit systems would reduce practical encounters with SOE in these kinds of constructions by a factor of 64 or so.

I've had this debate on a C++ application as well.  The debate started because a lot of people were working on stack overflow issues.  Each time I would tell them to first fix the infinite recursion issue.  They would try to take the easy route and would set the stack size to something very large and hope the problem would go away.  It didn't.  After the recursion bugs were eliminated, we got to the point where the application really did need to do that much recursion and the recursion was acceptably finite.  We then went with larger stack sizes and chose a sane default size.  After changing the default size, we bumped into a couple of problems.  Nothing that says we shouldn't increase the stack size but maybe it should be documented.

First, the larger stack size eats up address space.  As far as resources go, it is very cheap at this point in time on a 64-bit machine.  25 years ago we said the same thing about virtual address space on 32-bit machines.  We've got about 60 years before we will start exhausting the 64-bit address space.  So, let's ignore that for now.  Today's problem comes when doing sizing calculations.  People will see the address space size and figure that is what is needed for the process in terms of RAM.  Then the entire sizing calculation goes haywire and machines end up over sized.  As a cloud provider, I don't mind people spending more money on RAM they aren't using.  :)

Second, it can take a while to overflow the 1 MB stack.  I've had a couple of situations where it took several minutes in Java to cause a stack overflow.  Obviously, the program isn't just doing recursive calls.  There is a computation or GC happening all the way down the stack.  A 64 MB stack would take a couple of hours.  This means the thread is busy for a long time and other operations will *hopefully* timeout waiting for it.

I think a smart approach would be to examine code (open source?) or production deployments and see how often the stack sizing parameter is used and what value it is set to.  This will give us a good idea if the default stack size needs to be adjusted and what is a good stack size to use.
-Nathan
On 12/2/2015 5:52 AM, Doug Lea wrote:
On 12/02/2015 05:34 AM, Millies, Sebastian wrote:
I have thought of a way of doing tail-recursion elimination with CompletableFuture. It works, but I do not understand why. There must be some inter-thread communication going on about which I have not been able to find documentation. Perhaps someone here can enlighten me.

public static BigInteger fibTailRecursive(int n) {
     return n <= 2 ? ONE : fibTailRecursiveAux(n, ZERO, ONE);
}

private static BigInteger fibTailRecursiveAux(int n, BigInteger a, BigInteger b) {
     return n <= 0 ? a : fibTailRecursiveAux(n - 1, b, a.add(b));
}

And here is the "optimized" version with TRE,

public static BigInteger fibCF(int n) {
         return n <= 2 ? ONE : fibCF(n, ZERO, ONE).join();
}

private static CompletableFuture<BigInteger> fibCF(int n, BigInteger a, BigInteger b) {
         return n <= 0 ? terminate(a) : tailcall(() -> fibCF(n - 1, b, a.add(b)));
}

Finally, here are my problematic implementations of terminate() and tailcall():

public static <T> CompletableFuture<T> terminate(T t) {
         return completedFuture(t);
}

public static <T> CompletableFuture<T> tailcall(Supplier<CompletableFuture<T>> s) {
         return CF.thenComposeAsync(_x -> s.get(), THREAD);
}

(Using default Async would work at least as well.
Aside: using direct executors (run in the same thread as caller)
is almost never a good idea with CompletableFutures.)


private static final CompletableFuture<?> CF = completedFuture(null);
private static final Executor THREAD = Executors.newSingleThreadExecutor(new ThreadFactoryBuilder().setDaemon(true).build());


This is a variation of the issues brought up by Chris Purcell a few months
ago. In async mode, CF generally avoids StackOverflowError (SOE),
because threads help clear out dependent tasks, which they do anyway to
maximize parallelism.

In non-async mode, you can still get SOE because java compilers do not
loopify tail recursion. It would be possible to better avoid this
kind of non-async SOE by thread-local trampolining at the expense
of reducing parallelism, so we don't want to do it.

It's still an open question whether we can find some scheme that
combines both advantages.

In the mean time, there is some sentiment that default thread stack
sizes on 64bit JVMs should be expanded keep up with the times. On
32bit systems the 1MB limit is completely defensible. But expanding
to say 64MB on 64bit systems would reduce practical encounters with
SOE in these kinds of constructions by a factor of 64 or so.

-Doug


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest





Re: Default Stack Size on 64-bit JVMs

Hanson Char
In reply to this post by Nathan Reynolds-2
Has it been considered to eliminate the use of recursion entirely from the concurrency library (or, for that matter, the JDK)?  Any need for recursion can be readily transformed into the use of a stack.  This doesn't prevent anyone from abusing the library, but it would eliminate the need to explain StackOverflowError.  Shift the usage responsibility to the caller.

2 cents.

Hanson 

On Wed, Dec 2, 2015 at 9:40 AM, Nathan Reynolds <[hidden email]> wrote:
> In the meantime, there is some sentiment that default thread stack sizes on 64-bit JVMs should be expanded to keep up with the times. On 32-bit systems the 1MB limit is completely defensible. But expanding to, say, 64MB on 64-bit systems would reduce practical encounters with SOE in these kinds of constructions by a factor of 64 or so.

I've had this debate on a C++ application as well.  The debate started because a lot of people were working on stack overflow issues.  Each time I would tell them to first fix the infinite recursion issue.  They would try to take the easy route and would set the stack size to something very large and hope the problem would go away.  It didn't.  After the recursion bugs were eliminated, we got to the point where the application really did need to do that much recursion and the recursion was acceptably finite.  We then went with larger stack sizes and chose a sane default size.  After changing the default size, we bumped into a couple of problems.  Nothing that says we shouldn't increase the stack size but maybe it should be documented.

First, the larger stack size eats up address space.  As far as resources go, it is very cheap at this point in time on a 64-bit machine.  25 years ago we said the same thing about virtual address space on 32-bit machines.  We've got about 60 years before we will start exhausting the 64-bit address space.  So, let's ignore that for now.  Today's problem comes when doing sizing calculations.  People will see the address space size and figure that is what is needed for the process in terms of RAM.  Then the entire sizing calculation goes haywire and machines end up oversized.  As a cloud provider, I don't mind people spending more money on RAM they aren't using.  :)

Second, it can take a while to overflow the 1 MB stack.  I've had a couple of situations where it took several minutes in Java to cause a stack overflow.  Obviously, the program isn't just doing recursive calls.  There is a computation or GC happening all the way down the stack.  A 64 MB stack would take a couple of hours.  This means the thread is busy for a long time and other operations will *hopefully* time out waiting for it.

I think a smart approach would be to examine code (open source?) or production deployments and see how often the stack sizing parameter is used and what value it is set to.  This will give us a good idea if the default stack size needs to be adjusted and what is a good stack size to use.
-Nathan

Re: Default Stack Size on 64-bit JVMs

thurstonn
Why would you do this?
Recursion is one of the most beautiful tools in software, and TCO has been well understood since, what, the '70s?
If ever there was a "problem" that should be pushed to the compiler/runtime, it's TCO.

I thought I read/saw Brian Goetz mentioning that the JVM security implementation had finally been refactored out of the JVM and was no longer an impediment.

Re: Default Stack Size on 64-bit JVMs

Jorge Branco
In reply to this post by Hanson Char
Has it been considered to eliminate the use of recursion entirely from the concurrency library (or, for that matter, the JDK)?  Any need for recursion can be readily transformed into the use of a stack.

Wouldn't that entail a huge performance cost? Some very raw experiments I've done in the past seemed to imply that a (Java) stack would give a terrible performance in comparison with equivalent algorithms using plain recursion.


Re: Default Stack Size on 64-bit JVMs

Martin Buchholz-3
In reply to this post by Nathan Reynolds-2


On Wed, Dec 2, 2015 at 9:40 AM, Nathan Reynolds <[hidden email]> wrote:

First, the larger stack size eats up address space.  As far as resources go, it is very cheap at this point in time on a 64-bit machine.  25 years ago we said the same thing about virtual address space on 32-bit machines.  We've got about 60 years before we will start exhausting the 64-bit address space.  So, let's ignore that for now.  Today's problem comes when doing sizing calculations.  People will see the address space size and figure that is what is needed for the process in terms of RAM.  Then the entire sizing calculation goes haywire and machines end up over sized.  As a cloud provider, I don't mind people spending more money on RAM they aren't using.  :)

The assumption is that the VM implementers can manage to spend _only_ address space.  Maybe mmap with PROT_NONE, maybe madvise(MADV_DONTNEED), whatever it takes to convince the OS (and ps!) that you're not really using that memory ... yet 

... and of course start working on the difficult task of being able to move the stack so that if you run out of room, simply realloc and move elsewhere, like the competition is doing.


Re: Default Stack Size on 64-bit JVMs

Vitaly Davidovich
and of course start working on the difficult task of being able to move the stack so that if you run out of room, simply realloc and move elsewhere, like the competition is doing.

IIRC, Rust moved away from segmented stacks.  Go I guess still uses them? It's likely a performance loss for ordinary threading, but would be required for coroutine/fiber support (like Go has). 


Re: Default Stack Size on 64-bit JVMs

Martin Buchholz-3
On Wed, Dec 2, 2015 at 4:45 PM, Vitaly Davidovich <[hidden email]> wrote:
>> and of course start working on the difficult task of being able to move
>> the stack so that if you run out of room, simply realloc and move elsewhere,
>> like the competition is doing.
>
>
> IIRC, Rust moved away from segmented stacks.  Go I guess still uses them?
> It's likely a performance loss for ordinary threading, but would be required
> for coroutine/fiber support (like Go has).

You're right!  Wishful thinking.  Go has moved away from segmented
stacks, but tries to resize them as needed.  They need to keep track
of every pointer that could ever point inside a stack.

This certainly weakens my case.

Re: Default Stack Size on 64-bit JVMs

Henrik Johansson-8

However, Go does it nicely, and the concept of small growable stacks is in itself very appealing. Wouldn't it help many use cases, like parallel streams of small computational units, not just recursion?



Re: Default Stack Size on 64-bit JVMs

Alex Otenko
In reply to this post by Vitaly Davidovich
For my benefit, what is a segmented stack?

Alex


Re: Default Stack Size on 64-bit JVMs

Alex Otenko
In reply to this post by Jorge Branco
What’s the difference between “plain recursion” and “Java stack”?

Alex

On 2 Dec 2015, at 20:25, Jorge Branco <[hidden email]> wrote:

Has it been considered to eliminate the use of recursion entirely from the concurrency library (or, for that matter, the JDK)?  Any need for recursion can be readily transformed into the use of a stack.

Wouldn't that entail a huge performance cost? Some very raw experiments I've done in the past seemed to imply that a (Java) stack would give a terrible performance in comparison with equivalent algorithms using plain recursion.


Re: Default Stack Size on 64-bit JVMs

Andrew Haley
In reply to this post by Jorge Branco
On 02/12/15 20:25, Jorge Branco wrote:
>> Has it been considered to eliminate the use of recursion entirely from
>> the concurrency library (or for that matter the JDK) ?  Any need for
>> recursion can be readily transformed into the use of a stack.
>
> Wouldn't that entail a huge performance cost? Some very raw experiments
> I've done in the past seemed to imply that a (Java) stack would give a
> terrible performance in comparison with equivalent algorithms using plain
> recursion.

I can't immediately see any reason it should.  There's no way to tell
for sure without real code.

Andrew.


Re: Default Stack Size on 64-bit JVMs

David M. Lloyd-3
On 12/07/2015 04:40 AM, Andrew Haley wrote:

> On 02/12/15 20:25, Jorge Branco wrote:
>>> Has it been considered to eliminate the use of recursion entirely from
>>> the concurrency library (or for that matter the JDK) ?  Any need for
>>> recursion can be readily transformed into the use of a stack.
>>
>> Wouldn't that entail a huge performance cost? Some very raw experiments
>> I've done in the past seemed to imply that a (Java) stack would give a
>> terrible performance in comparison with equivalent algorithms using plain
>> recursion.
>
> I can't immediately see any reason it should.  There's no way to tell
> for sure without real code.

Using ArrayDeque instead of Stack would be a good first step, as the
latter uses synchronization everywhere.  Also, proper initial sizing is
very important to performance for large operations to avoid copying.
--
- DML
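Both of DML's points (ArrayDeque over the synchronized java.util.Stack, and presizing to avoid array copying) can be illustrated with a hypothetical recursion-to-explicit-stack conversion of a simple tree sum. Names and the depth figure below are illustrative, not from the thread:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeTraversal {
    static final class Node {
        final int value;
        final Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    // Recursive sum: depth is limited by the thread stack.
    static long sumRecursive(Node n) {
        return n == null ? 0 : n.value + sumRecursive(n.left) + sumRecursive(n.right);
    }

    // Same computation with an explicit ArrayDeque: depth is limited only
    // by heap, and there is no per-operation synchronization (unlike Stack).
    static long sumIterative(Node root, int expectedDepth) {
        Deque<Node> stack = new ArrayDeque<>(expectedDepth); // presized, avoids copying
        long sum = 0;
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            sum += n.value;
            if (n.left != null) stack.push(n.left);
            if (n.right != null) stack.push(n.right);
        }
        return sum;
    }

    public static void main(String[] args) {
        // A degenerate right-leaning "list", deep enough that sumRecursive
        // would overflow a default stack; harmless to the iterative version.
        Node head = null;
        for (int i = 0; i < 1_000_000; i++) head = new Node(1, null, head);
        System.out.println(sumIterative(head, 16)); // prints 1000000
    }
}
```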

Re: Default Stack Size on 64-bit JVMs

Vitaly Davidovich
In reply to this post by Alex Otenko

http://llvm.org/releases/3.0/docs/SegmentedStacks.html

Rust discussion on them: https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html

sent from my phone

On Dec 7, 2015 4:39 AM, "Alex Otenko" <[hidden email]> wrote:
For my benefit, what is a segmented stack?

Alex
