Per-thread per-instance ThreadLocal data

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
We hit a serious performance degradation when we used too many
ThreadLocal instances in a process, as a consequence of which we are
avoiding using instance-ThreadLocal(s), but we are looking for an
alternative pattern for caching sub-computation state.

Would:
   static ThreadLocal<WeakHashMap<Instance, Value>>
be a good pattern?

What other observations or suggestions do people have?

S.
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
I would suggest a custom ThreadFactory. I believe the following code is much faster than ThreadLocals:

MyAppThread t = (MyAppThread) Thread.currentThread();
t.getSomething()

On Mon, Oct 15, 2018 at 6:27 AM Shevek via Concurrency-interest <[hidden email]> wrote:
We hit a serious performance degradation when we used too many
ThreadLocal instances in a process, as a consequence of which we are
avoiding using instance-ThreadLocal(s), but we are looking for an
alternative pattern for caching sub-computation state.

Would:
   static ThreadLocal<WeakHashMap<Instance, Value>>
be a good pattern?

What other observations or suggestions do people have?

S.
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
Subclassing Thread invalidates the use of the security manager.  If there is thread specific context, but it into a class that implements Runnable.  You can then, already have view of this other class in its Runnable implementation. And have a single object to store as a thread local instead of many different values.  

When you need more optimal execution, pass this Runnable class reference (implementing another interface or as a value class instance) in your API design.

Also note that you can also use a worker pattern which is constructed with this object and the invoked with the thread do that the context is always present without the Runnable implementation class owning that logic.

Gregg

On Oct 15, 2018, at 10:31 AM, Andrey Pavlenko via Concurrency-interest <[hidden email]> wrote:

I would suggest a custom ThreadFactory. I believe the following code is much faster than ThreadLocals:

MyAppThread t = (MyAppThread) Thread.currentThread();
t.getSomething()

On Mon, Oct 15, 2018 at 6:27 AM Shevek via Concurrency-interest <[hidden email]> wrote:
We hit a serious performance degradation when we used too many
ThreadLocal instances in a process, as a consequence of which we are
avoiding using instance-ThreadLocal(s), but we are looking for an
alternative pattern for caching sub-computation state.

Would:
   static ThreadLocal<WeakHashMap<Instance, Value>>
be a good pattern?

What other observations or suggestions do people have?

S.
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
On Mon, Oct 15, 2018 at 11:50 AM Gregg Wonderly via
Concurrency-interest <[hidden email]> wrote:
> Subclassing Thread invalidates the use of the security manager.

I'm pretty sure that this is false.  Even if it were somehow true, it
seems unlikely to be a significant consideration in most modern
applications.
--
- DML
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
The trouble is that ThreadLocalMap isn't a HashMap, and does not have
elegant degradation on over-size. perf and related tools told me that we
were spending the majority of the application time in
ThreadLocalMap.getEntryAfterMiss() after many thousand instance
ThreadLocal(s) had been created and GC'd; this wasn't an accident of the
inliner or anything, it's real.

Short version: Long-lived threads with many short-lived ThreadLocals
sucks really hard.

At a wild guess, the table filled up with tombstones and either the
tombstones aren't GCing fast enough due to touching the WeakHashMap
(vague memory you used to be able to inhibit Reference cleanup that way)
or ...? And the consequence of all this was that we went full linear
search. I ran out of both hypotheses and time at this stage, but I still
need a practical solution for the application.

Based on a set of other great replies, here are the requirements:

* Must garbage-collect if the thread ends.
* Must garbage-collect if the instance garbage-collects.
* Must be compatible with ForkJoinPool threads, i.e. no subclasses of
Thread.

So at this point, unless I'm missing something, we're pretty much
restricted to a WeakHashMap<Thread, ?> in the instance or a
WeakHashMap<Instance, ?> in the Thread.

Then my question is, do I have a singleton PerInstanceThreadLocal so
that there is exactly one WeakHashMap per Thread, for all clients, and
just key that on instance, or do I want to have a separate WeakHashMap
per client in its own non-singleton (but still static)
PerInstanceThreadLocal?

// Do I make this class be a singleton?
// Is one big WeakHashMap better than many small ones?
// Does my code suck and did I miss a dumb trick (again)?

public class PerInstanceThreadLocal<Instance, Value>
     extends ThreadLocal<Map<Instance, Value>> {

     public class Accessor {

         private final Instance instance;
         private final Function<Instance, Value> constructor;

         public Accessor(@Nonnull Instance instance, @Nonnull
Supplier<Value> supplier) {
             this.instance = Preconditions.checkNotNull(instance,
"Instance was null.");
             this.constructor = i -> supplier.get();
         }

         public Accessor(@Nonnull Instance instance) {
             this(instance, Suppliers.ofInstance(null));
         }

         @CheckForNull
         public Value get() {
             Map<Instance, Value> map = PerInstanceThreadLocal.this.get();
             return map.computeIfAbsent(instance, constructor);
         }
     }

     @Override
     protected Map<Instance, Value> initialValue() {
         return new WeakHashMap<>();
     }
}

S.

On 10/15/2018 06:28 AM, Nathan and Ila Reynolds wrote:

> ThreadLocal works by calling Thread.currentThread() and then getting the
> Map field in Thread.  From there, it calls Map.get(ThreadLocal) to get
> the value.  This alternative pattern is going to execute
> WeakHashMap.get(Instance) and cause further memory indirections.  These
> memory indirections could hurt performance.
>
> Another pattern is ConcurrentHashMap<Thread, WeakHashMap<Instance,
> Value>>.  This is probably not an improvement over what you already
> proposed.
>
> I am curious as to the details of the "serious performance
> degradation".  Please share what you can.
>
> -Nathan
>
> On 10/14/2018 9:23 PM, Shevek via Concurrency-interest wrote:
>> We hit a serious performance degradation when we used too many
>> ThreadLocal instances in a process, as a consequence of which we are
>> avoiding using instance-ThreadLocal(s), but we are looking for an
>> alternative pattern for caching sub-computation state.
>>
>> Would:
>>   static ThreadLocal<WeakHashMap<Instance, Value>>
>> be a good pattern?
>>
>> What other observations or suggestions do people have?
>>
>> S.
>> _______________________________________________
>> Concurrency-interest mailing list
>> [hidden email]
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
> Must be compatible with ForkJoinPool threads, i.e. no subclasses of 
Thread.

Why no subclasses of Thread.? You may have a custom ForkJoinWorkerThreadFactory

On Mon, Oct 15, 2018 at 9:08 PM Shevek via Concurrency-interest <[hidden email]> wrote:
The trouble is that ThreadLocalMap isn't a HashMap, and does not have
elegant degradation on over-size. perf and related tools told me that we
were spending the majority of the application time in
ThreadLocalMap.getEntryAfterMiss() after many thousand instance
ThreadLocal(s) had been created and GC'd; this wasn't an accident of the
inliner or anything, it's real.

Short version: Long-lived threads with many short-lived ThreadLocals
sucks really hard.

At a wild guess, the table filled up with tombstones and either the
tombstones aren't GCing fast enough due to touching the WeakHashMap
(vague memory you used to be able to inhibit Reference cleanup that way)
or ...? And the consequence of all this was that we went full linear
search. I ran out of both hypotheses and time at this stage, but I still
need a practical solution for the application.

Based on a set of other great replies, here are the requirements:

* Must garbage-collect if the thread ends.
* Must garbage-collect if the instance garbage-collects.
* Must be compatible with ForkJoinPool threads, i.e. no subclasses of
Thread.

So at this point, unless I'm missing something, we're pretty much
restricted to a WeakHashMap<Thread, ?> in the instance or a
WeakHashMap<Instance, ?> in the Thread.

Then my question is, do I have a singleton PerInstanceThreadLocal so
that there is exactly one WeakHashMap per Thread, for all clients, and
just key that on instance, or do I want to have a separate WeakHashMap
per client in its own non-singleton (but still static)
PerInstanceThreadLocal?

// Do I make this class be a singleton?
// Is one big WeakHashMap better than many small ones?
// Does my code suck and did I miss a dumb trick (again)?

public class PerInstanceThreadLocal<Instance, Value>
     extends ThreadLocal<Map<Instance, Value>> {

     public class Accessor {

         private final Instance instance;
         private final Function<Instance, Value> constructor;

         public Accessor(@Nonnull Instance instance, @Nonnull
Supplier<Value> supplier) {
             this.instance = Preconditions.checkNotNull(instance,
"Instance was null.");
             this.constructor = i -> supplier.get();
         }

         public Accessor(@Nonnull Instance instance) {
             this(instance, Suppliers.ofInstance(null));
         }

         @CheckForNull
         public Value get() {
             Map<Instance, Value> map = PerInstanceThreadLocal.this.get();
             return map.computeIfAbsent(instance, constructor);
         }
     }

     @Override
     protected Map<Instance, Value> initialValue() {
         return new WeakHashMap<>();
     }
}

S.

On 10/15/2018 06:28 AM, Nathan and Ila Reynolds wrote:
> ThreadLocal works by calling Thread.currentThread() and then getting the
> Map field in Thread.  From there, it calls Map.get(ThreadLocal) to get
> the value.  This alternative pattern is going to execute
> WeakHashMap.get(Instance) and cause further memory indirections.  These
> memory indirections could hurt performance.
>
> Another pattern is ConcurrentHashMap<Thread, WeakHashMap<Instance,
> Value>>.  This is probably not an improvement over what you already
> proposed.
>
> I am curious as to the details of the "serious performance
> degradation".  Please share what you can.
>
> -Nathan
>
> On 10/14/2018 9:23 PM, Shevek via Concurrency-interest wrote:
>> We hit a serious performance degradation when we used too many
>> ThreadLocal instances in a process, as a consequence of which we are
>> avoiding using instance-ThreadLocal(s), but we are looking for an
>> alternative pattern for caching sub-computation state.
>>
>> Would:
>>   static ThreadLocal<WeakHashMap<Instance, Value>>
>> be a good pattern?
>>
>> What other observations or suggestions do people have?
>>
>> S.
>> _______________________________________________
>> Concurrency-interest mailing list
>> [hidden email]
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
Okay I see that there is now a subclass permission required.  I seem to recall that check at line 394 or throwing a security exception without the permission check in the past.

Are you suggesting that security is not a need in modern apps?

Gregg

Sent from my iPhone

> On Oct 15, 2018, at 11:54 AM, David Lloyd <[hidden email]> wrote:
>
> On Mon, Oct 15, 2018 at 11:50 AM Gregg Wonderly via
> Concurrency-interest <[hidden email]> wrote:
>> Subclassing Thread invalidates the use of the security manager.
>
> I'm pretty sure that this is false.  Even if it were somehow true, it
> seems unlikely to be a significant consideration in most modern
> applications.
> --
> - DML

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
I recently hit that same issue: long living threads with many
short-lived ThreadLocals.  I ended up accessing Thread.threadLocals and
calling expungeStaleEntries() when the thread finishes executing a job. 
I haven't seen any performance and memory issues since doing this.  The
downside is that it is a hack that could break in a future version of Java.

-Nathan

On 10/15/2018 12:04 PM, Shevek wrote:

> The trouble is that ThreadLocalMap isn't a HashMap, and does not have
> elegant degradation on over-size. perf and related tools told me that
> we were spending the majority of the application time in
> ThreadLocalMap.getEntryAfterMiss() after many thousand instance
> ThreadLocal(s) had been created and GC'd; this wasn't an accident of
> the inliner or anything, it's real.
>
> Short version: Long-lived threads with many short-lived ThreadLocals
> sucks really hard.
>
> At a wild guess, the table filled up with tombstones and either the
> tombstones aren't GCing fast enough due to touching the WeakHashMap
> (vague memory you used to be able to inhibit Reference cleanup that
> way) or ...? And the consequence of all this was that we went full
> linear search. I ran out of both hypotheses and time at this stage,
> but I still need a practical solution for the application.
>
> Based on a set of other great replies, here are the requirements:
>
> * Must garbage-collect if the thread ends.
> * Must garbage-collect if the instance garbage-collects.
> * Must be compatible with ForkJoinPool threads, i.e. no subclasses of
> Thread.
>
> So at this point, unless I'm missing something, we're pretty much
> restricted to a WeakHashMap<Thread, ?> in the instance or a
> WeakHashMap<Instance, ?> in the Thread.
>
> Then my question is, do I have a singleton PerInstanceThreadLocal so
> that there is exactly one WeakHashMap per Thread, for all clients, and
> just key that on instance, or do I want to have a separate WeakHashMap
> per client in its own non-singleton (but still static)
> PerInstanceThreadLocal?
>
> // Do I make this class be a singleton?
> // Is one big WeakHashMap better than many small ones?
> // Does my code suck and did I miss a dumb trick (again)?
>
> public class PerInstanceThreadLocal<Instance, Value>
>     extends ThreadLocal<Map<Instance, Value>> {
>
>     public class Accessor {
>
>         private final Instance instance;
>         private final Function<Instance, Value> constructor;
>
>         public Accessor(@Nonnull Instance instance, @Nonnull
> Supplier<Value> supplier) {
>             this.instance = Preconditions.checkNotNull(instance,
> "Instance was null.");
>             this.constructor = i -> supplier.get();
>         }
>
>         public Accessor(@Nonnull Instance instance) {
>             this(instance, Suppliers.ofInstance(null));
>         }
>
>         @CheckForNull
>         public Value get() {
>             Map<Instance, Value> map = PerInstanceThreadLocal.this.get();
>             return map.computeIfAbsent(instance, constructor);
>         }
>     }
>
>     @Override
>     protected Map<Instance, Value> initialValue() {
>         return new WeakHashMap<>();
>     }
> }
>
> S.
>
> On 10/15/2018 06:28 AM, Nathan and Ila Reynolds wrote:
>> ThreadLocal works by calling Thread.currentThread() and then getting
>> the Map field in Thread.  From there, it calls Map.get(ThreadLocal)
>> to get the value.  This alternative pattern is going to execute
>> WeakHashMap.get(Instance) and cause further memory indirections. 
>> These memory indirections could hurt performance.
>>
>> Another pattern is ConcurrentHashMap<Thread, WeakHashMap<Instance,
>> Value>>.  This is probably not an improvement over what you already
>> proposed.
>>
>> I am curious as to the details of the "serious performance
>> degradation".  Please share what you can.
>>
>> -Nathan
>>
>> On 10/14/2018 9:23 PM, Shevek via Concurrency-interest wrote:
>>> We hit a serious performance degradation when we used too many
>>> ThreadLocal instances in a process, as a consequence of which we are
>>> avoiding using instance-ThreadLocal(s), but we are looking for an
>>> alternative pattern for caching sub-computation state.
>>>
>>> Would:
>>>   static ThreadLocal<WeakHashMap<Instance, Value>>
>>> be a good pattern?
>>>
>>> What other observations or suggestions do people have?
>>>
>>> S.
>>> _______________________________________________
>>> Concurrency-interest mailing list
>>> [hidden email]
>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>>
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
On Mon, Oct 15, 2018 at 2:20 PM Gregg Wonderly <[hidden email]> wrote:
> Are you suggesting that security is not a need in modern apps?

Definitely not.  I am suggesting that the current consensus (right or
wrong) appears to be that most practical applications of a security
manager do not appreciably enhance a given application's security,
hence the recent move towards deprecating it altogether.

--
- DML
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
Reply | Threaded
Open this post in threaded view
|

Re: Per-thread per-instance ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
Hi Shevek,

I you can explicitly control the lifecycle of ThreadLocal instances, it might help to call ThreadLocal::remove when you're done with them.

Regards, Peter

On 10/15/2018 08:04 PM, Shevek via Concurrency-interest wrote:
The trouble is that ThreadLocalMap isn't a HashMap, and does not have elegant degradation on over-size. perf and related tools told me that we were spending the majority of the application time in ThreadLocalMap.getEntryAfterMiss() after many thousand instance ThreadLocal(s) had been created and GC'd; this wasn't an accident of the inliner or anything, it's real.

Short version: Long-lived threads with many short-lived ThreadLocals sucks really hard.

At a wild guess, the table filled up with tombstones and either the tombstones aren't GCing fast enough due to touching the WeakHashMap (vague memory you used to be able to inhibit Reference cleanup that way) or ...? And the consequence of all this was that we went full linear search. I ran out of both hypotheses and time at this stage, but I still need a practical solution for the application.

Based on a set of other great replies, here are the requirements:

* Must garbage-collect if the thread ends.
* Must garbage-collect if the instance garbage-collects.
* Must be compatible with ForkJoinPool threads, i.e. no subclasses of Thread.

So at this point, unless I'm missing something, we're pretty much restricted to a WeakHashMap<Thread, ?> in the instance or a WeakHashMap<Instance, ?> in the Thread.

Then my question is, do I have a singleton PerInstanceThreadLocal so that there is exactly one WeakHashMap per Thread, for all clients, and just key that on instance, or do I want to have a separate WeakHashMap per client in its own non-singleton (but still static) PerInstanceThreadLocal?

// Do I make this class be a singleton?
// Is one big WeakHashMap better than many small ones?
// Does my code suck and did I miss a dumb trick (again)?

public class PerInstanceThreadLocal<Instance, Value>
    extends ThreadLocal<Map<Instance, Value>> {

    public class Accessor {

        private final Instance instance;
        private final Function<Instance, Value> constructor;

        public Accessor(@Nonnull Instance instance, @Nonnull Supplier<Value> supplier) {
            this.instance = Preconditions.checkNotNull(instance, "Instance was null.");
            this.constructor = i -> supplier.get();
        }

        public Accessor(@Nonnull Instance instance) {
            this(instance, Suppliers.ofInstance(null));
        }

        @CheckForNull
        public Value get() {
            Map<Instance, Value> map = PerInstanceThreadLocal.this.get();
            return map.computeIfAbsent(instance, constructor);
        }
    }

    @Override
    protected Map<Instance, Value> initialValue() {
        return new WeakHashMap<>();
    }
}

S.

On 10/15/2018 06:28 AM, Nathan and Ila Reynolds wrote:
ThreadLocal works by calling Thread.currentThread() and then getting the Map field in Thread.  From there, it calls Map.get(ThreadLocal) to get the value.  This alternative pattern is going to execute WeakHashMap.get(Instance) and cause further memory indirections.  These memory indirections could hurt performance.

Another pattern is ConcurrentHashMap<Thread, WeakHashMap<Instance, Value>>.  This is probably not an improvement over what you already proposed.

I am curious as to the details of the "serious performance degradation".  Please share what you can.

-Nathan

On 10/14/2018 9:23 PM, Shevek via Concurrency-interest wrote:
We hit a serious performance degradation when we used too many ThreadLocal instances in a process, as a consequence of which we are avoiding using instance-ThreadLocal(s), but we are looking for an alternative pattern for caching sub-computation state.

Would:
  static ThreadLocal<WeakHashMap<Instance, Value>>
be a good pattern?

What other observations or suggestions do people have?

S.
_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest


_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest