Overhead of ThreadLocal data

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
[+list]

On 10/17/18 11:44 AM, Nathan and Ila Reynolds wrote:
> Can we add the following method to ThreadLocal?
>
> public static void expungeStaleEntries()

This seems like a reasonable request (although perhaps with an improved
name).  The functionality exists internally, and it seems overly
parental not to export it for use as a band-aid by those people who have
tried and otherwise failed to solve the zillions of short-lived
ThreadLocals in long-lived threads problem.

Can anyone think of a reason not to do this?

-Doug

>
> This method will call ThreadLocal.ThreadLocalMap.expungeStaleEntries()
> for the ThreadLocalMap of the current thread.  Thread pools can then
> call this method when the thread finishes processing a job after GC.
> This solves the problem of zillions of short-lived ThreadLocals in
> long-lived threads. 

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list

I wasn't aware of ByteBuffer.duplicate().  That makes more sense.  However, with the way I am using ByteBuffer.slice(), it is essentially the same as using duplicate().

-Nathan
On 10/17/2018 11:41 AM, Bob Lee wrote:
Can you use ByteBuffer.duplicate()?

Bob

On Wed, Oct 17, 2018 at 10:35 AM Nathan and Ila Reynolds via Concurrency-interest <[hidden email]> wrote:
 > creating zillions of short-lived ThreadLocals seems like an
antipattern to me.

Perhaps you can share another way to solve this problem.

I have a ByteBuffer that maps a large file.  I have multiple threads
reading the ByteBuffer at different positions.  As long as the threads
don't call ByteBuffer.position(), they can operate concurrently on the
ByteBuffer.  However, ByteBuffer.get(byte[]) has no absolute variant,
hence the thread has to call position().

Attempt #1: I started by putting a lock around the ByteBuffer. This
causes a lot of contention.

Attempt #2: I started by slicing the ByteBuffer.  This created a lot of
garbage.

Attempt #3: I put the sliced ByteBuffers into ThreadLocal but with many
files mapped, consumed and unmapped rapidly, this leads to zillions of
short-lived ThreadLocals.

Attempt #4: I put the sliced ByteBuffers into a LinkedTransferQueue but
this created a lot of garbage for creating nodes in the queue.

Attempt #5: I put the sliced ByteBuffers into a ConcurrentHashMap keyed
on the Thread.  I cannot remember why this didn't work.  I think the
overhead of ConcurrentHashMap created a lot of garbage.

Attempt #6: I went back to attempt #3 (ThreadLocal) and call expunge
when the thread returns to the thread pool.  Yes, this creates zillions
of short-lived ThreadLocals, but they get cleaned out quickly, so there is
no performance degradation for ThreadLocal lookup.

Each thread cannot have its sliced ByteBuffer passed through the stack
as an argument.  This would create a lot of garbage from duplicate
structures.

-Nathan
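For reference, the duplicate()-per-read approach discussed here looks like the following in code; each reader takes its own view with an independent position. A sketch (class and helper names are mine):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: one shared buffer, one duplicate() view per read. Views share the
// underlying content but each has its own position/limit, so threads can read
// concurrently as long as nobody calls position() on the shared buffer itself.
// Note: each read allocates a view object; that is the garbage cost debated
// in this thread.
public class DuplicateReadDemo {
    static byte[] readAt(ByteBuffer shared, int pos, int len) {
        ByteBuffer view = shared.duplicate(); // independent position/limit
        view.position(pos);
        byte[] dst = new byte[len];
        view.get(dst);
        return dst;
    }

    public static void main(String[] args) throws InterruptedException {
        ByteBuffer shared =
                ByteBuffer.wrap("hello, concurrent world".getBytes(StandardCharsets.US_ASCII));
        Thread t1 = new Thread(() ->
                System.out.println(new String(readAt(shared, 0, 5), StandardCharsets.US_ASCII)));
        Thread t2 = new Thread(() ->
                System.out.println(new String(readAt(shared, 7, 10), StandardCharsets.US_ASCII)));
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```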

On 10/17/2018 10:29 AM, Andrew Haley via Concurrency-interest wrote:
> On 10/17/2018 04:07 PM, Doug Lea via Concurrency-interest wrote:
>> This alone would not address long-standing problems with zillions of
>> short-lived ThreadLocals in long-lived threads.  Right now, ThreadLocal
>> is as fast as we know how to make it while still not completely falling
>> over under such usages. The only solution I know for this is to create a
>> new GC-aware storage class, which is not very likely to be adopted.
> Well, yeah, but creating zillions of short-lived ThreadLocals seems
> like an antipattern to me.
>

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
I would love it if ByteBuffer had absolute versions of all the relative
methods.  I have wanted the absolute versions in the past for other
situations.

My #2 is similar to what you said; replace duplicate with slice.

    buf.slice().position(newPos).get(byteArray);

This produces a lot of garbage unless I keep it in a ThreadLocal or some
other data structure.  I was running on Java 8.  Perhaps Java 10 or 11
would not produce as much garbage.

-Nathan
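One caveat worth noting: on Java 8, Buffer.position(int) is not covariantly overridden, so the chained one-liner above does not compile as written there; it needs a cast back to ByteBuffer (Java 9 added covariant overrides, making the cast unnecessary). A sketch of the Java-8-safe form (helper name is mine):

```java
import java.nio.ByteBuffer;

// On Java 8, buf.slice().position(newPos) has static type Buffer, which has
// no get(byte[]); the cast restores the ByteBuffer type. On Java 9+ the cast
// can be dropped because position(int) is overridden covariantly.
public class AbsoluteReadJava8 {
    static byte[] readAt(ByteBuffer buf, int newPos, int len) {
        byte[] byteArray = new byte[len];
        ((ByteBuffer) buf.slice().position(newPos)).get(byteArray);
        return byteArray;
    }
}
```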

On 10/17/2018 11:44 AM, David Lloyd wrote:

> On Wed, Oct 17, 2018 at 12:36 PM Nathan and Ila Reynolds via
> Concurrency-interest <[hidden email]> wrote:
>> Perhaps, you can share another way to solve this problem.
>>
>> I have a ByteBuffer that maps a large file.  I have multiple threads
>> reading the ByteBuffer at different positions.  As long as the threads
>> don't call ByteBuffer.position(), they can operate concurrently on the
>> ByteBuffer.  However, ByteBuffer.get(byte[]) does not have an absolute
>> method hence the thread has to call position().
> The obvious solution would seem to be that we should enhance
> ByteBuffer to have such a method.
>
> But, your #2 should work if you are careful to do it like this:
>
>     buf.duplicate().position(newPos).get(byteArray);
>
> In such cases, HotSpot can sometimes delete the allocation of the new
> ByteBuffer altogether.  I seem to recall that my colleague Andrew
> Haley (on this thread) did some work/research in this area a while
> ago.
>

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
Are you sure that MappedByteBuffer is well suited for concurrent random reading of large files? MappedByteBuffer.get() is a potentially blocking operation. Perhaps AsynchronousFileChannel is more suitable for this case.

On Wed, Oct 17, 2018 at 8:36 PM Nathan and Ila Reynolds via Concurrency-interest <[hidden email]> wrote:
> [...]

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
On 10/17/18 2:15 PM, Tim Peierls wrote:
>  
>
>     Doug's mention of "task local classes" sounds like he was alluding
>     to some new mode of access other than these three, so I would be
>     interested to know of such a thing.
>
>
> Not positive what Doug meant, but I don't think he was talking about a
> new mode of access when he wrote "task-local classes".

Right; as in (among other options) passing in a "TaskContext" object to
each task, so task body code would not need to rely implicitly on the
thread's ThreadLocalMap. This might in some cases be more ugly/awkward
but is still a trade-off worth considering.

-Doug


Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list

The reads need the data before the rest of the operation can continue.  Hence, the thread would block on the returned Future.  It seems FileChannel would be a better fit.

Does the read() operation from either class cause a round-trip through the kernel?  If so, MappedByteBuffer.get() avoids the kernel trip if the page is in RAM and hence can perform better.

-Nathan
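The FileChannel suggestion fits because FileChannel.read(ByteBuffer dst, long position) is an absolute read that never touches the channel's own position, so multiple threads can share one channel without locking. A runnable sketch (file contents and helper name are mine):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: positional reads on a single shared FileChannel. read(dst, pos)
// does not change the channel's position, so no per-thread views or locks
// are needed; each read does go through the kernel, unlike a mapped buffer
// hitting a resident page.
public class PositionalRead {
    static byte[] readAt(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(len);
        while (dst.hasRemaining()) {
            if (ch.read(dst, pos + dst.position()) < 0) {
                break; // hit end of file
            }
        }
        return dst.array();
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("positional-read", ".bin");
        try {
            Files.write(p, "hello, concurrent world".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                System.out.println(new String(readAt(ch, 7, 10), StandardCharsets.US_ASCII));
            }
        } finally {
            Files.delete(p);
        }
    }
}
```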
On 10/17/2018 12:35 PM, Andrey Pavlenko wrote:
> Are you sure that MappedByteBuffer is well suited for concurrent random reading of large files? MappedByteBuffer.get() is a potentially blocking operation. Perhaps AsynchronousFileChannel is more suitable for this case.

On Wed, Oct 17, 2018 at 8:36 PM Nathan and Ila Reynolds via Concurrency-interest <[hidden email]> wrote:
> [...]

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
Or with a worse name, to ward off use by people who think that calling
`System.gc()` "helps" the GC...

On 10/17/2018 2:26 PM, Doug Lea via Concurrency-interest wrote:

> [...]

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
@nathan

Re: absolute ByteBuffer ops, AFAIK there is something moving at http://mail.openjdk.java.net/pipermail/nio-dev/2018-October/005511.html

On Wed, Oct 17, 2018 at 21:26 Nathan and Ila Reynolds via Concurrency-interest <[hidden email]> wrote:
> [...]

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
On Linux at least, nearly *any* memory access is a potentially
blocking operation.  It's really hard to say which is better than the
other without profiling in as realistic a deployment situation as
possible.
On Wed, Oct 17, 2018 at 3:02 PM Nathan and Ila Reynolds via
Concurrency-interest <[hidden email]> wrote:

> [...]



--
- DML

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
On 17/10/2018 18:06, Nathan and Ila Reynolds via Concurrency-interest wrote:

> > creating zillions of short-lived ThreadLocals seems like an
> antipattern to me.
>
> Perhaps, you can share another way to solve this problem.
>
> I have a ByteBuffer that maps a large file.  I have multiple threads
> reading the ByteBuffer at different positions.  As long as the threads
> don't call ByteBuffer.position(), they can operate concurrently on the
> ByteBuffer.  However, ByteBuffer.get(byte[]) does not have an absolute
> method
Efforts to add absolute variants of the bulk get/put operations are under
discussion on nio-dev. The intention isn't to make buffers thread safe,
but it should help with many cases where the buffer position is a hindrance.

-Alan
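As a side note from a later vantage point: the nio-dev work referred to here did land, and Java 13 added absolute bulk operations such as ByteBuffer.get(int index, byte[] dst). On a recent JDK the per-read view pattern collapses to (sketch, names mine):

```java
import java.nio.ByteBuffer;

// Java 13+ only: the absolute bulk get reads at an index without touching
// the buffer's position, so concurrent readers need no duplicate()/slice()
// views and produce no per-read garbage.
public class AbsoluteBulkGet {
    static byte[] readAt(ByteBuffer shared, int index, int len) {
        byte[] dst = new byte[len];
        shared.get(index, dst); // absolute bulk get, added in Java 13
        return dst;
    }
}
```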

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
ArrayList.trimToSize() is benign and almost entirely forgotten. Might find something similar.

On Wed, Oct 17, 2018 at 1:40 PM Brian Goetz via Concurrency-interest <[hidden email]> wrote:
Or with a worse name, to ward off use by people who think that calling
`System.gc()` "helps" the GC...

On 10/17/2018 2:26 PM, Doug Lea via Concurrency-interest wrote:
> [...]

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
The other possible API is an abstraction over a stack that grows and shrinks along with the call stack:

  class Context<V> {
    public Context(Supplier<? extends V> initialValueSupplier);
    public V getValue();
    public void setValue(V value);
    public void enter(Runnable runnable);
  }

The idea is that if you want to access (read/write) the value, you have to do it in the Runnable after calling enter().
For example,

  var context = new Context<>(() -> 1);
  var context2 = new Context<>(() -> 2);
   
  context.enter(() -> {
    context2.enter(() -> {
      assertEquals(1, (int)context.getValue());
      assertEquals(2, (int)context2.getValue());
    });
  });

The way to implement this API is to have a growable array of Context/value pairs in java.lang.Thread; finding or replacing the value is a loop from the end of the array to the beginning that returns the value associated with the context.

This API has the advantage of being explicit about the cost (you see Context.enter in the stack trace), whereas usages of ThreadLocal are easy to hide IMO; there is even a lot of code that initializes a ThreadLocal in a static block even if the method that uses it is never called (this may be fixed when lazy static final fields are implemented [1]).

Rémi

[1] https://bugs.openjdk.java.net/browse/JDK-8209964
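The sketch above can be filled in as a user-land approximation. The actual proposal would keep the pair array in java.lang.Thread; lacking access to Thread internals, this version (mine, purely illustrative) parks the per-thread stack in a single ThreadLocal, which preserves the API shape while obviously not escaping ThreadLocal itself:

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// User-land approximation of the proposed Context API. The per-thread stack
// of (context, value) pairs lives in one ThreadLocal here; the real proposal
// would store it directly in java.lang.Thread.
public final class Context<V> {
    private static final ThreadLocal<ArrayDeque<Object[]>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    private final Supplier<? extends V> initialValueSupplier;

    public Context(Supplier<? extends V> initialValueSupplier) {
        this.initialValueSupplier = initialValueSupplier;
    }

    // Scan from the top of the stack down, as in the description above.
    private Object[] find() {
        for (Object[] pair : STACK.get()) { // ArrayDeque iterates head (top) first
            if (pair[0] == this) {
                return pair;
            }
        }
        throw new IllegalStateException("getValue()/setValue() outside enter()");
    }

    @SuppressWarnings("unchecked")
    public V getValue() { return (V) find()[1]; }

    public void setValue(V value) { find()[1] = value; }

    public void enter(Runnable runnable) {
        ArrayDeque<Object[]> stack = STACK.get();
        stack.push(new Object[] { this, initialValueSupplier.get() });
        try {
            runnable.run();
        } finally {
            stack.pop();
        }
    }
}
```

With this class, the nested enter() example in the mail runs as written: the inner Runnable sees value 1 from the outer context and 2 from the inner one.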

----- Original Message -----
> From: "concurrency-interest" <[hidden email]>
> To: [hidden email]
> Cc: "Doug Lea" <[hidden email]>, "concurrency-interest" <[hidden email]>
> Sent: Wednesday, October 17, 2018 18:10:43
> Subject: Re: [concurrency-interest] Overhead of ThreadLocal data

> On Wed, Oct 17, 2018 at 11:01 AM Tim Peierls via Concurrency-interest
> <[hidden email]> wrote:
>> On Wed, Oct 17, 2018 at 11:28 AM Doug Lea via Concurrency-interest
>> <[hidden email]> wrote:
>>> Also consider restructuring code to use task-local classes (perhaps with
>>> linkages among them) that are GCable when tasks complete.
>>
>> I bet a lot of the ThreadLocal uses under consideration would benefit from this
>> kind of restructuring, avoiding ThreadLocal entirely.
>
> How would one access task-local data if not by way of a ThreadLocal?
> --
> - DML

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
On 10/17/2018 07:35 PM, Andrey Pavlenko via Concurrency-interest wrote:
> Are you sure that MappedByteBuffer is well suitable for concurrent random
> reading of large files?

Yes.

> MappedByteBuffer.get() is a potentially blocking
> operation.

So is any access to memory if you're running out of space.

> Perhaps AsynchronousFileChannel is more suitable for this case.

Why? You can prefetch the file if you need to have it ready.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
On 10/17/2018 06:06 PM, Nathan and Ila Reynolds via Concurrency-interest wrote:
> Each thread cannot have its sliced ByteBuffer passed through the stack
> as an argument.  This would create a lot of garbage from duplicate
> structures.

Why would it create a lot of garbage when passed through the stack?

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
On 10/17/2018 05:50 PM, Joshua Bloch wrote:
> For your amusement, when I first designed and implemented
> ThreadLocal, I assumed that no VM would ever have more than ~10
> thread locals over its lifetime. We reimplemented it several times
> as my initial estimate proved further and further from the truth.

:-)

> The API has held up pretty well, though.

Indeed it has. The big problem is that unless you have a very deep
knowledge of the VM, you're not going to realize that
Thread.currentThread() costs nothing, but ThreadLocal.get() is
expensive.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
In the case of a non-large file and/or sequential reads, MappedByteBuffer is definitely faster, but the question here is about concurrent random reads of large files. For this purpose MappedByteBuffer may be inefficient.

On Thu, Oct 18, 2018 at 11:21 AM Andrew Haley <[hidden email]> wrote:
> [...]


Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
On 10/18/2018 10:29 AM, Andrey Pavlenko wrote:
> In case of a non-large file and/or sequential reads MappedByteBuffer is
> definitely faster, but the question here is about *concurrent* *random*
> reads of *large* files. For this purpose MappedByteBuffer may be
> inefficient.

Perhaps. It might be that manually managing memory (by reading and
writing the parts of the file you need) works better than letting the
kernel do it, but the kernel will cache as much as it can anyway, so
it's not as if it'll necessarily save memory or reduce disk activity.
There are advantages to using read() for sequential file access because
the kernel can automatically read and cache the next part of the file.

There were some problems with inefficient code generation for byte
buffers but we have worked on that and it's better now, with (even)
more improvements to come. Unless there are kernel issues I don't know
about, mapped files are excellent for random access, and the Java
ByteBuffer operations generate excellent code. (This isn't guaranteed
because C2 uses a bunch of heuristics, but it's usually good.)

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
On 10/18/2018 10:58 AM, Andrew Haley wrote:
> Perhaps. It might be that manually managing memory (by reading and
> writing the parts of the file you need) works better than letting the
> kernel do it, but the kernel will cache as much as it can anyway, so
> it's not as if it'll necessarily save memory or reduce disk activity.

Thinking about this some more, it would be nice to have an interface to
madvise(2), in particular MADV_DONTNEED.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
On 10/18/2018 10:58 AM, Andrew Haley via Concurrency-interest wrote:
> On 10/18/2018 10:29 AM, Andrey Pavlenko wrote:
>> In case of a non-large file and/or sequential reads MappedByteBuffer is
>> definitely faster, but the question here is about *concurrent* *random*
>> reads of *large* files. For this purpose MappedByteBuffer may be
>> inefficient.
>
> Perhaps.

Mea culpa. I went out for a walk and thought about this, and one
important problem with using MappedByteBuffers for random access
dawned on me: you're looking at extended time-to-safepoint
pauses. This would happen on any system with slow storage or a heavily
overloaded I/O subsystem, where a page fault blocks a thread while the
page is read in. It wouldn't happen with a File interface using plain
read() calls, because threads blocked in read() don't delay safepoints.

So yes, I take your point.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Re: Overhead of ThreadLocal data

JSR166 Concurrency mailing list
In reply to this post by JSR166 Concurrency mailing list
When accessing a file through the kernel's file I/O, the thread context
switches from user land into kernel land.  The thread then goes to the
file cache to see if the data is there.  If not, the kernel blocks the
thread and pulls the data from disk.  Once the data is in the file
cache, the thread copies the data into the user land buffer and context
switches back to user land.
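
The round trip described above can be seen in FileChannel's positional read: every call is a system call, crossing into the kernel and back even when the data is already in the file cache. A minimal, illustrative sketch (sizes and names are arbitrary):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalRead {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pread", ".bin");
        try {
            byte[] data = new byte[4096];
            for (int i = 0; i < data.length; i++) data[i] = (byte) i;
            Files.write(file, data);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // Every read(dst, position) call crosses into the kernel, copies the
                // bytes out of the file cache into dst, and crosses back -- even when
                // the data is already cached.
                ByteBuffer dst = ByteBuffer.allocate(1);
                ch.read(dst, 100);   // positional read; safe to call from many threads
                System.out.println("byte at 100 = " + dst.get(0));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```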

When accessing a file through memory-mapped I/O, the thread does a load
instruction against RAM.  If the data is not in RAM, the load triggers a
page fault: the thread traps into the kernel, blocks while the data is
pulled from disk, and then resumes.

File I/O and memory-mapped I/O perform the same operations, just in a
different order, and the difference is key.  With file I/O, the thread
has to context switch into the kernel on every access, so we use large
buffers to minimize the cost of the kernel round trip.  That
per-operation context switch is what hurts file I/O and is where
memory-mapped I/O shines: the kernel is only involved on a page fault.
So, memory-mapped I/O does well at concurrent random reads of large
files, but it comes with an initialization cost and isn't the best
solution for all file access.  I have found that file I/O considerably
outperforms memory-mapped I/O when sequentially reading and writing a
file, unless you can map the entire file in one large piece.

-Nathan

On 10/18/2018 3:58 AM, Andrew Haley wrote:

> On 10/18/2018 10:29 AM, Andrey Pavlenko wrote:
>> In case of a non-large file and/or sequential reads MappedByteBuffer is
>> definitely faster, but the question here is about *concurrent* *random*
>> reads of *large* files. For this purpose MappedByteBuffer may be
>> inefficient.
> Perhaps. It might be that manually managing memory (by reading and
> writing the parts of the file you need) works better than letting the
> kernel do it, but the kernel will cache as much as it can anyway, so
> it's not as if it'll necessarily save memory or reduce disk activity.
> There are advantages to using read() for sequential file access because
> the kernel can automatically read and cache the next part of the file.
>
> There were some problems with inefficient code generation for byte
> buffers but we have worked on that and it's better now, with (even)
> more improvements to come. Unless there are kernel issues I don't know
> about, mapped files are excellent for random access, and the Java
> ByteBuffer operations generate excellent code. (This isn't guaranteed
> because C2 uses a bunch of heuristics, but it's usually good.)
>