A new Lock implementation: FileLock

A new Lock implementation: FileLock

Joshua Bloch
Folks,

Hi. I wrote a new java.util.concurrent.locks.Lock implementation
called FileLock.  Briefly, a FileLock instance is backed by a
java.nio.channels.FileLock as well as a
java.util.concurrent.locks.ReentrantLock.  It implements functionality
similar to ReentrantLock, but without the bells and whistles, and
it works across VMs as well as within a VM.
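
A minimal sketch of the idea described above, for readers without the
attachment. It is an illustration only and is not the attached
FileLock.java: the class name FileBackedLock, the constructor argument,
and the decision to leave the interruptible and timed methods unsupported
are all assumptions. It also assumes at most one instance per lock file
per VM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: a Lock that excludes threads in this VM and other processes. */
final class FileBackedLock implements Lock {
    private final ReentrantLock threadLock = new ReentrantLock(); // excludes threads in this VM
    private final FileChannel channel;                            // stays open for the lock's lifetime
    private FileLock fileLock;                                    // guarded by threadLock

    FileBackedLock(Path lockFile) throws IOException {
        channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    @Override
    public void lock() {
        threadLock.lock();
        if (threadLock.getHoldCount() > 1) {
            return;                          // reentrant acquire: file lock is already held
        }
        try {
            fileLock = channel.lock();       // blocks until other processes release the file
        } catch (IOException e) {
            threadLock.unlock();
            throw new UncheckedIOException(e);
        } catch (RuntimeException e) {
            threadLock.unlock();
            throw e;
        }
    }

    @Override
    public boolean tryLock() {
        if (!threadLock.tryLock()) {
            return false;                    // another thread in this VM holds the lock
        }
        if (threadLock.getHoldCount() > 1) {
            return true;
        }
        try {
            fileLock = channel.tryLock();    // null if another process holds the file lock
        } catch (IOException e) {
            threadLock.unlock();
            throw new UncheckedIOException(e);
        } catch (RuntimeException e) {
            threadLock.unlock();
            throw e;
        }
        if (fileLock == null) {
            threadLock.unlock();
            return false;
        }
        return true;
    }

    @Override
    public void unlock() {
        if (!threadLock.isHeldByCurrentThread()) {
            throw new IllegalMonitorStateException();
        }
        try {
            if (threadLock.getHoldCount() == 1) {   // outermost unlock releases the file lock
                fileLock.release();
                fileLock = null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            threadLock.unlock();
        }
    }

    // "Without the bells and whistles": these are simply unsupported in this sketch.
    @Override public void lockInterruptibly() { throw new UnsupportedOperationException(); }
    @Override public boolean tryLock(long time, TimeUnit unit) { throw new UnsupportedOperationException(); }
    @Override public Condition newCondition() { throw new UnsupportedOperationException(); }
}
```

Taking the in-VM ReentrantLock first means that only one thread per VM
ever talks to the file lock, which is why the sketch never asks the
channel for a second, overlapping lock.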

But there are problems:

(1) Perhaps the fundamental problem: if you're doing locking across
VMs, it's probably because you're protecting some shared persistent
resource.  If a VM holding a FileLock dies, the lock is automatically
dropped.  If the shared persistent resource is in an inconsistent
state, woe betide thee.

(2) I discovered that there is a bug in java.nio.channels.FileLock.
As a result, you get horrible, machine-dependent behavior if you try
to create multiple instances of my FileLock class in a single VM
backed by the same file.  On Windows, your process hangs.  On Unix,
you don't get mutual exclusion among threads.  If
java.nio.channels.FileLock obeyed its spec, you'd get a nice exception
(OverlappingFileLockException).  I reported this bug today, but I
don't expect it to get fixed any time soon.

(3) If I were to implement condition variables in conjunction with
FileLock, Condition.signal would not work across VMs.  This would
violate the principle of least astonishment.  So, for now, I haven't
implemented condition variables (though it would be easy to do).

(4) People would use FileLocks across physical machines, using NFS
files.  This might work reliably.  Or it might not.  If it didn't,
they'd get angry at me.
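
For problem (2), a small probe can show what a given JDK and platform
actually do when one VM asks for the same file lock twice through two
channels. This is a sketch, not a regression test for the reported bug;
the class and method names are made up for illustration. It uses tryLock
for the second attempt so that a misbehaving platform cannot hang the
probe.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical probe for overlapping file locks within one VM. */
final class OverlapProbe {
    /** Returns true if the second attempt is rejected with OverlappingFileLockException. */
    static boolean secondLockRejected(Path file) throws IOException {
        try (FileChannel first = FileChannel.open(file,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(file, StandardOpenOption.WRITE)) {
            FileLock held = first.lock();
            try {
                FileLock again = second.tryLock();   // non-blocking second attempt
                if (again != null) {
                    again.release();                 // both locks granted: no mutual exclusion
                }
                return false;                        // no exception was thrown
            } catch (OverlappingFileLockException expected) {
                return true;                         // the behavior the spec calls for
            } finally {
                held.release();
            }
        }
    }
}
```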

Anyway, if you are so inclined, take a look at it and tell me what you
think.  When I first came up with the idea, I thought it might make a
nice addition to j.u.c, but now I'm not so sure.

            Josh

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest

FileLock.java (8K) Download Attachment

Re: A new Lock implementation: FileLock

Gregg Wonderly-2


Joshua Bloch wrote:
> Hi. I wrote a new java.util.concurrent.locks.Lock implementation
> called FileLock.  Briefly, a FileLock instance is backed by a
> java.nio.channels.FileLock as well as a
> java.util.concurrent.locks.ReentrantLock.  It implements functionality
> similar to ReentrantLock, but without the bells and whistles, and
> it works across VMs as well as within a VM.
>
> But there are problems:

> (4) People would use FileLocks across physical machines, using NFS
> files.  This might work reliably.  Or it might not.  If it didn't,
> they'd get angry at me.
>
> Anyway, if you are so inclined, take a look at it and tell me what you
> think.  When I first came up with the idea, I thought it might make a
> nice addition to j.u.c, but now I'm not so sure.

For remote, distributed capabilities, the Jini platform's transaction manager
really provides a crash-resilient implementation.  The configurability and
flexibility at deployment provide a great way to make things as efficient or as
secure as you need.

I know Jini is not in the J2SE, but at some point, I hope people who need
distributed solutions will come to realize what power and capabilities
are in Jini.

There's really not an interesting reason to recreate all of that work.

Gregg Wonderly

Re: A new Lock implementation: FileLock

Joe Bowbeer
On 8/27/05, Gregg Wonderly <[hidden email]> wrote:
>
> For remote, distributed capabilities, the Jini platform's transaction manager
> really provides a crash-resilient implementation.  The configurability and
> flexibility at deployment provide a great way to make things as efficient or as
> secure as you need.
>

See also JavaSpaces, built on Jini's distributed transaction support:

Explore Jini Transactions with JavaSpaces
http://www.artima.com/jini/jiniology/js4.html

Getting Started With JavaSpaces Technology
http://java.sun.com/developer/technicalArticles/tools/JavaSpaces/

"The JavaSpaces API uses the package net.jini.core.transaction to
provide basic atomic transactions that group multiple operations
across multiple JavaSpaces services into a bundle that acts as a
single atomic operation."

Joe.


Re: A new Lock implementation: FileLock

Dawid Kurzyniec
In reply to this post by Gregg Wonderly-2
Gregg Wonderly wrote:

>
>> (4) People would use FileLocks across physical machines, using NFS
>> files.  This might work reliably.  Or it might not.  If it didn't,
>> they'd get angry at me.
>>
>> Anyway, if you are so inclined, take a look at it and tell me what you
>> think.  When I first came up with the idea, I thought it might make a
>> nice addition to j.u.c, but now I'm not so sure.
>
>
> For remote, distributed capabilities, the Jini platform's transaction
> manager
> really provides a crash-resilient implementation.  The configurability and
> flexibility at deployment provide a great way to make things as
> efficient or as
> secure as you need.
>
> I know Jini is not in the J2SE, but at some point, I hope people who need
> distributed solutions will come to realize what power and capabilities
> are in Jini.
>
> There's really not an interesting reason to recreate all of that work.
>
Gregg,

I hope that your point is that distributed file locking is not a
reliable substitute for a transaction manager, be it Jini or otherwise,
when one is needed. If so, I fully agree. On the other hand, a distributed
transaction manager is like shooting flies with a cannon if all you need is
basic non-distributed inter-process synchronization. (Carrying that
cannon around, no matter how well-featured it is, will cost you in terms
of deployment size and complexity, and run-time overheads of two-phase
commit etc.) So, all in all I am happy to see this file lock
implementation, even if it is prone to abuse (what isn't?). However, I
am not very sure it belongs in j.u.c., as the latter is very "core"
and compact, and single-JVM-oriented. My feeling is that a file lock
would stick out. But, for instance, I think it would make a good
candidate for an online article.

Regards,
Dawid



Re: A new Lock implementation: FileLock

Gregg Wonderly-2


Dawid Kurzyniec wrote:

> Gregg Wonderly wrote:
>> For remote, distributed capabilities, the Jini platform's transaction
>> manager
>> really provides a crash-resilient implementation.  The configurability and
>> flexibility at deployment provide a great way to make things as
>> efficient or as
>> secure as you need.
>>
>> I know Jini is not in the J2SE, but at some point, I hope people who need
>> distributed solutions will come to realize what power and capabilities
>> are in Jini.
>>
>> There's really not an interesting reason to recreate all of that work.
>>
> Gregg,
>
> I hope that your point is that distributed file locking is not a
> reliable substitute for a transaction manager, be it Jini or otherwise,
> when one is needed. If so, I fully agree. On the other hand, a distributed
> transaction manager is like shooting flies with a cannon if all you need is
> basic non-distributed inter-process synchronization.

My point is that a solution already exists.  If you think that you need a same machine, interprocess solution today,
chances are that tomorrow you'll need a distributed version.  It may not happen that way, but the failure scenarios and
all of the situations that develop on an interprocess solution are exactly what happen in a distributed system.

Look at the 8 fallacies of distributed computing and replace network with filesystem, or process and you'll find that
the parallels are very telling.

It is short-sighted views of expansion and potential impact that are the foundation of the events that are reflected in
the 8 fallacies of distributed computing.

It might feel like you're carrying around a cannon.  But when you might someday have to sink the whole ship, it's always
nice to have the correct tools already in hand.  Investing in the right technology base has always rewarded me with many
more opportunities to exploit what I have done.  In the case of Jini, I can distribute my applications on one or more
machines as needed.  The operations between processes work no matter what environment I deploy in.

There will always be time/space tradeoffs to be considered, but I think this problem has already been solved and there
really is not a new solution needed.

There might instead need to be some exploration done in how to optimize marshalling/unmarshalling, or other parts of the
JERI/RMI stack.  This might make the time/space tradeoffs work out to allow some applications to use less hardware.

But, the cost of hardware at a fixed price, compared to the unbounded cost of fixing or dealing with marginal/broken
software is a pretty compelling argument for using something that is already proven and well defined in operation,
complexity and scalability (you can measure the performance and calculate the hardware/network needs).

Gregg Wonderly

Re: A new Lock implementation: FileLock

Brian Goetz
> My point is that a solution already exists.  If you think that you need
> a same machine, interprocess solution today, chances are that tomorrow
> you'll need a distributed version.  It may not happen that way, but the
> failure scenarios and all of the situations that develop on an
> interprocess solution are exactly what happen in a distributed system.

This is a pretty compelling argument.  Take caching; there are lots of
good off-the-shelf caching products out there, but everyone rolls their
own, because "they don't need something that big, they just need a Map
with expiration."  Over time, they end up reinventing a pretty
complicated wheel.



Re: A new Lock implementation: FileLock

Dawid Kurzyniec
In reply to this post by Gregg Wonderly-2
Gregg Wonderly wrote:

>>> There's really not an interesting reason to recreate all of that work.
>>>
>> I hope that your point is that distributed file locking is not a
>> reliable substitute for a transaction manager, be it Jini or
>> otherwise, when one is needed. If so, I fully agree. On the other
>> hand, a distributed transaction manager is like shooting flies with a cannon
>> if all you need is basic non-distributed inter-process synchronization.
>
>
>
> My point is that a solution already exists.

Only, to a different problem.

> If you think that you
> need a same machine, interprocess solution today, chances are that
> tomorrow you'll need a distributed version.  (...)
>
"If you only have a hammer, everything looks like a nail." Or, a
federation of distributed services, for that matter.

Scenarios where I would use file locking are those where I would
synchronize access to shared files stored in the very same filesystem.
For instance, suppose I am developing an e-mail client that caches
messages in ${user.home}/Mail. Then I need some means of protecting data
from mangling when the user launches two clients simultaneously. The
same goes if I am developing an e-mail server, since the user may open
multiple sessions simultaneously, which can e.g. move/delete files on
the server. Or, suppose I want to make sure that only a single instance
of an executable can be running on a host (e.g. if it is a system-level
service). These are well-known (for decades) and legit use cases for
file locking, in which I would be out of my mind to deploy a distributed
transaction manager.
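
The single-instance case can be sketched with nothing more than
java.nio.channels. This is an illustration under assumptions, not code
from any of the applications mentioned: the class name
SingleInstanceGuard and the choice of lock file are hypothetical. The
operating system drops the lock when the process dies, so there is no
stale lock file to clean up afterwards.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Optional;

/** Hypothetical guard: holds an exclusive lock on a file while this process runs. */
final class SingleInstanceGuard implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private SingleInstanceGuard(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a guard if nobody else holds the lock file, or empty if an instance is running. */
    static Optional<SingleInstanceGuard> tryAcquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock l = null;
        try {
            l = ch.tryLock();                 // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            // the lock is already held elsewhere in this same VM
        } finally {
            if (l == null) {
                ch.close();                   // not acquired (or failed): do not leak the channel
            }
        }
        return l == null ? Optional.empty() : Optional.of(new SingleInstanceGuard(ch, l));
    }

    @Override
    public void close() throws IOException {
        try {
            lock.release();
        } finally {
            channel.close();
        }
    }
}
```

A service would call tryAcquire once at startup, exit if the result is
empty, and otherwise keep the guard open until shutdown.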

The bottom line is that EVERY technology, be it a hammer, Jini, or file
locking, has its domain of applicability. Claiming that one technology
can solve all the world's problems is naive. At least, stick to the issue at
hand, and confine yourself to precise, technical, non-vague,
non-hypothetical, non-tutoring, non-red-herring and non-philosophical
arguments.

Regards,
Dawid





Re: A new Lock implementation: FileLock

Joshua Bloch
In reply to this post by Brian Goetz
I don't find this at all convincing.  When I was implementing
java.util.prefs, an interprocess locking mechanism was precisely what
I needed.  I did not need any of the other trappings of a full-blown
transaction system, and yes, I know what they are: I designed and
built distributed transaction systems for over a decade.

As for distributed vs. interprocess, the file-locking semantics of NFS
are supposed to be well-defined and to work properly.  I'm not sure if
this is really the case.  I would assume there are some good and some
bad implementations out there.

Jini is a large, complex API consisting of 60+ packages!
java.util.concurrent.locks.Lock has 6 methods.  If I can save someone
from having to learn the former by introducing a special-purpose
implementation of the latter, I've done a good deed.  I don't see it
as reimplementing the wheel.

        Josh

On 8/29/05, Brian Goetz <[hidden email]> wrote:

> > My point is that a solution already exists.  If you think that you need
> > a same machine, interprocess solution today, chances are that tomorrow
> > you'll need a distributed version.  It may not happen that way, but the
> > failure scenarios and all of the situations that develop on an
> > interprocess solution are exactly what happen in a distributed system.
>
> This is a pretty compelling argument.  Take caching; there are lots of
> good off-the-shelf caching products out there, but everyone rolls their
> own, because "they don't need something that big, they just need a Map
> with expiration."  Over time, they end up reinventing a pretty
> complicated wheel.
>
>
>


Re: A new Lock implementation: FileLock

Ian Griffiths
I tend to agree with Josh.

I've used Jini pretty widely for distributed, highly secure modules such
as storing session parameters and system parameters that have to
survive a crash. It has many qualities such as robustness and
"distributability". One quality it does lack glaringly is speed (which
is understandable as it's busy doing other things).

I've been working a lot recently with the problem of sharing data
between processes on the same machine. I have to run a number of
identical processes on the servers for a very stupid reason: Windows
won't allow a session greater than 1.6 GB and the machine has 8 GB of
memory. I would be quite happy to run one 6 GB process. But HotSpot isn't,
and won't!

So I'm stuck with the problem of sharing some files between applications.

A simple-to-use file lock would probably do the trick nicely.

I would agree with one of the previous posters that j.u.c. is not the
place to put it. Probably in an io or communications package.

Ian

-----Original Message-----
From: Joshua Bloch <[hidden email]>
To: Brian Goetz <[hidden email]>
Cc: [hidden email], Dawid Kurzyniec
<[hidden email]>, [hidden email]
Date: Mon, 29 Aug 2005 21:39:11 -0700
Subject: Re: [concurrency-interest] A new Lock implementation: FileLock

> I don't find this at all convincing.  When I was implementing
> java.util.prefs, an interprocess locking mechanism was precisely what
> I needed.  I did not need any of the other trappings of a full-blown
> transaction system, and yes, I know what they are: I designed and
> built distributed transaction systems for over a decade.
>
> As for distributed vs. interprocess, the file-locking semantics of
> NFS
> are supposed to be well-defined and to work properly.  I'm not sure if
> this is really the case.  I would assume there are some good and some
> bad implementations out there.
>
> Jini is a large, complex API consisting of 60+ packages!
> java.util.concurrent.locks.Lock has 6 methods.  If I can save
> someone
> from having to learn the former by introducing a special-purpose
> implementation of the latter, I've done a good deed.  I don't see it
> as reimplementing the wheel.
>
>         Josh
>
> On 8/29/05, Brian Goetz <[hidden email]> wrote:
> > > My point is that a solution already exists.  If you think that
> you need
> > > a same machine, interprocess solution today, chances are that
> tomorrow
> > > you'll need a distributed version.  It may not happen that way,
> but the
> > > failure scenarios and all of the situations that develop on an
> > > interprocess solution are exactly what happen in a distributed
> system.
> >
> > This is a pretty compelling argument.  Take caching; there are lots
> of
> > good off-the-shelf caching products out there, but everyone rolls
> their
> > own, because "they don't need something that big, they just need a
> Map
> > with expiration."  Over time, they end up reinventing a pretty
> > complicated wheel.
> >
> >
> >
>

Re: A new Lock implementation: FileLock

Gregg Wonderly-2
In reply to this post by Joshua Bloch


Joshua Bloch wrote:
> Jini is a large, complex API consisting of 60+ packages!

There are many parts of Jini which you don't need for such things.

> java.util.concurrent.locks.Lock has 6 methods.

I understand this thought, but it in fact uses the whole JVM implementation, which is many more lines of code than Jini.
You're willing to accept the requirement to use Java for your application.  So I'm not sure how valid an argument
size is.  Jini is a toolkit, just like the Java platform is.  It's targeted at providing solutions to some specific
types of problems.  Those problems are not trivial in nature, but can appear minimal at first glance.

> If I can save someone
> from having to learn the former by introducing a special-purpose
> implementation of the latter, I've done a good deed.  I don't see it
> as reimplementing the wheel.

You will recreate all the logic and trappings associated with distributed failures.  When an application releases the
lock, does that mean that the now unlocked data is valid, or invalid?  How will you know?  A distributed transaction
system would let you arrive at a safe point.

One example might be that you ran out of space on the file system.  Half of the data made it to disk, the other half is
still pending, or lost.  But, the file system is actively acquiring new space, and you'd really like to let the failed
writer finish its job, before letting other writers put their data in there.  However, at some point, you want progress
to be made.  If the releasing process actually exited when the write failed because it has a bug, you could use a
distributed lease to finally expire the access lock unconditionally and move on with some type of cleanup.

Transactions and leasing allow things to progress in a way that all participating parties can agree with.  If you just
implement the file lock, then you'll need some kind of IPC eventually to let the processes discuss how to handle
different issues, I'd bet.  It's the development of the initial need versus the long-term needs of the application that
can make the problem seem a lot smaller than it actually is.

The speed of distributed transactions can be many orders of magnitude slower than a single machine kernel based file
lock, or an NFS file lock.  There are many exchanges between multiple processes.  It's an area where a lot of research
and experimentation could be performed.  It might be that an NFS file lock based transaction implementation in Jini
would be a great optimization.  I don't know.  However, I don't use NFS on any filesystems that my applications run on,
so that solution is not attractive from that point of view.

I have no doubt that many people on this list know precisely all of the issues that need to be dealt with.  I just
detect a one-small-step-at-a-time thought process that will, sure enough, allow you to get done what you want done.
But, eventually, you'll recreate all the abilities that already exist in Jini, or are enabled by the tools in Jini, and
that seems like a non-benefit to the Java platform.


Gregg Wonderly

Re: A new Lock implementation: FileLock

Gregg Wonderly-2
In reply to this post by Dawid Kurzyniec


Dawid Kurzyniec wrote:
>> My point is that a solution already exists.
>
> Only, to a different problem.

I appreciate this feeling.  But, I am not sure that it's well founded, yet.

> Scenarios where I would use file locking are those where I would
> synchronize access to shared files stored in the very same filesystem.
> For instance, suppose I am developing an e-mail client that caches
> messages in ${user.home}/Mail. Then I need some means of protecting data
> from mangling when the user launches two clients simultaneously. The
> same goes if I am developing an e-mail server, since the user may open
> multiple sessions simultaneously, which can e.g. move/delete files on
> the server. Or, suppose I want to make sure that only a single instance
> of an executable can be running on a host (e.g. if it is a system-level
> service). These are well-known (for decades) and legit use cases for
> file locking, in which I would be out of my mind to deploy a distributed
> transaction manager.

In many of these cases I would use an accessor service, today, instead of a distributed locking mechanism.  This would
allow a single place to be in control of access, and multithreading issues could be controlled exactly in the place
where that control was needed.  Sure, we used to use locks: because everything was on the same machine, there were little
library functions, part of various packages, that you needed to use to do locking with those applications.  The
problem is that each of them did things differently, and this made interworking more difficult.  A standard locking
service allows you to tie the operations of multiple applications together for a new application.

Advisory file locking is straightforward to use.  When we used link(2) to create advisory locks in Unix, we got into
problems with stuck locks when processes crashed.  We would next add signal handlers to try and catch the process death
and remove all locks.  Then, the errant users found out about SIGKILL, and found that they could wreak havoc on things.
So we added flock() so that the lock was really implemented in the kernel.  That is a locking service implementation!
Now, everyone goes through the same implementation, doing the same thing.  Josh is talking about taking one such
implementation of a file locking service that is distributed via NFS.  I'm suggesting that it would be even better to
take the next step and just use something that already exists, which provides a lot of additional features that aren't
afterthought add-ons.

> The bottom line is that EVERY technology, be it a hammer, Jini, or file
> locking, has its domain of applicability. Claiming that one technology
> can solve all the world's problems is naive. At least, stick to the issue at
> hand, and confine yourself to precise, technical, non-vague,
> non-hypothetical, non-tutoring, non-red-herring and non-philosophical
> arguments.

Dawid, I'm sorry that you feel like my argument was overbearing or incorrectly placed.  My personal experience with
software development over the past 20+ years has taught me that cutting corners and creating a minimal implementation
for my own use, just to get started "cheaper" is not really the best thing.

The technical argument is that the Jini transactional services do everything you need for a working distributed locking
service.  Do they do it optimally, or exactly the way you might implement it in ignorance of prior art, in a new
application?  Probably not.  But, the foundation is there to improve on.  The JERI stack is pluggable for providing more
optimal transports of data while maintaining a Java platform API approach to programming.

Gregg Wonderly

Re: A new Lock implementation: FileLock

Joshua Bloch
In reply to this post by Gregg Wonderly-2
Gregg,

> Joshua Bloch wrote:
> > Jini is a large, complex API consisting of 60+ packages!
>
> There are many parts of Jini which you don't need for such things.
>
> > java.util.concurrent.locks.Lock has 6 methods.
>
> I understand this thought, but it in fact uses the whole JVM implementation which is many more lines of code than Jini.

That is utterly irrelevant.  Who cares how many lines of code there
are in the JVM *implementation*?  We are talking about the conceptual
weight of an interface.  The target audience for this class already
knows the java.util.concurrent.locks.Lock *interface*, so the marginal
conceptual weight of this implementation is pretty close to zero.  The
audience does not know Jini, and the marginal conceptual weight to
learn it is large.
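
To make the point about the interface concrete: code written against
java.util.concurrent.locks.Lock does not change when the implementation
behind it does. In this small sketch the class GuardedCounter is
hypothetical; the lock passed in could be a ReentrantLock or a
file-backed implementation.

```java
import java.util.concurrent.locks.Lock;

/** Hypothetical client code: it knows only the Lock interface. */
final class GuardedCounter {
    private final Lock lock;   // any Lock implementation will do
    private int value;

    GuardedCounter(Lock lock) {
        this.lock = lock;
    }

    int increment() {
        lock.lock();
        try {
            return ++value;    // critical section
        } finally {
            lock.unlock();     // always release, even if the body throws
        }
    }
}
```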

> One example might be that you ran out of space on the file system.  Half of the data made it to disk, the other half is
> still pending, or lost.  But, the file system is actively acquiring new space, and you'd really like to let the failed
> writer finish its job, before letting other writers put their data in there.  However, at some point, you want progress
> to be made.  If the releasing process actually exited when the write failed because it has a bug, you could use a
> distributed lease to finally expire the access lock unconditionally and move on with some type of cleanup.

Typically you create a new file, and atomically replace the old one by
changing the name of the new one.  If you run out of space creating
the new one, the old one never changes.  This is what I did in
java.util.prefs. It is crude but effective.
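
The write-new-then-rename approach can be sketched with java.nio.file
(available from Java 7, so later than this thread). This is not the
java.util.prefs code; the class name AtomicFileReplace and the temp-file
naming are assumptions, and forcing the data to disk before the rename
is left out for brevity.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: replace a file's contents so readers see the old or the new version. */
final class AtomicFileReplace {
    static void write(Path target, String contents) throws IOException {
        // Create the new file next to the target so the rename stays within one file system.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "replace-", ".tmp");
        try {
            // If this write fails (for example, disk full), the target is never touched.
            Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
            // Whether ATOMIC_MOVE replaces an existing target is implementation specific;
            // on common platforms it does, otherwise an IOException is thrown.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp);   // removes the temp file if the move did not happen
        }
    }
}
```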

This is the last I will say on this subtopic; you may have the last
word if it pleases you.

          Josh


Re: A new Lock implementation: FileLock

Dawid Kurzyniec
In reply to this post by Gregg Wonderly-2
Gregg Wonderly wrote:

>
>
> Dawid Kurzyniec wrote:
>
>> Scenarios where I would use file locking are those where I would
>> synchronize access to shared files stored in the very same filesystem.
>> For instance, suppose I am developing an e-mail client that caches
>> messages in ${user.home}/Mail. Then I need some means of protecting data
>> from mangling when the user launches two clients simultaneously. The
>> same goes if I am developing an e-mail server, since the user may open
>> multiple sessions simultaneously, which can e.g. move/delete files on
>> the server. Or, suppose I want to make sure that only a single instance
>> of an executable can be running on a host (e.g. if it is a system-level
>> service). These are well-known (for decades) and legit use cases for
>> file locking, in which I would be out of my mind to deploy a distributed
>> transaction manager.
>
>
> In many of these cases I would use an accessor service, today, instead
> of a distributed locking mechanism.  This would allow a single place
> to be in control of access, and multithreading issues could be
> controlled exactly in the place where that control was needed. (...)

In the scenarios I outlined, an accessor service would be exposed to
exactly the same issues, and I don't see how it is supposed to help. When do
you start the accessor service? Who starts it? How do you make sure that
you only have one instance running? What do you do when it crashes? How
do you detect it crashed? In fact, in the IMAP server example in
particular, if you look from the perspective of IMAP clients, the server
can itself be considered an accessor service. After all, it allows
clients to indirectly access shared files stored on the server host.
Should I be implementing an accessor service using an accessor service?
Where does it stop?

At some level, "distributed system" ends and the "local system" begins,
and you simply have to assume that something is reliable - in this case,
the local file system. Otherwise, multiphase commit is not going to help
you either, because at some point it has to give green light to writing
data to a file system, and if the file system crashes just at this
moment, you end up with inconsistent data as well. You make Jini sound
like a magic bullet that can somehow solve reliability issues in
distributed systems which are known to be unsolvable without assumptions
that are in fact stronger than those needed for a file lock to work
reliably.

>
> Advisory file locking is straightforward to use.  When we used
> link(2) to create advisory locks in Unix, we got into problems with
> stuck locks when processes crashed. (...)

Again, there is no way around that. At some level, stuff gets written to
the filesystem. If the accessor service, or a local resource manager
involved in a distributed transaction, crashes during a filesystem
operation, it leaves inconsistent data. If the data is left "locked",
you have a stale lock - similar in behavior to Thread.suspend in Java.
But if you release the lock, you leave unprotected inconsistent data -
like Thread.stop does. In either case, you need some error-recovery
procedure; just restarting the crashed process won't suffice. At some
level, you simply have to deal with that, possibly resorting to human
intervention, because file systems are not transactional. File locks and
renaming files as a "commit" are as good as it gets.

> The technical argument is that the Jini transactional services do
> everything you need for a working distributed locking service.

And I claim that file locks are good for *non-distributed* shared
filesystem access, keeping in mind that all systems are non-distributed
at some level where they interact with the file system. Hence, again,
the conclusion that file locks and distributed transaction managers are
solutions to different problems :)

Side note. Download size of JRE 5.0: about 15 MB. Download size of Jini
starter kit: over 11 MB. Not a huge difference. And note that the
bulkiness of JRE is already considered a barrier to entry in some cases.

Regards,
Dawid

_______________________________________________
Concurrency-interest mailing list
[hidden email]
http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: A new Lock implementation: FileLock

Gregg Wonderly-2


Dawid Kurzyniec wrote:

>> Advisory file locking is straight forward to use.  When we used
>> link(2) to create advisory locks in unix, we got into problems with
>> stuck locks when processes crashed. (...)
>
> Again, there is no way around that. At some level, stuff gets written to
> the filesystem. If the accessor service, or a local resource manager
> involved in a distributed transaction, crashes during a filesystem
> operation, it leaves inconsistent data. If the data is left "locked",
> you have a stale lock - similarly in behavior to Thread.suspend in Java.
> But if you release the lock, you leave unprotected inconsistent data -
> like Thread.stop does. In either case, you need some error-recovery
> procedure; just restarting the crashed process won't suffice. At some
> level, you simply have to deal with that, possibly resorting to human
> intervention, because file systems are not transactional. File locks and
> renaming files as a "commit" are as good as it gets.

In the Jini transaction specification there is a section on recovery after crash.  What is described is that a
transaction participant is not supposed to respond to the commit request, until it has persisted enough information to
know how to fully recover to the state it was in prior to the crash.  If it votes commit, and crashes during the
activities of the commit, it has to be prepared to recover to a committed state.  If it refuses the commit, it has to be
prepared to roll back on restart.

The intermediate case, where the transaction has not been voted on yet, requires some careful attention to detail.
You need to keep a crash count, and increment it when you restart.  The API includes providing the crash count to the
transaction manager so that when you rejoin the transaction, it can say, hey, you crashed, and we're already voting, or
hey you crashed, and we're still waiting to vote.  The transaction manager can then respond to your join request with an
appropriate response, and the application can handle that response accordingly.
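
As a rough illustration of the "keep a crash count, and increment it when you restart" step, here is a minimal sketch in plain Java. It does not use any Jini classes; the class name CrashCounter, the file layout, and the ".tmp" suffix are hypothetical, and it assumes the java.nio.file API from later Java releases. How the count is then handed to the transaction manager is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical holder for a participant's persistent crash count. */
final class CrashCounter {
    private final Path file;

    CrashCounter(Path file) {
        this.file = file;
    }

    /** Reads the stored count, or 0 if nothing has been stored yet. */
    long read() throws IOException {
        if (!Files.exists(file)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return Long.parseLong(text);
    }

    /** Intended to be called once per restart, before rejoining any transaction. */
    long incrementOnRestart() throws IOException {
        long next = read() + 1;
        // Write the new value to a temp file, then rename it over the old one,
        // so a crash in the middle leaves the previous count readable.
        Path temp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(temp, Long.toString(next).getBytes(StandardCharsets.UTF_8));
        Files.move(temp, file, StandardCopyOption.ATOMIC_MOVE);
        return next;
    }
}
```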

It is this specific behavior, encapsulated into the API and the functionality of the transaction manager, that provides
the power to recover from a stuck lock by just restarting the failed process.  Rather than the actions being a
peripheral activity of the participant, as you describe above, it is an integral part of the API.  That's the stuff that
you don't have to reinvent, rediscover or otherwise stumble through.

Gregg Wonderly

Re: A new Lock implementation: FileLock

Gregg Wonderly-2
In reply to this post by Joshua Bloch
Joshua Bloch wrote:
>>There are many parts of Jini which you don't need for such things.
>>
>>>java.util.concurrent.locks.Lock has 6 methods.
>>
>>I understand this thought, but it in fact uses the whole JVM implementation which is many more lines of code than Jini.
>
> That is utterly irrelevant.  Who cares how many lines of code there
> are in the JVM *implementation*?  We are talking about the conceptual
> weight of an interface.  The target audience for this class already
> knows the java.util.concurrent.locks.Lock *interface*, so the marginal
> conceptual weight of this implementation is pretty close to zero.  The
> audience does not know Jini, and the marginal conceptual weight to
> learn it is large.

I guess I'm missing the irrelevancy when you are suggesting something based on NFS that would require learning about NFS
semantics, installing an NFS-based filesystem implementation, etc.  There are barriers everywhere, Josh.  I'm not trying
to undermine your idea, I'm trying to suggest that there are tools that might make it easier.

>>One example might be that you ran out of space on the file system.  Half of the data made it to disk, the other half is
>>still pending, or lost.  But, the file system is actively acquiring new space, and you'd really like to let the failed
>>writer finish its job, before letting other writers put their data in there.  However, at some point, you want progress
>>to be made.  If the releasing process actually exited when the write failed because it has a bug, you could use a
>>distributed lease to finally expire the access lock unconditionally and move on with some type of cleanup.
>
> Typically you create a new file, and atomically replace the old one by
> changing the name of the new one.  If you run out of space creating
> the new one, the old one never changes.  This is what I did in
> java.util.prefs. It is crude but effective.

This is a pretty typical mechanism for small files.  But large mailbox files, as in Dawid's example, would typically not
be replicated as a safety measure.  There are variations on every theme, and points and counterpoints can be brought
up.  I'm trying to share my experiences.  I'm sorry I'm such a lousy communicator.

> This is the last I will say on this subtopic; you may have the last
> word if it pleases you.

I am not trying to gain pleasure here.  I am sorry you feel attacked or otherwise abused by my comments.  Please accept
my apologies.

Gregg Wonderly

Re: A new Lock implementation: FileLock

Dawid Kurzyniec
In reply to this post by Gregg Wonderly-2
Gregg Wonderly wrote:

>
>
> Dawid Kurzyniec wrote:
>
>>> Advisory file locking is straight forward to use.  When we used
>>> link(2) to create advisory locks in unix, we got into problems with
>>> stuck locks when processes crashed. (...)
>>
>>
>> Again, there is no way around that. At some level, stuff gets written
>> to the filesystem. If the accessor service, or a local resource
>> manager involved in a distributed transaction, crashes during a
>> filesystem operation, it leaves inconsistent data. If the data is
>> left "locked", you have a stale lock - similarly in behavior to
>> Thread.suspend in Java. But if you release the lock, you leave
>> unprotected inconsistent data - like Thread.stop does. In either
>> case, you need some error-recovery procedure; just restarting the
>> crashed process won't suffice. At some level, you simply have to deal
>> with that, possibly resorting to human intervention, because file
>> systems are not transactional. File locks and renaming files as a
>> "commit" are as good as it gets.
>
>
> In the Jini transaction specification there is a section on recovery
> after crash.  What is described is that a transaction participant is
> not supposed to respond to the commit request, until it has persisted
> enough information to know how to fully recover to the state it was in
> prior to the crash.  If it votes commit, and crashes during the
> activities of the commit, it has to be prepared to recover to a
> committed state.  If it refuses the commit, it has to be prepared to
> roll back on restart.
>
> The intermediate case where the transaction has not been voted on yet,
> requires some careful attention to details.
> You need to keep a crash count, and increment it when you restart.  
> The API includes providing the crashcount to the transaction manager
> so that when you rejoin the transaction, it can say, hey, you crashed,
> and we're already voting, or hey you crashed, and we're still waiting
> to vote.  The transaction manager can then respond to your join
> request with an appropriate response, and the application can handle
> that response accordingly.
>
This is very interesting, but doesn't change the picture much. All it
says is that a process can restore its persistent storage to
a consistent state when restarted after a crash, by utilizing the fact
that filesystem operations are idempotent. However, the Jini transaction
API does not help squat in achieving that. It makes state repair after
a crash the sole responsibility of a transaction participant. What's more,
the whole scheme assumes that there is a reliable mechanism for
detecting crashes and restarting processes. The "mechanism" can be a
sysadmin, or yet another process, but that process can crash too, so
eventually, human supervision is needed.
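
As a small illustration of recovery that relies on idempotent filesystem operations, here is a minimal sketch. It assumes a writer that follows a write-temp-then-rename convention with a ".tmp" suffix; that convention, the class name StartupRecovery, and the use of the later java.nio.file API are assumptions for the example, not anything Jini or the file lock prescribes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical startup step that cleans up after a crashed writer. */
final class StartupRecovery {
    private StartupRecovery() {}

    /**
     * Discards an uncommitted temp file left next to the target, if any.
     * Returns true if something was removed. Running it again, for example
     * after a crash during a previous recovery, is harmless.
     */
    static boolean recover(Path target) throws IOException {
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        return Files.deleteIfExists(temp);
    }
}
```

Note that something still has to notice the crash and run this step, which is the point made above.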

> It is this specific behavior, encapsulated into the API and the
> functionality of the transaction manager, that provides the power to
> recover from a stuck lock by just restarting the failed process.  
> Rather than the actions being a peripheral activity of the
> participant, as you describe above, it is an integral part of the
> API.  That's the stuff that you don't have to reinvent, rediscover or
> otherwise stumble through.
>
As I said above, the application programmer has to implement the
recovery herself in both cases, and the code would look exactly the
same. (And what "provides the power" is not a transaction manager but
the filesystem idempotency). The only reason I might want to adhere to a
distributed transaction API when coding that is if I was making it a
part of a distributed transaction. I still can't see how it would do me
any good, and even how to go about it, if I was implementing a singleton
system service or a mail client. The TX API is about coordinating
changes of private (persistent) states between distributed collaborants.
File locking is about synchronizing access to a shared, common state.
Where's the connection?...

Regards,
Dawid


Re: A new Lock implementation: FileLock

Gregg Wonderly-2


Dawid Kurzyniec wrote:
> This is very interesting, but doesn't change the picture much. All it
> says is that a process can restore its persistent storage to
> a consistent state when restarted after a crash, by utilizing the fact
> that filesystem operations are idempotent.

I said persistent, not stored on a filesystem.  It might be a filesystem, but it might be something else.

> However, the Jini transaction
> API does not help squat in achieving that. It makes state repair after
> a crash the sole responsibility of a transaction participant.

Yes, but because it's already part of the system design, users don't have to invent it.  So, it's a value added by the
system.  That's the point I am trying to make.  Not that it waters the plants and lets the dog out.  It already
does all the software things that you'd need the software to do.

>> It is this specific behavior, encapsulated into the API and the
>> functionality of the transaction manager, that provides the power to
>> recover from a stuck lock by just restarting the failed process.  
>> Rather than the actions being a peripheral activity of the
>> participant, as you describe above, it is an integral part of the
>> API.  That's the stuff that you don't have to reinvent, rediscover or
>> otherwise stumble through.
>>
> As I said above, the application programmer has to implement the
> recovery herself in both cases, and the code would look exactly the
> same.

For any transactional system, yes you'd have to do all of this stuff.  The point is that it's already described,
documented and designed so that you just have to implement the pieces.  That's the value added.  The fact that there is
a public API on top is an additional value added.  Once your code knows how to use the Jini transaction manager service
interface, you can use any such compliant manager that might come into existence, right?

> (And what "provides the power" is not a transaction manager but
> the filesystem idempotency). The only reason I might want to adhere to a
> distributed transaction API when coding that is if I was making it a
> part of a distributed transaction. I still can't see how it would do me
> any good, and even how to go about it, if I was implementing a singleton
> system service or a mail client. The TX API is about coordinating
> changes of private (persistent) states between distributed collaborants.
> File locking is about synchronizing access to a shared, common state.
> Where's the connection?...

A lock's state is always transactionally managed.  You decide that you are ready for the lock to be locked or unlocked
at specific places.  try { } finally {} is one such fine-grained transactional approach where a particular outcome is
guaranteed, without other intervening issues, to proceed to the desired conclusion.  So, when you share a transaction,
you can use the commit vote as a stepping point.  I would implement a distributed lock with transactions using this
algorithm:

TransactionParticipant p = new MyLocalParticipant();
Transaction.Created mytrans = trmgr.create(Lease.FOREVER);
LeaseRenewalManager lrm = new LeaseRenewalManager();

MyLeaseListener leaselis = new MyLeaseListener( mytrans );
// we pass a lease listener here that might help mitigate the
// failure of the transaction manager.  When it detects the lease
// state changing, then it can do interesting things.
lrm.renewFor( mytrans.lease, Lease.FOREVER, leaselis );

DistributedLock lock = srvr.getLockAccess("TheWellKnownLockName");
Transaction otherTrans;

while( true ) {
        if( !leaselis.transactionValid() ) {
                doSomethingInteresting();
        }

        // Attempt the lock with our transaction
        otherTrans = lock.testAndSet( mytrans );

        // Check if we got the lock, or someone else did
        if( !otherTrans.equals(mytrans) ) {
                // Someone else's lock, join their transaction
                // and wait for it to complete.
                try {
                        otherTrans.join( p );
                        // if we care about the outcome, then there
                        // is some logic that goes here to manage what
                        // happens next.  If we don't care about
                        // the outcome, then we can just wait for
                        // some type of completion and continue.
                        p.waitForCompleted();
                } catch( IOException ex ) {
                        logException(ex);
                }
        } else {
                // we got the lock, do our work.
                doLockWork();

                // Now, commit the transaction and release the lock.
                try {
                        mytrans.commit();
                } catch( RemoteException ex ) {
                        logException(ex);
                } finally {
                        lrm.cancel(mytrans.lease);
                        // Pass in the owning transaction to make sure we release only
                        // the correct lock, not another that occurred because of network
                        // segmentation or other bugs.
                        lock.release( mytrans );
                }
                break;
        }
}

I think this is a familiar algorithm.  The issue is that there are some needs for handling RemoteExceptions and some
other related issues that make the API feel heavier.  But, it's the same logic with the same ordering and potential
outcomes.
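
For comparison, here is a minimal sketch of the in-VM version of the same ordering (attempt the lock, do the work, release in finally), using java.util.concurrent.locks.ReentrantLock. The class name LocalLockLoop and the 100 ms retry interval are hypothetical, and the sketch covers only the ordering, not the transactional or lease-handling parts.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical in-VM counterpart of the distributed lock loop. */
final class LocalLockLoop {
    private final Lock lock = new ReentrantLock();

    /** Retries until the lock is acquired, runs the work, and always releases. */
    void runExclusively(Runnable work) throws InterruptedException {
        while (true) {
            // Attempt the lock; if someone else holds it, wait a while and retry.
            if (lock.tryLock(100, TimeUnit.MILLISECONDS)) {
                try {
                    // We got the lock, do our work.
                    work.run();
                } finally {
                    // Release regardless of how the work ended.
                    lock.unlock();
                }
                return;
            }
        }
    }
}
```

The only checked exception here is InterruptedException; the remote version adds RemoteException and lease handling on top of the same shape.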

My view is that we need to research how to encapsulate all this knowledge into less work for the user.  I think that
would be a much better choice than having 5 variations for 5 different weights of application needs.  A common, singular
API with service provider pluggability and other powerful mechanisms allows a single focus on getting work done instead
of reinventing all the pieces that are needed for each type of application.

I've always found it easier to dummy out the activities of an all-encompassing API than to try and figure out how to expand
the capabilities of a limited API to do more than it was designed to do.

A simple example might help you understand my point.  Look at what had to happen with JMX in order to implement
remoting.  It was originally designed without remoting in mind, in that the set of exceptions thrown by its methods did
not take RMI use into account.

To solve the problem, the original MBeanServer interface was defined to extend the MBeanServerConnection superinterface
that provided remoting.  This was a fairly simple mechanism.  But, it required all JMX users that wanted remoting to
change source code to use MBeanServerConnection instead of MBeanServer everywhere.  And then, they now had to deal with
IOException everywhere.  That's a big impact on a lot of applications.
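
A minimal sketch of what that source change looks like in practice. The class name MBeanCountDemo is hypothetical; MBeanServer, MBeanServerConnection, getMBeanCount and ManagementFactory.getPlatformMBeanServer are the standard JMX and java.lang.management names. Code written against the connection interface also accepts the local server, but has to deal with IOException even when no network is involved.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;

/** Hypothetical demo of coding against the remotable JMX interface. */
final class MBeanCountDemo {
    private MBeanCountDemo() {}

    /** Works for local and remote servers alike, at the price of IOException. */
    static int countMBeans(MBeanServerConnection connection) throws IOException {
        return connection.getMBeanCount();
    }

    /** The local platform server is passed through the same remotable interface. */
    static int countLocal() throws IOException {
        MBeanServer local = ManagementFactory.getPlatformMBeanServer();
        return countMBeans(local);
    }
}
```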

This is the type of thing that I'm trying to convey.  You can always start simple, but when you're done, is it still
simple?  Hopefully it is, but there's a very small class of interprocess issues that are simple to solve.

Gregg Wonderly