[Haskell-cafe] what is f=f (not) doing ?

Sun Sep 23 21:05:08 EDT 2007

Miguel Mitrofanov wrote:
 >> A deadlock happens whenever two (or more) threads are blocked on each
 >> other.  Deadlocks can be extremely hard to detect, especially if they
 >> occur intermittently.
 >
 > Isn't that so much different from garbage collection? Replace
 > "thread" with "chunk of data", and "waits for" with "has a pointer
 > to" and these two problems look very similar. And we all know there
 > ARE efficient garbage collectors.

Actually, I think the two problems look /extremely/ similar.

When doing garbage collection, one BIG problem is determining what's
really a pointer and what's just data.  There are conservative garbage
collectors that assume just about anything can be a pointer, and then
there are precise garbage collectors that use knowledge of the layout
of "chunks of data" to identify the pointers to follow.

All garbage collectors have one thing in common, though.  They all
depend on being able to identify, for each chunk of data, what else it
has pointers to.  If all else fails, a conservative GC can simply
assume that everything is potentially a pointer and try to follow it.

Now, a deadlock detector would need to be able to identify, for each
blocked thread, what that thread is waiting for.  The OS already
tracks this information and uses it to determine when a blocked thread
becomes un-blocked and ready to run.

Now, if we could make sure that /all/ locks go through the OS, then
the OS would be able to build a complete dependency table.  When a
deadlock is about to occur, the OS would be able to detect it
immediately.

Back to reality: Not all locks go through the OS.

Suppose I'm using fibers.  That means I have one OS thread running
many routines at the same time, and I'm doing my own cooperative
multitasking.  GHC does this.  The OS has no idea what's going on.

Even if I had an OS that could track all locks, I can still get into
deadlock situations.

Suppose I have a programming error that causes two of my threads to
deadlock.  Furthermore, suppose that when the deadlock is detected, I
make both threads roll back and start again with the same pre-
conditions.  Now they will go back into the same deadlock.  This cycle
will repeat indefinitely, and it's beyond the OS's ability to detect.

Regarding deadlock detection, I think the real questions to ask are
 (1) Do deadlocks occur often enough for us to care?
 (2) When they do occur, are they severe enough for us to care?

If the answers are both "no", then deadlock detection is not worth the
trouble of implementation.  My opinion is that for typical commercial
end-user software, the answers are both "no".

For "critical" computing systems, like flight computers and life
support systems, the answers are obviously "yes"; a deadlock that
occurs once per million years is still a problem if it can bring down
a multi-million dollar aircraft with hundreds of passengers on board.

For these types of systems, the developers tend to spend extra
development resources to try to design fool-proof systems.  In some
cases, they may even go to heroic lengths to /prove/ that their
software is correct.

It /is/ possible to design a system that cannot deadlock.  It turns
out that one of the necessary conditions for deadlock is the hold and
wait condition.  If a process that's already holding resources is
/not/ allowed to request additional resources, then deadlocks are
impossible.  If we really, absolutely don't want any deadlocks, then
we can design a system in which a process that needs a resource is
required to release all its resources and then re-allocate its updated
list of resources.

-- Ron