[Haskell-cafe] Re: Channel9 Interview: Software Composability and theFu ture of Languages

Sun Jan 28 18:07:23 EST 2007

Benjamin Franksen wrote:
> Chris Kuklewicz wrote:
>> Aside on utterly useful proofs: When you write concurrent programs you
>> want an API with strong and useful guarantees so you can avoid deadlocks
>> and starvation and conflicting data read/writes.  Designing an using such
>> an API is a reasoning exercise identical to creating proofs.  Some systems
>> makes this possible and even easy (such as using STM).
> 
> I have to -- partly -- disagree with your last statement. It is possible
> that I missed something important but as I see it, STM has by construction
> the disadvantage that absence of starvation is quite hard (if not
> impossible) to prove. A memory transaction must be re-started from the
> beginning if it finds (when trying to commit) another task has modified one
> of the TVars it depends on. This might mean that a long transaction may in
> fact /never/ commit if it gets interrupted often enough by short
> transactions which modify one of the TVars the longer transaction depends
> upon.

True.

The STM API ensures the program will be deadlock free, the mutable state is
consistent, and that progress will always be made (i.e. _some block_ will commit).

The current STM implementation does not otherwise ensure fairness or avoid
starvation.  It is clever enough to put retrying blocks to sleep until one of
the TVars they read is changed.  But it does not have the single-wake up FIFO
fairness of MVars.

> IIRC this problem is mentioned in the original STM paper only in passing
> ("Starvation is surely a possibility" or some such comment). I would be
> very interested to hear if there has been any investigation of the
> consequences with regard to proving progress guarantees for programs that
> use STM.

The STM API ensures progress for the program as a whole (i.e. commits).  If a
block is failed because of a conflict this is because another block has
committed.  If all the threads of the program call retry then the program is
stuck, but that is not STM's fault, and the STM scheduler will kill the program
in this case rather than hang.

> My current take on this is that there might be possibilities avoid this kind
> of starvation by appropriate scheduling. One idea is to assign time slices
> dynamically, e.g. to double it whenever a task must do a rollback instead
> of a commit (due to externally modified TVars), and to reset the time slice
> to the default on (successful) commit or exception.

With the new GHC using multiple CPU's this would not prevent a fast block from
running & committing on a different CPU and causing the long block to retry.

> Another possibility is
> to introduce dynamic task priorities and to increase priority on each
> unsuccessful commit.

> (Unfortunately I have no proof that such measures
> would remove the danger of starvation, they merely seem plausible avenues
> for further research. I also don't know whether the current GHC runtime
> system already implements such heuristics.)
> 
> Cheers
> Ben

What you need to "increase fairness/priority" for a block is to run that block
until it tries to commit while keeping other blocks that would interfere from
committing.

The scheduler could take the runtime of a failed block into account when
re-attempting it.  It could lock out other blocks that might interfere with a
very-long-delayed much-re-attempted-block until it has had a change to commit.
(Since the scheduler knows which TVars the problem block has needed in the past
any independent blocks can still run).  To keep the progress guarantee: if the
problem block takes a very very long time it must be timed out and another block
be allowed to commit.

This partitioning into "fast" vs "slow" blocks is merely based on the
observations of the runtime scheduler.