[Haskell-cafe] ANNOUNCE: iterIO-0.1 - iteratee-based IO with pipe operators

Mon May 16 23:31:14 CEST 2011

On 16/05/11 20:31, dm-list-haskell-cafe at scs.stanford.edu wrote:
> At Mon, 16 May 2011 10:56:02 +0100,
> Simon Marlow wrote:
>>
>> Yes, it's not actually documented as far as I know, and we should fix
>> that.  But if you think about it, sequential consistency is really the
>> only sensible policy: suppose one processor creates a heap object and
>> writes a reference to it in the IORef, then another processor reads the
>> IORef.  The writes that created the heap object must be visible to the
>> second processor, otherwise it will encounter uninitialised memory and
>> crash.  So sequential consistency is necessary to ensure concurrent
>> programs can't crash.
>>
>> Now perhaps it's possible to have a relaxed memory model that provides
>> the no-crashes guarantee but still allows IORef writes to be reordered
>> (e.g. some kind of causal consistency).  That might be important if
>> there is some processor arcitecture that provides that memory model, but
>> as far as I know there isn't.
>
> Actually, in your heap object example, it sounds like you only really
> care about preserving program order, rather than write atomicity.
> Thus, you can get away with less-than-sequential consistency and not
> crash.
>
> The x86 is an example of a relaxed memory model that provides the
> no-crashes guarantee you are talking about.  Specifically, the x86
> deviates from sequential consistency in two ways
>
>    1. A load can finish before an earlier store to a different memory
>       location.  [intel, Sec. 8.2.3.4]
>
>    2. A thread can read its own writes early. [intel, 8.2.3.5]
>
>    [Section references are to the intel architecture manual, vol 3a:
>     http://www.intel.com/Assets/PDF/manual/253668.pdf]
>
> One could imagine an implementation of IORefs that relies on the fact
> that pointer writes are atomic and that program order is preserved to
> avoid mutex overhead for most calls.  E.g.:
>
>    struct IORef {
>      spinlock_t lock;     /* Only ever used by atomicModifyIORef */
>      HaskellValue *val;   /* Updated atomically because pointer-sized
>                              writes are atomic */
>    };
>
>    HaskellValue *
>    readIORef (struct IORef *ref)
>    {
>      return ref->val;
>    }
>
>    void
>    writeIORef (struct IORef *ref, HaskellValue *val)
>    {
>      /* Note that if *val was initialized in the same thread, then by
>       * the time another CPU sees ref->val, it will also see the
>       * correct contents of *ref->val, because stores are seen in a
>       * consistent order by other processors [intel, Sec. 8.2.3.7].
>       *
>       * If *val was initialized in a different thread, then since this
>       * thread has seen it, other threads will too, because x86
>       * guarantees stores are transitively visible [intel, Sec. 8.2.3.6].
>       */
>      ref->val = val;
>    }
>
>    /* modifyIORef is built out of readIORef and writeIORef */
>
>    HaskellValue *
>    atomicModifyIORef (Struct IORef *ref, HaskellFunction *f)
>    {
>      HaskellValue *result;
>      spinlock_acquire (&ref->lock);
>
>      result = modifyIORef (ref, f);
>
>      spinlock_release (&ref->lock);
>      return result;
>    }
>
> This is actually how I assumed IORefs worked.

Right, that is how IORefs work. (well, atomicModifyIORef is a bit 
different, but the differences aren't important here)

> But then consider the
> following program:
>
>    maybePrint :: IORef Bool ->  IORef Bool ->  IO ()
>    maybePrint myRef yourRef = do
>      writeIORef myRef True
>      yourVal<- readIORef yourRef
>      unless yourVal $ putStrLn "critical section"
>
>    main :: IO ()
>    main = do
>      r1<- newIORef False
>      r2<- newIORef False
>      forkOS $ maybePrint r1 r2
>      forkOS $ maybePrint r2 r1
>      threadDelay 1000000
>
> Under sequential consistency, the string "critical section" should be
> output at most once.  However, with the above IORef implementation on
> x86, since a read can finish before a write to a different location,
> both threads might see False for the value of yourVal.
>
> To prevent this deviation from sequential consistency, you would need
> to do something like stick an MFENCE instruction at the end of
> writeIORef, and that would slow down the common case where you don't
> care about sequential consistency.  In fact, I would argue that if you
> care about S.C., you should either be using atomicModifyIORef or
> MVars.

Good example - so it looks like we don't get full sequential consistency 
on x86 (actually I'd been thinking only about write ordering and 
forgetting that reads could be reordered around writes).

But that's bad because it means Haskell has a memory model, and we have 
to say what it is, or at least say that ordering is undefined.

In practice I don't think anyone actually does use IORef in this way. 
Typically you need at least one atomicModifyIORef somewhere, and that 
acts as a barrier.

> As an aside, these days one sees a lot of hand-wringing over the fact
> that CPU clock rates have been flat for a while and the only way to
> get more performance is through parallelism.  "How are we going to
> teach programmers to write concurrent code when it's so hard to write
> and debug?" I've heard numerous people ask.
>
> Haskell could be a major step in the right direction, since in the
> absence of variables, it's impossible to have data races.  (You can
> still have deadlock and other kinds of race condition, such as the one
> in maybePrint above, if you had my definition of IORef, but data races
> are by far the most pernicious concurrency problems.)  Of course, the
> key to making Haskell useful in a parallel setting is that things like
> the memory model have to be fully specified...

Hmm, as it happens I have rather a lot to say on this particular matter! 
  But I've never heard anyone claim that a prerequisite to Haskell being 
useful as a parallel programming language is a well-defined memory 
model.  I think there's a couple of reasons for that:

   - deterministic parallel programming models (e.g. Strategies,
     monad-par) don't care about memory models.  These are the
     first port of call for parallel programming.

   - If you have to use concurrency, then none of MVars,
     atomicModifyIORef or STM care about memory models either.

So the memory model only becomes visible when you use concurrency with 
shared IORefs (without atomicModifyIORef) or bare peek/poke, which is 
pretty rare and easily avoided.

Cheers,
	Simon