[Haskell-cafe] ANNOUNCE: iterIO-0.1 - iteratee-based IO with pipe operators

Mon May 16 21:31:37 CEST 2011

At Mon, 16 May 2011 10:56:02 +0100,
Simon Marlow wrote:
>
> Yes, it's not actually documented as far as I know, and we should fix 
> that.  But if you think about it, sequential consistency is really the 
> only sensible policy: suppose one processor creates a heap object and 
> writes a reference to it in the IORef, then another processor reads the 
> IORef.  The writes that created the heap object must be visible to the 
> second processor, otherwise it will encounter uninitialised memory and 
> crash.  So sequential consistency is necessary to ensure concurrent 
> programs can't crash.
> 
> Now perhaps it's possible to have a relaxed memory model that provides
> the no-crashes guarantee but still allows IORef writes to be reordered
> (e.g. some kind of causal consistency).  That might be important if
> there is some processor arcitecture that provides that memory model, but
> as far as I know there isn't.

Actually, in your heap object example, it sounds like you only really
care about preserving program order, rather than write atomicity.
Thus, you can get away with less-than-sequential consistency and not
crash.

The x86 is an example of a relaxed memory model that provides the
no-crashes guarantee you are talking about.  Specifically, the x86
deviates from sequential consistency in two ways

  1. A load can finish before an earlier store to a different memory
     location.  [intel, Sec. 8.2.3.4]

  2. A thread can read its own writes early. [intel, 8.2.3.5]

  [Section references are to the intel architecture manual, vol 3a:
   http://www.intel.com/Assets/PDF/manual/253668.pdf]

One could imagine an implementation of IORefs that relies on the fact
that pointer writes are atomic and that program order is preserved to
avoid mutex overhead for most calls.  E.g.:

  struct IORef {
    spinlock_t lock;     /* Only ever used by atomicModifyIORef */
    HaskellValue *val;   /* Updated atomically because pointer-sized
                            writes are atomic */
  };

  HaskellValue *
  readIORef (struct IORef *ref)
  {
    return ref->val;
  }

  void
  writeIORef (struct IORef *ref, HaskellValue *val)
  {
    /* Note that if *val was initialized in the same thread, then by
     * the time another CPU sees ref->val, it will also see the
     * correct contents of *ref->val, because stores are seen in a
     * consistent order by other processors [intel, Sec. 8.2.3.7].
     *
     * If *val was initialized in a different thread, then since this
     * thread has seen it, other threads will too, because x86
     * guarantees stores are transitively visible [intel, Sec. 8.2.3.6].
     */
    ref->val = val;
  }

  /* modifyIORef is built out of readIORef and writeIORef */

  HaskellValue *
  atomicModifyIORef (Struct IORef *ref, HaskellFunction *f)
  {
    HaskellValue *result;
    spinlock_acquire (&ref->lock);

    result = modifyIORef (ref, f);

    spinlock_release (&ref->lock);
    return result;
  }

This is actually how I assumed IORefs worked.  But then consider the
following program:

  maybePrint :: IORef Bool -> IORef Bool -> IO ()
  maybePrint myRef yourRef = do
    writeIORef myRef True
    yourVal <- readIORef yourRef
    unless yourVal $ putStrLn "critical section"

  main :: IO ()
  main = do
    r1 <- newIORef False
    r2 <- newIORef False
    forkOS $ maybePrint r1 r2
    forkOS $ maybePrint r2 r1
    threadDelay 1000000

Under sequential consistency, the string "critical section" should be
output at most once.  However, with the above IORef implementation on
x86, since a read can finish before a write to a different location,
both threads might see False for the value of yourVal.

To prevent this deviation from sequential consistency, you would need
to do something like stick an MFENCE instruction at the end of
writeIORef, and that would slow down the common case where you don't
care about sequential consistency.  In fact, I would argue that if you
care about S.C., you should either be using atomicModifyIORef or
MVars.

Can you explain what actually happens inside the real IORef
implementation?

As an aside, these days one sees a lot of hand-wringing over the fact
that CPU clock rates have been flat for a while and the only way to
get more performance is through parallelism.  "How are we going to
teach programmers to write concurrent code when it's so hard to write
and debug?" I've heard numerous people ask.

Haskell could be a major step in the right direction, since in the
absence of variables, it's impossible to have data races.  (You can
still have deadlock and other kinds of race condition, such as the one
in maybePrint above, if you had my definition of IORef, but data races
are by far the most pernicious concurrency problems.)  Of course, the
key to making Haskell useful in a parallel setting is that things like
the memory model have to be fully specified...

Thanks,
David