Lazy ST vs concurrency

Tue Jan 31 08:59:01 UTC 2017

On 30 January 2017 at 22:56, Simon Peyton Jones <simonpj at microsoft.com>
wrote:

> We don’t want to do this on a per-module basis do we, as
> -fatomic-eager-blackholing would suggest.  Rather, on per-thunk basis, no?
> Which thunks, precisely?   I think perhaps *precisely thunks one of whose
> free variables has type (State# s) for some s.*  These are thunks that
> consume a state token, and must do so no more than once.
>

If we could identify exactly the thunks we wanted to be atomic, then yes,
that would be better than a whole-module solution.  However I'm not sure
how to do that - doing it on the basis of a free variable with State# type
doesn't work if the State# is buried in a data structure or a function
closure, for instance.
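
To make that concrete, here is a rough source-level sketch (not anything
from GHC itself) of two thunks that depend on a state token without having
a free variable of type State#.  Token, consume and example are made-up
names, and the token is deliberately reused only to keep the sketch short;
real code must thread it linearly.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main where

import Control.Monad.ST (runST)
import GHC.Exts (State#, newMutVar#, readMutVar#)
import GHC.ST (ST (..))

-- Hypothetical boxed wrapper around a state token.
data Token s = Token (State# s)

-- Uses the token hidden inside the box to allocate and read a MutVar.
consume :: Token s -> Int
consume (Token s0) =
  case newMutVar# (41 :: Int) s0 of
    (# s1, ref #) -> case readMutVar# ref s1 of
      (# _, n #) -> n + 1

-- NOTE: s0 is reused on purpose for illustration; this is not linear.
example :: ST s (Int, Int)
example = ST $ \s0 ->
  let boxed   = Token s0                  -- token stored in a data structure
      viaBox  = consume boxed             -- thunk: free variable is Token s
      closure = \k -> consume (Token s0) + k  -- function closure capturing s0
      viaFun  = closure 0                 -- thunk: free variable is a function
  in (# s0, (viaBox, viaFun) #)

main :: IO ()
main = print (runST example)
```

Neither viaBox nor viaFun mentions a State#-typed variable directly, so a
test based only on the types of a thunk's free variables would miss both.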

> If entering such thunks was atomic, could we kill off noDuplicate#?
>
>
>
> I still don’t understand exactly what noDuplicate# does, what problem it
> solves, and how the problem it solves relates to this LazyST problem.
>
>
Back in our "Haskell on a Shared Memory Multiprocessor" paper (
http://simonmar.github.io/bib/papers/multiproc.pdf) we described a scheme
to try to avoid duplication of work when multiple cores evaluate the same
thunk.  This is normally applied lazily, because it involves walking the
stack and atomically black-holing thunks pointed to by update frames.  The
noDuplicate# primop just invokes the stack walk immediately; the idea is to
try to prevent multiple threads from evaluating a thunk containing
unsafePerformIO.

It's expensive.  It's also not foolproof, because if you already happened
to create two copies of the unsafePerformIO thunk then noDuplicate# can't
help. I've never really liked it for these reasons, but I don't know a
better way.  We have unsafeDupablePerformIO, which doesn't call noDuplicate#
and which the programmer can use when the action can safely be executed
multiple times.
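
As a small illustration of when each one is appropriate (a sketch only;
counter, nextId and sumTo are names invented for this example):

```haskell
module Main where

import Data.IORef (IORef, atomicModifyIORef', modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeDupablePerformIO, unsafePerformIO)

-- A global mutable cell: creating it twice would be a bug, so this uses
-- unsafePerformIO (which goes through noDuplicate#) together with NOINLINE.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

-- Hand out consecutive identifiers from the global counter.
nextId :: IO Int
nextId = atomicModifyIORef' counter (\n -> (n + 1, n))

-- The action here only touches a freshly allocated local IORef, so running
-- it twice merely wastes work; unsafeDupablePerformIO is enough.
sumTo :: Int -> Int
sumTo n = unsafeDupablePerformIO $ do
  acc <- newIORef 0
  mapM_ (\i -> modifyIORef' acc (+ i)) [1 .. n]
  readIORef acc

main :: IO ()
main = do
  a <- nextId
  b <- nextId
  print (a, b, sumTo 10)
```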

>
>
> We need some kind of fix for 8.2.  Simon what do you suggest?
>

David's current fix would be OK (along with a clear notice in the release
notes etc. to note that the implementation got slower).  I think
-fatomic-eager-blackholing might "fix" it with less overhead, though.
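
For anyone trying to picture the claim-then-wait behaviour that atomic
eager blackholing is after, here is a user-level model of it.  This is
only an analogy built from IORef and MVar, not the code generator change,
and Cell, newCell and enter are hypothetical names.  The atomic update
stands in for the compare-and-swap of the blackhole info pointer, and
blocking on the MVar stands in for re-entering the blackhole.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar, takeMVar)
import Control.Monad (forM_, replicateM)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- A model thunk: either nobody has claimed it yet, or it has been claimed
-- ("blackholed") and the result will appear in the MVar.
data Cell a = Unclaimed (IO a) | Claimed (MVar a)

newCell :: IO a -> IO (IORef (Cell a))
newCell = newIORef . Unclaimed

-- Exactly one caller wins the atomic update and runs the action; every
-- other caller blocks until the winner publishes the result.
-- Limitation of this sketch: if the action throws, waiters block forever.
enter :: IORef (Cell a) -> IO a
enter ref = do
  slot <- newEmptyMVar
  decision <- atomicModifyIORef' ref $ \c -> case c of
    Unclaimed act -> (Claimed slot, Left act)
    Claimed s     -> (c, Right s)
  case decision of
    Left act -> do
      r <- act
      putMVar slot r
      pure r
    Right s -> readMVar s

main :: IO ()
main = do
  runs <- newIORef (0 :: Int)
  cell <- newCell (atomicModifyIORef' runs (\n -> (n + 1, ())) >> pure (42 :: Int))
  dones <- replicateM 8 newEmptyMVar
  forM_ dones $ \d -> forkIO (enter cell >>= putMVar d)
  results <- mapM takeMVar dones
  n <- readIORef runs
  print (results, n)  -- the action runs once, however many threads enter
```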

Ben's suggestion:

> eagerlyBlackhole :: a -> a

is likely to be unreliable I think.  We lack the control in the source
language to tie it to a particular thunk.

Cheers
Simon

>
> Simon
>
>
>
> *From:* Simon Marlow [mailto:marlowsd at gmail.com]
> *Sent:* 30 January 2017 21:51
> *To:* David Feuer <david at well-typed.com>
> *Cc:* Simon Peyton Jones <simonpj at microsoft.com>; ghc-devs at haskell.org
> *Subject:* Re: Lazy ST vs concurrency
>
>
>
> On 30 January 2017 at 16:18, David Feuer <david at well-typed.com> wrote:
>
> I forgot to CC ghc-devs the first time, so here's another copy.
>
>
> I was working on #11760 this weekend, which has to do with concurrency
> breaking lazy ST. I came up with what I thought was a pretty decent
> solution (https://phabricator.haskell.org/D3038). Simon Peyton Jones,
> however, is quite unhappy about the idea of sticking this weird
> unsafePerformIO-like code (noDup, which I originally implemented as
> (unsafePerformIO . evaluate), but which he finds ugly regardless of the
> details) into fmap and (>>=).  He's also concerned that the noDuplicate#
> applications will kill performance in the multi-threaded case, and
> suggests he would rather leave lazy ST broken, or even remove it
> altogether, than use a fix that will make it slow sometimes, particularly
> since there haven't been a lot of reports of problems in the wild.
>
>
>
> In a nutshell, I think we have to fix this despite the cost - the
> implementation is incorrect and unsafe.
>
>
>
> Unfortunately the mechanisms we have right now to fix it aren't ideal -
> noDuplicate# is a bigger hammer than we need.  All we really need is some
> way to make a thunk atomic; it would require some special entry code to the
> thunk which did atomic eager-blackholing.  Hmm, now that I think about it,
> perhaps we could just have a flag, -fatomic-eager-blackholing.  We already
> do this for CAFs, incidentally. The idea is to compare-and-swap the
> blackhole info pointer into the thunk, and if we didn't win the race, just
> re-enter the thunk (which is now a blackhole).  We already have the cmpxchg
> MachOp, so it shouldn't be more than a few lines in the code generator to
> implement it.  It would be too expensive to do by default, but doing it
> just for Control.Monad.ST.Lazy should be ok and would fix the unsafety.
>
>
>
> (I haven't really thought this through, just an idea off the top of my
> head, so there could well be something I'm overlooking here...)
>
>
>
> Cheers
>
> Simon
>
>
>
>
>
> My view is that leaving it broken, even if it only causes trouble
> occasionally, is simply not an option. If users can't rely on it to always
> give correct answers, then it's effectively useless. And for the sake of
> backwards compatibility, I think it's a lot better to keep it around, even
> if it runs slowly multithreaded, than to remove it altogether.
>
> Note to Simon PJ: Yes, it's ugly to stick that noDup in there. But lazy ST
> has always been a bit of deep magic. You can't *really* carry a moment of
> time around in your pocket and make its history happen only if necessary.
> We can make it work in GHC because its execution model is entirely based
> around graph reduction, so evaluation is capable of driving execution.
> Whereas lazy IO is extremely tricky because it causes effects observable
> in the real world, lazy ST is only *moderately* tricky, causing effects
> that we have to make sure don't lead to weird interactions between
> threads. I don't think it's terribly surprising that it needs to do a few
> more weird things to work properly.
>
> David
>
>
>