Lazy ST vs concurrency
marlowsd at gmail.com
Tue Jan 31 08:59:01 UTC 2017
On 30 January 2017 at 22:56, Simon Peyton Jones <simonpj at microsoft.com> wrote:
> We don’t want to do this on a per-module basis, do we, as
> -fatomic-eager-blackholing would suggest? Rather, on a per-thunk basis, no?
> Which thunks, precisely? I think perhaps *precisely thunks one of whose
> free variables has type (State# s) for some s.* These are thunks that
> consume a state token, and must do so no more than once.
If we could identify exactly the thunks we wanted to be atomic, then yes,
that would be better than a whole-module solution. However, I'm not sure
how to do that: doing it on the basis of a free variable with State# type
doesn't work if the State# is buried in a data structure or a function
closure, for instance.
> If entering such thunks were atomic, could we kill off noDuplicate#?
> I still don’t understand exactly what noDuplicate# does, what problem it
> solves, and how the problem it solves relates to this LazyST problem.
Back in our "Haskell on a Shared Memory Multiprocessor" paper
(http://simonmar.github.io/bib/papers/multiproc.pdf) we described a scheme
to try to avoid duplication of work when multiple cores evaluate the same
thunk. This is normally applied lazily, because it involves walking the
stack and atomically black-holing thunks pointed to by update frames. The
noDuplicate# primop just invokes the stack walk immediately; the idea is to
try to prevent multiple threads from evaluating a thunk containing
unsafePerformIO.
It's expensive. It's also not foolproof, because if you already happened
to create two copies of the unsafePerformIO thunk then noDuplicate# can't
help. I've never really liked it for these reasons, but I don't know a
better way. We have unsafeDupablePerformIO, which doesn't call noDuplicate#
and which the programmer can use when the unsafePerformIO can safely be
executed more than once.
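A minimal sketch of how these pieces relate, assuming GHC's base library, where the GHC.IO.Unsafe module exports noDuplicate (an IO wrapper around the noDuplicate# primop) and unsafeDupablePerformIO. In base, unsafePerformIO is essentially unsafeDupablePerformIO preceded by the noDuplicate# stack walk. The name myUnsafePerformIO is hypothetical, and noDup only follows the "unsafePerformIO . evaluate" description from David's message quoted below; neither is claimed to be the exact code in base or in D3038.

```haskell
import Control.Exception (evaluate)
import GHC.IO.Unsafe (noDuplicate, unsafeDupablePerformIO)

-- Hypothetical reconstruction of unsafePerformIO: first do the
-- noDuplicate# stack walk (claiming the thunks under our update frames),
-- then run the action with no further protection.
myUnsafePerformIO :: IO a -> a
myUnsafePerformIO m = unsafeDupablePerformIO (noDuplicate >> m)

-- noDup in the style described in the thread: forcing (noDup x) performs
-- the noDuplicate# walk and then evaluates x to weak head normal form.
noDup :: a -> a
noDup x = myUnsafePerformIO (evaluate x)
```

Dropping the noDuplicate call gives exactly the unsafeDupablePerformIO behaviour: cheaper, but only correct when running the action twice is harmless.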
> We need some kind of fix for 8.2. Simon what do you suggest?
David's current fix would be OK (along with a clear note in the release
notes explaining that the implementation got slower). I think
-fatomic-eager-blackholing might "fix" it with less overhead, though.
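The compare-and-swap scheme behind -fatomic-eager-blackholing, as proposed in the quoted message below, can be modelled at user level. This is only a sketch under stated assumptions: an IORef stands in for the thunk's header word, atomicModifyIORef' plays the role of the cmpxchg MachOp, and an MVar stands in for blocking on a BLACKHOLE. ThunkState, Thunk, newThunk and enterThunk are hypothetical names; the real change would live in GHC's code generator, not in library code.

```haskell
import Control.Concurrent
import Data.IORef

-- Hypothetical model of a thunk's header word.
data ThunkState a
  = Unevaluated (IO a) -- original entry code: the effectful computation
  | BlackHole          -- claimed by some thread; everyone else must wait
  | Evaluated a        -- updated with its value

-- Header word plus a one-shot cell that blocked threads wait on.
data Thunk a = Thunk (IORef (ThunkState a)) (MVar a)

newThunk :: IO a -> IO (Thunk a)
newThunk act = Thunk <$> newIORef (Unevaluated act) <*> newEmptyMVar

enterThunk :: Thunk a -> IO a
enterThunk (Thunk ref done) = do
  -- The atomic update acts as the compare-and-swap: exactly one thread
  -- can observe Unevaluated and swap BlackHole in.
  claim <- atomicModifyIORef' ref $ \st -> case st of
    Unevaluated act -> (BlackHole, Just act)
    other           -> (other, Nothing)
  case claim of
    Just act -> do
      v <- act                     -- only the winner runs the effects
      writeIORef ref (Evaluated v) -- the "update" of the thunk
      putMVar done v               -- wake threads blocked on the blackhole
      return v
    Nothing -> readMVar done       -- lost the race: "re-enter" and block
```

Because only the thread that wins the atomic update ever runs the wrapped action, its effects happen at most once no matter how many threads enter the thunk. The model ignores exceptions: if the winner's action throws, the other threads stay blocked.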
> eagerlyBlackhole :: a -> a
is likely to be unreliable, I think. We lack the control in the source
language to tie it to a particular thunk.
> *From:* Simon Marlow [mailto:marlowsd at gmail.com]
> *Sent:* 30 January 2017 21:51
> *To:* David Feuer <david at well-typed.com>
> *Cc:* Simon Peyton Jones <simonpj at microsoft.com>; ghc-devs at haskell.org
> *Subject:* Re: Lazy ST vs concurrency
> On 30 January 2017 at 16:18, David Feuer <david at well-typed.com> wrote:
> I forgot to CC ghc-devs the first time, so here's another copy.
> I was working on #11760 this weekend, which has to do with concurrency
> breaking lazy ST. I came up with what I thought was a pretty decent
> solution (https://phabricator.haskell.org/D3038). Simon Peyton Jones, however, is
> unhappy about the idea of sticking this weird unsafePerformIO-like code
> (noDup, which I originally implemented as (unsafePerformIO . evaluate), but
> which he finds ugly regardless of the details) into fmap and (>>=). He's
> concerned that the noDuplicate# applications will kill performance in the
> multi-threaded case, and suggests he would rather leave lazy ST broken, or
> even remove it altogether, than use a fix that will make it slow sometimes,
> particularly since there haven't been a lot of reports of problems in the
> wild.
> In a nutshell, I think we have to fix this despite the cost: the
> implementation is incorrect and unsafe.
> Unfortunately the mechanisms we have right now to fix it aren't ideal:
> noDuplicate# is a bigger hammer than we need. All we really need is some
> way to make a thunk atomic; that would require some special entry code for
> the thunk that does atomic eager-blackholing. Hmm, now that I think about it,
> perhaps we could just have a flag, -fatomic-eager-blackholing. We already
> do this for CAFs, incidentally. The idea is to compare-and-swap the
> blackhole info pointer into the thunk, and if we didn't win the race, just
> re-enter the thunk (which is now a blackhole). We already have the cmpxchg
> MachOp, so it shouldn't be more than a few lines in the code generator to
> implement it. It would be too expensive to do by default, but doing it
> just for Control.Monad.ST.Lazy should be OK and would fix the unsafety.
> (I haven't really thought this through, just an idea off the top of my
> head, so there could well be something I'm overlooking here...)
> My view is that leaving it broken, even if it only causes trouble
> occasionally, is simply not an option. If users can't rely on it to always
> give correct answers, then it's effectively useless. And for the sake of
> backwards compatibility, I think it's a lot better to keep it around, even
> if it runs slowly multithreaded, than to remove it altogether.
> Note to Simon PJ: Yes, it's ugly to stick that noDup in there. But lazy ST
> has always been a bit of deep magic. You can't *really* carry a moment of time
> around in your pocket and make its history happen only if necessary. We can
> make it work in GHC because its execution model is entirely based around
> reduction, so evaluation is capable of driving execution. Whereas lazy IO
> is extremely tricky because it causes effects observable in the real world,
> lazy ST is only *moderately* tricky, causing effects that we have to make sure
> don't lead to weird interactions between threads. I don't think it's
> surprising that it needs to do a few more weird things to work properly.
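To make "history happening only if necessary" concrete, here is a small example using the public lazy ST API (Control.Monad.ST.Lazy and Data.STRef.Lazy); counterFrom is a hypothetical name. The loop never terminates as an ST computation, yet taking a finite prefix of its result works, because each state-consuming step is a thunk that runs only when its element is demanded. Those are exactly the thunks that must not be entered twice by different threads; this sketch runs single-threaded and does not itself reproduce the race from #11760.

```haskell
import Control.Monad.ST.Lazy (runST)
import Data.STRef.Lazy (newSTRef, readSTRef, writeSTRef)

-- An unbounded stateful loop whose result is an infinite list. Each
-- element's history (the reads and writes before it) is performed only
-- when that element is demanded.
counterFrom :: Int -> [Int]
counterFrom start = runST $ do
  ref <- newSTRef start
  let loop = do
        n <- readSTRef ref
        writeSTRef ref (n + 1)
        rest <- loop          -- does not run until rest is forced
        return (n : rest)
  loop
```

For instance, take 5 (counterFrom 10) demands only the first five iterations of the loop.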