Lazy ST vs concurrency

Wed Feb 1 09:01:44 UTC 2017

On 31 January 2017 at 10:02, Simon Peyton Jones <simonpj at microsoft.com>
wrote:

> Huh. You are right.  That’s horrible.
>

Horrible indeed!

>
>
> OK, here’s another idea.  Provide,
>
>             applyOnce# :: (a->b) -> a -> b
>
>
>
> which behaves like
>
>             applyOnce f x = f x
>
>
>
> but guarantees that any thunk  (applyOnce# f x) will be evaluated with
> atomic eager black-holing.
>
>
>
> \(s :: State# s) ->
>
>    let unsafePerformIO = \g -> g s
>
>         thunk = applyOnce# unsafePerformIO (\s -> ... )
>
>    in
>
>       ...
>
>
But what if GHC decided to add another thunk, e.g.

let
  thunk =
    let x = applyOnce# unsafePerformIO (\s -> ...)
    in x

now we need atomicity on both thunks, but there's no way to tell. (of
course GHC probably wouldn't do this particularly transformation, but there
are a whole range of other things that it might correctly do that would
subvert the programmer's intention to make a single atomic thunk).

noDuplicate# avoids this problem because it walks the whole stack, claiming
the whole chain of thunks currently under evaluation.  But this is a messy
solution, I don't like it either.

Cheers
Simon

> Of course this does not guarantee safety.  But I think it’d give a
> per-thunk way to specify it.
>
>
>
> Simon
>
>
>
> *From:* Simon Marlow [mailto:marlowsd at gmail.com]
> *Sent:* 31 January 2017 09:25
>
> *To:* Simon Peyton Jones <simonpj at microsoft.com>
> *Cc:* David Feuer <david at well-typed.com>; ghc-devs at haskell.org
> *Subject:* Re: Lazy ST vs concurrency
>
>
>
> On 31 January 2017 at 09:11, Simon Peyton Jones <simonpj at microsoft.com>
> wrote:
>
> If we could identify exactly the thunks we wanted to be atomic, then yes,
> that would be better than a whole-module solution.  However I'm not sure
> how to do that - doing it on the basis of a free variable with State# type
> doesn't work if the State# is buried in a data structure or a function
> closure, for instance.
>
>
>
> I disagree.  Having a free State# variable is precisely necessary and
> sufficient, I claim.  Can you provide a counter-example?
>
>
>
> Sure, what I had in mind is something like this, defining a local
> unsafePerformIO:
>
>
>
> \(s :: State# s) ->
>
>    let unsafePerformIO = \g -> g s
>
>         thunk = unsafePerformIO (\s -> ... )
>
>    in
>
>       ...
>
>
>
> and "thunk" doesn't have a free variable of type State#.
>
>
>
> Cheers
>
> Simon
>
>
>
>
>
> Informal proof:
>
> ·         The model is that a value of type (State# t) is a linear value
> that we mutate in-place.  We must not consume it twice.
>
> ·         Evaluating a thunk that has a free (State# t) variable is
> precisely “consuming” it.  So we should only do that once
>
>
>
>
>
> I think -fatomic-eager-blackholing might "fix" it with less overhead,
> though
>
>
>
> But precisely where would you have to use that flag?  Inlining could meant
> that the code appears anywhere!  Once we have the ability to
> atomically-blackhole a thunk, we can just use my criterion above, I claim.
>
>
>
> Stopgap story for 8.2.   I am far from convinced that putting
> unsafePerformIO in the impl of (>>=) for the ST monad will be correct; but
> if you tell me it is, and if it is surrounded with huge banners saying that
> this is the wrong solution, and pointing to a new ticket to fix it, then OK.
>
>
>
> Arguably this isn't all that urgent, given that it's been broken for 8
> years or so.
>
>
>
>
>
> Simon
>
>
>
> *From:* Simon Marlow [mailto:marlowsd at gmail.com]
> *Sent:* 31 January 2017 08:59
> *To:* Simon Peyton Jones <simonpj at microsoft.com>
> *Cc:* David Feuer <david at well-typed.com>; ghc-devs at haskell.org
>
>
> *Subject:* Re: Lazy ST vs concurrency
>
>
>
> On 30 January 2017 at 22:56, Simon Peyton Jones <simonpj at microsoft.com>
> wrote:
>
> We don’t want to do this on a per-module basis do we, as
> -fatomic-eager-blackholing would suggest.  Rather, on per-thunk basis, no?
> Which thunks, precisely?   I think perhaps *precisely thunks one of whose
> free variables has type (Sttate# s) for some s.*  These are thunks that
> consume a state token, and must do so no more than once.
>
>
>
> If we could identify exactly the thunks we wanted to be atomic, then yes,
> that would be better than a whole-module solution.  However I'm not sure
> how to do that - doing it on the basis of a free variable with State# type
> doesn't work if the State# is buried in a data structure or a function
> closure, for instance.
>
>
>
> If entering such thunks was atomic, could we kill off noDuplicate#?
>
>
>
> I still don’t understand exactly what noDuplicate# does, what problem it
> solves, and how the problem it solves relates to this LazyST problem.
>
>
>
> Back in our "Haskell on a Shared Memory Multiprocessor" paper (
> http://simonmar.github.io/bib/papers/multiproc.pdf
> <https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fsimonmar.github.io%2Fbib%2Fpapers%2Fmultiproc.pdf&data=02%7C01%7Csimonpj%40microsoft.com%7C49b93aee78394d54fcab08d449b76706%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C1%7C636214499439419212&sdata=81aU2TCVDxdNFl7CIHd8GxWUUdmUn%2FdRO4bOi2ScpVw%3D&reserved=0>)
> we described a scheme to try to avoid duplication of work when multiple
> cores evaluate the same thunk.  This is normally applied lazily, because it
> involves walking the stack and atomically black-holing thunks pointed to by
> update frames.  The noDuplicate# primop just invokes the stack walk
> immediately; the idea is to try to prevent multiple threads from evaluating
> a thunk containing unsafePerformIO.
>
>
>
> It's expensive.  It's also not foolproof, because if you already happened
> to create two copies of the unsafePerformIO thunk then noDuplicate# can't
> help. I've never really liked it for these reasons, but I don't know a
> better way.  We have unsafeDupablePerformIO that doesn't call noDuplicate#,
> and the programmer can use when the unsafePerformIO can safely be executed
> multiple times.
>
>
>
>
>
> We need some kind of fix for 8.2.  Simon what do you suggest?
>
>
>
> David's current fix would be OK (along with a clear notice in the release
> notes etc. to note that the implementation got slower).  I think
> -fatomic-eager-blackholing might "fix" it with less overhead, though.
>
>
>
> Ben's suggestion:
>
>
>
> > eagerlyBlackhole :: a -> a
>
>
>
> is likely to be unreliable I think.  We lack the control in the source
> language to tie it to a particular thunk.
>
>
>
> Cheers
>
> Simon
>
>
>
>
>
> Simon
>
>
>
> *From:* Simon Marlow [mailto:marlowsd at gmail.com]
> *Sent:* 30 January 2017 21:51
> *To:* David Feuer <david at well-typed.com>
> *Cc:* Simon Peyton Jones <simonpj at microsoft.com>; ghc-devs at haskell.org
> *Subject:* Re: Lazy ST vs concurrency
>
>
>
> On 30 January 2017 at 16:18, David Feuer <david at well-typed.com> wrote:
>
> I forgot to CC ghc-devs the first time, so here's another copy.
>
>
> I was working on #11760 this weekend, which has to do with concurrency
> breaking lazy ST. I came up with what I thought was a pretty decent
> solution (
> https://phabricator.haskell.org/D3038 ). Simon Peyton Jones, however, is
> quite
> unhappy about the idea of sticking this weird unsafePerformIO-like code
> (noDup, which I originally implemented as (unsafePerformIO . evaluate), but
> which he finds ugly regardless of the details) into fmap and (>>=).  He's
> also
> concerned that the noDuplicate# applications will kill performance in the
> multi-threaded case, and suggests he would rather leave lazy ST broken, or
> even remove it altogether, than use a fix that will make it slow sometimes,
> particularly since there haven't been a lot of reports of problems in the
> wild.
>
>
>
> In a nutshell, I think we have to fix this despite the cost - the
> implementation is incorrect and unsafe.
>
>
>
> Unfortunately the mechanisms we have right now to fix it aren't ideal -
> noDuplicate# is a bigger hammer than we need.  All we really need is some
> way to make a thunk atomic, it would require some special entry code to the
> thunk which did atomic eager-blackholing.  Hmm, now that I think about it,
> perhaps we could just have a flag, -fatomic-eager-blackholing.  We already
> do this for CAFs, incidentally. The idea is to compare-and-swap the
> blackhole info pointer into the thunk, and if we didn't win the race, just
> re-enter the thunk (which is now a blackhole).  We already have the cmpxchg
> MachOp, so It shouldn't be more than a few lines in the code generator to
> implement it.  It would be too expensive to do by default, but doing it
> just for Control.Monad.ST.Lazy should be ok and would fix the unsafety.
>
>
>
> (I haven't really thought this through, just an idea off the top of my
> head, so there could well be something I'm overlooking here...)
>
>
>
> Cheers
>
> Simon
>
>
>
>
>
> My view is that leaving it broken, even if it only causes trouble
> occasionally, is simply not an option. If users can't rely on it to always
> give correct answers, then it's effectively useless. And for the sake of
> backwards compatibility, I think it's a lot better to keep it around, even
> if
> it runs slowly multithreaded, than to remove it altogether.
>
> Note to Simon PJ: Yes, it's ugly to stick that noDup in there. But lazy ST
> has
> always been a bit of deep magic. You can't *really* carry a moment of time
> around in your pocket and make its history happen only if necessary. We can
> make it work in GHC because its execution model is entirely based around
> graph
> reduction, so evaluation is capable of driving execution. Whereas lazy IO
> is
> extremely tricky because it causes effects observable in the real world,
> lazy
> ST is only *moderately* tricky, causing effects that we have to make sure
> don't lead to weird interactions between threads. I don't think it's
> terribly
> surprising that it needs to do a few more weird things to work properly.
>
> David
>
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-devs/attachments/20170201/2c2bda6d/attachment-0001.html>