Lazy ST vs concurrency

Reid Barton rwbarton at
Mon Jan 30 18:50:29 UTC 2017

I wrote a lazy ST microbenchmark ( that uses
nothing but lazy ST monad operations in the inner loop. With various
caveats, it took around 3 times as long to run under +RTS -N2 after
applying The biggest caveat is
that the cost of the `threadPaused` in `noDuplicate#` seems to be
potentially proportional to the thread's stack depth, and I'm not sure
how representative my microbenchmark is in that regard.

I'm actually surprised the `noDuplicate#` version isn't an order of
magnitude or so slower than that. Still, a 3x factor is a large price
to pay. I don't yet understand what's going on here clearly enough to
be sure that the `noDuplicate#` is necessary, or that we can't
implement `noDuplicate#` more cheaply in the common case of no
contention. My feeling is that if it turns out that we can implement
the correct behavior cheaply, then it will be better to have left it
broken for a little while than to first have a correct but slow
implementation and then later replaced it with a correct and fast
implementation. The latter is disruptive to two groups of people,
those who are affected by the bug and also those who cannot afford to
have their lazy ST code run 3 times slower; of which the former group
is affected already, and we can advertise the existence of the bug
until we have a workable solution. So I'm reluctant to go down this
`noDuplicate#` path until we have exhausted our other options.

In an ideal world with no users, it would be better to start with
correct but slow, of course.

Reid Barton

On Mon, Jan 30, 2017 at 11:18 AM, David Feuer <david at> wrote:
> I forgot to CC ghc-devs the first time, so here's another copy.
> I was working on #11760 this weekend, which has to do with concurrency
> breaking lazy ST. I came up with what I thought was a pretty decent solution (
> ). Simon Peyton Jones, however, is quite
> unhappy about the idea of sticking this weird unsafePerformIO-like code
> (noDup, which I originally implemented as (unsafePerformIO . evaluate), but
> which he finds ugly regardless of the details) into fmap and (>>=).  He's also
> concerned that the noDuplicate# applications will kill performance in the
> multi-threaded case, and suggests he would rather leave lazy ST broken, or
> even remove it altogether, than use a fix that will make it slow sometimes,
> particularly since there haven't been a lot of reports of problems in the
> wild.
> My view is that leaving it broken, even if it only causes trouble
> occasionally, is simply not an option. If users can't rely on it to always
> give correct answers, then it's effectively useless. And for the sake of
> backwards compatibility, I think it's a lot better to keep it around, even if
> it runs slowly multithreaded, than to remove it altogether.
> Note to Simon PJ: Yes, it's ugly to stick that noDup in there. But lazy ST has
> always been a bit of deep magic. You can't *really* carry a moment of time
> around in your pocket and make its history happen only if necessary. We can
> make it work in GHC because its execution model is entirely based around graph
> reduction, so evaluation is capable of driving execution. Whereas lazy IO is
> extremely tricky because it causes effects observable in the real world, lazy
> ST is only *moderately* tricky, causing effects that we have to make sure
> don't lead to weird interactions between threads. I don't think it's terribly
> surprising that it needs to do a few more weird things to work properly.
> David
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at

More information about the ghc-devs mailing list