[GHC] #12620: Allow the user to prevent floating and CSE

Wed Oct 5 03:01:41 UTC 2016

#12620: Allow the user to prevent floating and CSE
-------------------------------------+-------------------------------------
        Reporter:  nomeata           |                Owner:
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #9520, #8457      |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by edsko):

 The more I think about this, the less convinced I am that a `nofloat` or
 even a local `noupdate` annotation really helps. The problem is: where do
 we put the annotation? The whole point of having such an annotation, as
 opposed to just disabling full laziness in the whole module, is to have
 more fine grained control over where full laziness applies and where it
 doesn't. This was easy in the example that this ticket started with

 {{{#!hs
 enum xs = zip (nofloat [1..]) xs
 }}}

 but it's far less obvious in larger examples. For example, consider the
 definition of a conduit that implements the HTTP protocol (Michael
 Snoyman's http-conduit package), or a conduit that does constant space
 type inference for large JSON documents (an example from the code base I
 am working on). Now how do we know what in these definitions to mark as
 `nofloat`? If we get it wrong, then full laziness might float something
 else out that we weren't expecting, and we might once again end up with a
 difficult to debug space leak. The only really workable solution would be
 to mark the whole body as `nofloat`, but now we've lost the advantage of
 fine granularity. I'm guessing @tomjaguarpaw will say at this point "see!
 the problem is full laziness itself" and to be honest, I'm starting to get
 more and more convinced by that point of view.

 However, I still believe that there is an alternative by means of the
 NOUPDATE annotation. But, having thought about it more, I don't think
 annotating the ''constructors'' is the right approach. @simonpj asks for a
 minimal example, so let's consider this one:

 {{{#!hs
 module Main (main) where

 import System.IO.Error

 data Sink = Await (Maybe Char -> Sink) | Done Int

 countFrom :: Int -> Sink
 countFrom n = Await $ \mi -> case mi of
                 Nothing -> Done n
                 Just _  -> countFrom $! n + 1

 feedFrom :: Int -> Sink -> IO ()
 feedFrom _ (Done n)  = print n
 feedFrom 0 (Await f) = feedFrom 0       (f $ Nothing)
 feedFrom n (Await f) = feedFrom (n - 1) (f $ Just 'A')

 retry :: IO a -> IO a
 retry io = catchIOError io (\_ -> retry io)

 main :: IO ()
 main = retry $ feedFrom 1000000 (countFrom 0)
 }}}

 A `Sink` (a special kind of "conduit") is some kind of automaton that
 accepts (`Await`s) a bunch of inputs (in this case `Char`s) and at some
 point terminates (`Done`). Let's recap from the blog post why this has a
 space leak:

 1. `feedFrom 1000000 (countFrom 0)` is a PAP (waiting for its `State#
 RealWorld` argument)
 2. When `retry` executes the action, it maintains a reference to that PAP
 from the exception handler.
 3. In the environment of the PAP is a thunk corresponding to `countFrom
 0`.
 4. Finally, and crucially, full laziness is turning the definition of
 `countFrom` to something more akin to

 {{{#!hs
 -- Full laziness turns countFrom into:
 countFrom :: Int -> Sink
 countFrom n = let k = countFrom $! n + 1
               in Await $ \mi -> case mi of
                    Nothing -> Done n
                    Just _  -> k
 }}}

 (The example with the original definition of `countFrom` has a space leak
 when compiled with `-O` but no space leak with `-O -fno-full-laziness`; if
 we use this version of `countFrom`, we have a space leak with or without
 full laziness enabled.)

 (1)-(4) together means that there is a reference from the PAP's
 environment to the `countFrom 0` thunk, and as `feedFrom` evaluates that
 thunk we build up a long chain

 {{{
 Await ---payload---> FUN ---environment---> Await ---payload---> ...
 }}}

 where every `Await` constructor has a function as its payload, and that
 function has a reference to the next `Await` constructor in its
 environment (closure) (section "Full laziness versus sinks" of http://www
 .well-typed.com/blog/2016/09/sharing-conduit/ has some pictures.).

 So what's the solution here? Perhaps one might argue that full laziness is
 the culprit here; it should not have floated out that continuation in
 `countFrom`. Like I said, I'm starting to have a lot of sympathy for that
 point of view; I will soon need to publish an erratum to my blog post
 because I was once again underestimating full laziness.

 BUT. We can ask a different question: do we really want to be thinking so
 hard about when and where things get allocated precisely? What if the user
 ''themselves'' wrote that alternative version of `countFrom` -- after all,
 it seems like an innocuous change. Should we really have to think so low-
 level when writing Haskell code? I would like to be able to answer "no" to
 that question.

 Here's the thing: conduits (and other structures like it) are data
 structures designed to drive computation; we _never_ expect them to be
 shared and built up in memory. (When we were discussing these matters at
 Well-Typed a comparison was drawn to data versus codata.) I think it would
 be great if we could express this, and `NOUPDATE`, I think, might allow us
 to do that.

 However, I now think annotating the constructors is not the right
 approach. In addition to Simon's probing questions, above, let's consider
 the example `countFrom`. What is the thunk that we don't want to be
 updateable? Well, `countFrom 0` really; and, if pressed for another one,
 the `Sink` in the environment of the continuation in countFrom (`countFrom
 $! n + 1`). ''Neither of those is the argument to a constructor.''. I
 think that instead we should annotate the ''type'':

 {{{#!hs
 {-# NOUPDATE Sink #-}
 data Sink = Await (Maybe Char -> Sink) | Done Int
 }}}

 Now questions such as "who created this thing? do we need spooky action at
 a distance?" are no longer relevant. It's simple and type directed. Any
 thunk of type `Sink` never gets updated.

 Some other minor bits and bobs:

 Replying to [comment:19 simonpj]:
 > I'm beginning to get glimmers of understanding about this no-update
 thing.  Consider
 > {{{
 >   t1 = [1..n]
 > vs
 >   t2 = \_ -> [1..n]
 > vs
 >   t3 = let x = [1..n] in \_ -> x
 > }}}
 > Note that
 >
 > * If we use `t1` in a shared context like `sum t1 / length t1`, we'll
 end up materialising the whole list.
 > * For `t2`, we'd get `sum (t2 ()) / length (t2 ())`, and now the list is
 computed twice rather than duplicated.  Note that `t1` and `t2` have
 different types of course.
 > * Then `t3` is the result of applying the full laziness transformation
 to `t2`, and its space behaviour is back to that of `t1`.
 >
 > Reflections:
 >
 > * I think that this "noupdate" pragma is intended to achieve an effect
 like `t2`, but more conveniently, without changing types.  Correct?

 Exactly. If we had a list type that was marked as `NOUPDATE`, then `sum t1
 / length t1` would not have a space leak (though the list would be
 evaluated twice).

 > * I think (but am not sure) that you intend to use this only for one-
 shot thunks, where (unlike the sum/count example) the thunk is evaluated
 only once.   In which case it would often be discarded after being
 evaluated, in which case where does the leak come from.  A small, concrete
 example would be jolly useful.

 No, I don't think that's necessarily the case. Marking something as
 `NOUPDATE` would imply that you're okay with it being evaluated more than
 once; indeed, that's what you want. In the minimal example I've been
 discussing in this comment, we ''want'' that conduit (sink) to be re-
 evaluated should the exception handler be run.

 > * Notice how important it is that in `t2` the lambda ''syntactically
 encloses'' the leaky computation.  Otherwise you get `t3`.

 I think this is another reason to move to a type directed approach
 instead. Syntactic enclosure is too brittle and too prone to be affected
 by the optimizer.

 Replying to [comment:20 jpbernardy]:
 > As it turns out, we currently have a proposal on the table which is
 capable of expressing where sharing should not occur, in a principled way,
 by using types.
 >
 > This page sums up how it may play out in Edsko's example.
 >
 >
 https://ghc.haskell.org/trac/ghc/wiki/LinearTypes/Examples#Controllingsharingfulllaziness
 >

 Hmmm, I had not realized the connection to linearity, and I wasn't aware
 of this work. Must take a closer look (my PhD is on uniqueness typing :).
 I'm not sure however that linearity is what we want here. Do we want to
 reject a definition such as

 {{{#!hs
 someConduit = do
     x <- await
     case x of
       True  -> do foo ; someConduit
       False -> do foo ; someConduit
 }}}

 If conduits can never be shared, this would be type incorrect. This would
 seem too restrictive. `NOUPDATE` in a way is kind of opposite to
 linearity: it's fine to share, just make sure that every time we access
 this value we recompute it.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12620#comment:22>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler