[Haskell-cafe] Re: State monad strictness - how?

Thu Jan 11 11:19:52 EST 2007

>> Unfortunately, the current situation is that State is only
>> available as a lazy monad, and StateT is only available
>> as a strict monad.
>
> I agree with you that both lazy and strict monads are important and
> that we should have both options in a monad library.
> 
> But the fun doesn't end there. There are other strictness properties
> to consider. Take the state monad for example. Should it be strict or
> lazy in the state that it carries around? What about the value
> component? I think the answer to these questions are the same as for
> monadic strictness above: both strict and lazy variants are useful. 

Sorry guys, but it looks like there are monstrous misconceptions of
strictness: there is no such thing as a "strict" or a "lazy" monad.
There are only functions strict in arguments and thus one can only ask
in which arguments (>>=) is strict or how the strictness properties of
the result (x >>= y) are obtained from those of x and y. As it turns
out, the details are subtle.

Let's make an attempt to define "strict in the state". One might say
that this refers to a monad (SState s a) with the property that for all
(x :: State s a), the semantic function (runState x :: s -> (a,s)) is
strict, i.e.

    runState x _|_ = _|_

Thus, (x) evaluates the state to WHNF (weak head normal form) before it
returns a result. Note that this definition "strict in the state"
*cannot* be applied to an arbitrary instance (m) of the class
(MonadState) because the class does not offer any hints about the
semantic function (runState). One cannot state this property in terms of
(put),(get),(return) and (>>=) alone.

The current implementation of (Control.Monad.State.State s a) is not
"strict in the state". Here, we have for example

    modify f = get >>= put . f

    runState (modify (+1)) _|_
  = ((\s -> (s,s)) >>= (\n -> \s -> ((),n+1))) _|_
  = ((),_|_+1)
  = ((),_|_)

which returns the value () but gives an undefined state. This is also
the source of Dean's space leak (and stack overflow). Performing graph
reduction to WHNF on

  flip runState undefined . sequence_ . replicate 1000000 $ modify (+1)

yields the pair

  ((), (... (((undefined + 1) + 1) + 1) ... + 1))

Clearly, the second component needs much memory to hold the 1000000
numbers. Evaluating it to WHNF form will result in a stack
overflow even before (undefined) raises an exception. But the first
component is simply ().

So far, so good. An implementation of "strict in the state" would be

    newtype SState s a = S { runSState :: s -> (a,s) }

    instance Monad (SState s) where
        return a = S $ \s -> s `seq` (a, s)
        m >>= k  = S $ \s -> let
            (a, s') = runSState m s
            in runSState (k a) s'

    instance MonadState s (SState s) where
        get = \s -> s `seq` (s,s)
        put s = \s' -> s' `seq` ((),s)

We assume that the constructor (S) is hidden to the user so that he can
only build monadic actions from (return), (>>=), (get) and (put). Note
that the (>>=) operation does not mention strictness at all. Given our
assumption, we know by induction that (m) and (k a) are "strict in the
state" and it follows that (>>=) must be "strict in the state", too. A
quick check confirms

   runSState (modify (+1)) _|_ = _|_

Marvelous, let's try

 >flip runSState undefined . sequence_ . replicate 1000000 $ modify (+1)
 *** Exception: stack overflow

What happened? It's a very good exercise to perform graph reduction by
hand on this expression to see what's going on. For simplicity, you may
want to get your hands dirty on

    modify (+1) >> modify (+1) >> modify (+1) >> return ()

The result is in essence a concatenation of strict functions like in

    f = (+1) . (+1) . (+1) . (+1) ... (+1)

While the entire function (f) is strict as well, the intermediate
results are not evaluated eagerly enough, resulting in a stack overflow.

A concatenation of strict functions is amenable to strictness analysis.
So, I strongly suspect that Dean's

   tick' = get >>= (put $!) . (+1)

will yield a favorable result with optimization turned on "-O". But a
priori, the problem must be remedied by changing the definition of (>>=)
to force the intermediate accumulating parameter, like in

    m >>= k  = S $ \s -> s `seq` let
         (a, s') = runSState m s
         in runSState (k a) $! s'

The point is that the previous definition of (>>=) could also be used to
combine monadic action not "strict in the state" and must perform lazy
evaluation to achieve the non-strict semantics in that case, too. In
contrast, this definition of (m >>= k) will always return a monadic
action "strict in the state", regardless of the strictness of (m) and (k).

Accidentally, the definition of (>>=) for (StateT) forces the pair
constructor:

    instance (Monad m) => Monad (StateT s m) where
        return a = StateT $ \s -> return (a, s)
        m >>= k  = StateT $ \s -> do
                (a, s') <- runStateT m s
                runStateT (k a) s'

In case (m) and (k) are "strict in the state", this will force the
intermediate state (s) and Dean's (tick' :: StateT Int Identity ()) will
actually work without a stack overflow. But this does *not* mean that
(StateT s m a) is "strict in the state", it's more like making (uncurry
k) strict in its argument. For instance,

    runIdentity $ runStateT (let x = put 0 in x) undefined
   = ((),0)

But the strict match on (,) prevents infinite monadic actions like

    runIdentity $ runStateT (let x = x >> put 0 in x) 0

I consider this a bug in the definition of (StateT), the line should read

    ~(a, s') <- runStateT m s

But simply saying that (StateT) is "strict" and that (State) is "lazy"
does not represent the real situation.

> There is no such distinction in monadLib.  The state transformer
> inherits its behavior from the underlying monad. For example: StateT
> Int IO is strict, but StatT Int Id is lazy.   One way to get a strict
> state monad with monadLib is like this:
> 
> import MonadLib
> 
> data Lift a = Lift { runLift :: a }
> 
> instance Monad Lift where
>  return x      = Lift x
>  Lift x >>= f  = f x
> 
> strict = runLift $ runStateT 2 $ undefined >> return 5
> lazy   = runId   $ runStateT 2 $ undefined >> return 5
> 
> The difference between those two is that "strict == undefined", while
> "lazy = (5,undefined)".

This is a brilliant example of ill-defined notions of "strict" and
"lazy". It only assures that (>>=) is strict in its first argument which
can easily be circumvented by supplying (Lift undefined). It has nothing
to do with "strict in the state" and behaves just like (StateT s Id a):

   > runLift $ runStateT (let x = put 0 in x) undefined
   ((),0)
   > runLift $ flip runStateT undefined $
         sequence_ . replicate 1000000 $ modify (+1)
   ((),*** Exception: stack overflow

In the end, there is still the problem of making these multiple forms of
"strictness" available to the programmer. While I think that these
things are best left to strictness analysis, it would be no good if the
programmer loses control over these things. Josef's idea of
parametrization on the pair type as in

   newtype StateP p s a = StateP { runStateP :: (s -> p a s) }

is feasible. We have the choice between

   data PairLL a b = PairLL a b
   data PairSL a b = PairSL !a b   -- and symmetrically PairLS of course
   data PairSS a b = PairSS !a !b

which correspond to the following domains

    (x,y)             (x,y)       (x,y)
    /   \               |           |
   /     \              |           |
(_|_,y) (x,_|_)       (x,_|_)       |
   \     /              |           |
    \   /               |           |
  (_|_,_|_)             |           |
      |                 |           |

     _|_               _|_         _|_

But I think that one pair type is missing, namely the "unboxed pair"
with the domain structure

    (x,y)
    /   \
   /     \
(_|_,y) (x,_|_)
   \     /
    \   /
 (_|_,_|_) = _|_

and a pseudo-definition

   newtype UPair a b = UPair a b

Given these pair types, all of the above variations of strictness
(besides the harmless (Lift)) can be obtained via a case-expression in
the definition of (>>=):

    instance Monad (StateP p s) where
        return a = S $ \s -> p a s
        m >>= k  = S $ \s -> case (runState m s) of
            p a s' -> runSState (k a) s'

With the unboxed pair, the case turns into a let expression (i.e.
irrefutable pattern). Incidentally, the (curry) and (uncurry) functions
for unboxed pairs are real isomorphisms of types and I think that it
offers much more natural semantics than the standard lifted pair.

Taking this further, I also think that strictness is best articulated in
the place where it naturally belongs to: the type of a function. Types
are propositions, functions are proofs and a function proves the
propositions about its own strictness properties. The (almost) best
solution would be if one could simply say

  type StateSL a = State !Int a
  type StateSS a = State !Int !a

and get the corresponding behavior. Here, (!) is a type level function
that returns a corresponding unboxed type. F.i., the domain for Ints is

  ... (Up -1) (Up 0) (Up 1) ...
         \      |      /
          \     |     /
             (Up _|_)
                |

               _|_

and the domain for !Int is that of the unboxed integers Int#:

     -1  0  1
       \ | /

        _|_

(Up) does the boxing and should be the only non-strict constructor
available. I don't know whether this has to be exported to user level
Haskell, but it would be best if the core language has a type system
with strictness annotations and infers the corresponding stuff from
user-level Haskell (i.e. does strictness analysis). The difference
between lazy and strict is a difference between boxed and unboxed. I
even think that the new function "lazy" from GHC 6.6 really has type (Up
a -> a). Perhaps it is also possible to make (seq) parametric once again by

  seq :: forall a,b . !a -> b -> b

and the first argument will be forced at run-time simply because of its
type it got during type inference.

A last remark: strictness is not needed because it's good. Strictness is
needed because it exerts control over memory (and time). That's
something best left to the compiler. Of course, the compiler's job is
not to turn a O(n^2) algorithm into one that runs in O(n) time, this is
something only the programmer can do. But the compiler's job is to
figure out the `seq`s, fusions and inline definitions because I am too
lazy to mark them explicitly.

Regards,
apfelmus