[Haskell-cafe] Re: evaluation semantics of bind

Thu Feb 5 20:19:53 EST 2009

Gregg Reynolds wrote:
> Right, but that's only because the compiler either somehow knows about
> side effects or there is some other mechanism - e.g. an implicit World
> token that gets passed around - that prevents optiimization.  As far
> as the formal semantics of the language are concerned, there's no
> essential difference between  "getChar >>= \x -> getChar" and " Foo 3
>>> = \x -> Foo 7 " for some data constructor Foo.  I should think the
> latter would be optimized; if so, why not the former?  The presence of
> some hidden (from me, anyway) semantics is inescapable.

There's no reason to assume the latter would be "optimized" either. 
Consider:

     > data NotIO a = NotIO String a
     >
     > instance Monad NotIO where
     >     return x          = NotIO "" x
     >     (NotIO s x) >>= f = (\(NotIO z y) -> NotIO (s++z) y) (f x)
     >
     > notGetChar = NotIO "foo" 'a'

Let's consider your example:

     notGetChar >>= \x -> notGetChar
   ==
     (NotIO "foo" 'a') >>= (\x -> NotIO "foo" 'a')
   ==
     (\(NotIO z y) -> NotIO ("foo"++z) y) ((\x -> NotIO "foo" 'a') 'a')
   ==
     (\(NotIO z y) -> NotIO ("foo"++z) y) (NotIO "foo" 'a')
   ==
     NotIO ("foo"++"foo") 'a'

It's clear to see that this result is not equal to notGetChar. And the 
compiler needn't know anything special about NotIO to realize that. 
There's nothing special about IO. The only difference between NotIO and 
IO is that NotIO has a Chars for "side effects". IO is a perfectly fine 
mathematical object, the only difference is that it is appending 
sequences of an abstract type that we mere mortals like to think of as 
"actions" or "side effects" or "a program". The mathematics neither 
knows nor cares about any of that rubbish.

The only relevant thing[1] is that this abstract type ---whatever it is, 
however it's implemented--- forms a monoid. The return function 
associates the identity of the monoid with some value (which is also 
irrelevant as far as the monad is concerned), and the bind operation 
uses the associative operation of the monoid to combine them. The latter 
is more obvious when it's phrased in terms of join :: Monad m => m(m a) 
-> m a, and the monad law: join . join == join . fmap join

It so happens that a sequence of anything is the "free" monoid. 
Appending sequences of characters is no different than appending 
sequences of actions or program instructions. For lists or sets as 
monads, the monoid is a path through a decision tree aka a sequence of 
choices. These "choices" aren't stored in bits anywhere, just like IO's 
RealWorld[2], but people don't seem to get as flustered trying to figure 
out what exactly a "choice" is and what a "choice" means as a 
mathematical object. The point is, it doesn't matter. The choice, the 
side effect, the character, these are all just arbitrary sets of things. 
We can construct a free monoid over any set, and the monoid neither 
knows nor cares anything about the "values" in that set.

The only way we can optimize away one of the calls to notGetChar (or any 
other monadic function) is if we can guarantee that no other computation 
will use either the monoidal values or the "contained" values associated 
with it. IO is defined with a bind operator that's strict in the 
monoidal value, which means we can never remove things (unless, perhaps, 
we can demonstrate that they only append the identity of the monoid, in 
which case evaluating the function has no side effects--- by 
definition). There's nothing particularly intriguing about a strict bind 
operator, State has one as do other monads. And there's nothing 
particularly intriguing about "stateful" monads either, there is the 
Lazy State monad too (and the ACIO monad is something like a lazy IO).

[1] Okay, not the *only* relevant thing :)

[2] This is just like the binary search algorithm. Effectively there's a 
tree that spans the set we're searching, and each iteration of search is 
walking down one ply of that tree. Only for binary search we usually 
don't bother constructing that tree, the tree is realized in the 
recursion pattern of the algorithm. Or dually, binary tree 
datastructures are a reification of the algorithm. You get this same 
duality with, frex, lazy lists vs for-loops. There's nothing special 
about being stored in bits vs being stored in invariants or actions.

-- 
Live well,
~wren