Jonathan Cast jcast at ou.edu
Mon Jul 2 17:48:21 EDT 2007

```On Monday 02 July 2007, Andrew Coppin wrote:
> What were monads like before they became a Haskell language construct?
>
> Is Haskell's idea of a "monad" actually anywhere close to the original
> mathematical formalism?
>
> Just being randomly curiose...

Curiosity can be a dangerous thing . . .

into Haskell.  If you really want to know the details (it's a fascinating
subject once you start understanding it), go read a couple or three good
books on category theory; I'm sure someone else here can give you a better
reading list than I can off the top of my head.  Building up to the idea of a
monad in category theory, with proper examples and whatnot, requires a full
textbook; in particular, /Toposes, Triples, and Theories/, where I learned
the concept and which wants to actually do something interesting with them,
is required to compress the categorical preliminaries to a point where I
needed constant reference to other books just to understand the basic
concepts.  Nevertheless, if you really want the high points:

Monads come from a branch of abstract mathematics called /category theory/,
which was invented, originally, to provide a mathematical characterization of
polymorphic functions (or, rather, of the uniform nature of certain
functions, which FPers would characterize as a form of polymorphism).  A
category is a 6-tuple

(objects, arrows, domain, codomain, id, compose)

where objects and arrrows are arbitrary classes, domain and codomain are
mappings from arrows to objects, id is a mapping from objects to arrows such
that

domain (id alpha) = alpha
codomain (id alpha) = alpha

and compose is an associative partial binary operator on arrows with the
images of id as units, such that compose f g is defined whenever domain f =
codomain g, and

domain (compose f g) = domain g
codomain (compose f g) = codomain f

Following standard practice, I will write

f : alpha -> beta

for

domain f = alpha
codomain f = beta

and f . g for compose f g.  (Standard practice in category theory is I believe
to write composition as juxtaposition (like application in Haskell, but
associative), but I'm not going that far to make the notation look more

Given that A and B are categories, a /functor/ F from A to B is a pair of
mappings (F_objects, F_arrows) (both generally written F in practice) from
objects of A to objects of B and arrows of A to arrows of B, respectively,
such that

domain (F a) = F (domain a)
codomain (F a) = F (codomain a)
compose (F a) (F b) = F (compose a b)
id (F a) = F (id a)

Given two functors, F, G : A -> B, a natural transformation h : F -> G is a
mapping h from objects of A to arrows of B such that

h alpha : F alpha -> G alpha
h beta . F f = F f . h alpha (forall f : alpha -> beta)

Given a category C, a monad in C is a triple (F, eta, mu) of a functor F :
C -> C, a natural transformation eta : Id -> F, where Id is the identity
functor, and a natural transformation mu : F . F -> F, where composition of
functors is defined component-wise, satisfying the additional laws

mu . F mu = mu . mu
mu . eta = id = mu . F eta

When translating category theory into Haskell, objects are taken to be types
and arrows functions; domain, codomain, id, and compose are defined the way
you would expect.  A functor is represented by class Functor in the standard
prelude; an instance consists of a data type constructor F, which is the only
mapping on types supported by Haskell, and a function

fmap :: (alpha -> beta) -> (F alpha -> F beta)

(remember currying!) corresponding to the arrow map.  A natural transformation
is a polymorphic function f :: F alpha -> G alpha such that

fmap g . f = f . fmap g

for all functions g.  This is (at least sometimes) provable for arbitrary
polymorphic functions (as long as they don't use seq!), but I'm not sure of
the details of this.  At any rate, doing so isn't certainly isn't necessary.

would give us

class Functor m => Monad m where
return :: alpha -> m alpha
join :: m (m alpha) -> m alpha

(join, by the way, is one of the most under-appreciated of Haskell library
functions; learning it is necessary both for true mastery of Haskell monads).
The complete collection of class laws (including the natural transformation
laws) is

fmap g . return = return . g
fmap g . join = join . fmap (fmap g)
join . fmap join = join . join
join . return = id = join . fmap return

(Note the essential similarity of these laws to those given above).

Haskell, of course, actually gives us

return :: alpha -> m alpha
(>>=) :: m alpha -> (alpha -> m beta) -> m beta

The relationship between these two signatures is given by the set of equations

fmap f a = a >>= return . f
join a = a >>= id
a >>= f = join (fmap f a)

return x >>= f = f x
a >>= return = a
(a >>= f) >>= g = a >>= \ x -> f x >>= g

We can take the relationship given above as definitional, in either direction,
and derive the appropriate set of laws.  Taking fmap and join as primitive,
we get

return x >>= f
= join (fmap f (return x))
= join (return (f x))
= f x

a >>= return
= join (fmap return a)
= a

(a >>= f) >>= g
= join (fmap g (join (fmap f a)))
= join (join (fmap (fmap g) (fmap f a)))
= join (fmap join (fmap (fmap g) (fmap f a)))
= join (fmap (join . fmap g . f) a)
= a >>= join . fmap g . f
= a >>= \ x -> join (fmap g (f x))
= a >>= \ x -> f x >>= g

Taking (>>=) as primitive, we get

fmap f (return x)
= return x >>= return . f
= return (f x)

fmap f (join a)
= (a >>= id) >>= return . f
= a >>= \ x -> id x >>= return . f
= a >>= \ x -> x >>= return . f
= a >>= fmap f
= a >>= \ x -> id (fmap f x)
= a >>= \ x -> return (fmap f x) >>= id
= (a >>= return . fmap f) >>= id
= join (fmap (fmap f) a)

join (join a)
= (a >>= id) >>= id
= a >>= \ x -> x >>= id
= a >>= \ x -> join x
= a >>= \ x -> return (join x) >>= id
= (a >>= return . join) >>= id
= join (fmap join a)

join (return a)
= return a >>= id
= id a
= a

join (fmap return a)
= (a >>= return . return) >>= id
= a >>= \ x -> return (return x) >>= id
= a >>= \ x -> return x
= a >>= return
= a

This is less general than the category theory version, and seq creates
problems with some of these equations (or at least with proving that any of
them hold for real programs!), but using the normal methods of Haskell
equational reasoning (assume all values are total and finite, all functions
preserve totality and finiteness, and the context preserves totality and
finiteness, and using = on functions to mean equality on total and finite