[Haskell-cafe] The IO monad is 45 years old

Wed Dec 29 10:13:13 CET 2010

The unabated debate about exactly how much category theory one needs
to know to understand that strange beast of IO prompts a thought if
monads, like related continuations, are the things that are destined
to be rediscovered time and time again.

An old (1994) paper on category theory monads and functional
programming included an interesting historical side-note. It turns out
that the RealWorld-passing trick underlying the implementation of the
IO monad, the trick that made it possible to embed truly side-effecting 
operations into pure Haskell -- the trick is 45 years old. It has been
first published in February 1965.

That 1965 paper also anticipated State and Writer monad, call/cc,
streams and delayed evaluations, relation of streams with co-routines,
and even stream fusion.

First, here is the historical aside, cited from 

   Jonathan M. D. Hill and Keith Clarke
   An Introduction to Category Theory, Category Theory Monads,
   and Their Relationship to Functional Programming. 1994
   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.6497

[blockquote]
3 An historical aside

Monads are typically equated with single-threadedness, and are
therefore used as a technique for incorporating imperative features
into a purely functional language. Category theory monads have little
to do with single-threadedness; it is the sequencing imposed by
composition that ensures single-threadedness. In a Wadler-ised monad
this is a consequence of bundling the Kleisli star and flipped compose
into the bind operator. There is nothing new in this connection. Peter
Landin in his Algol 60 used functional composition to model
semi-colon. Semi-colon can be thought of as a state transforming
operator that threads the state of the machine throughout a program.
The work of Peyton-Jones and Wadler has turned full circle back to
Landin's earlier work as their use of Moggi's sequencing monad enables
real side-effects to be incorporated into monad operations such as
print. This is similar to Landin's implementation of his sharing
machine where the assignandhold function can side-effect the store of
the sharing machine because of the sequencing imposed by functional
composition. Landin defined that `Imperatives are treated as null-list
producing functions' [In Landin's paper, () is the syntactic
representation of the empty list and not the unit.]. The assignandhold
imperative is subtly different in that it enables Algol's compound
statements to be handled. The function takes a store location and a
value as its argument, and performs the assignment to the store of the
sharing machine, returning the value assigned as a result of the
function. Because Landin assumed applicative order reduction, the
K combinator was used to return (), and the imperative was evaluated
as a side effect by the unused argument of the K-combinator. Statements
are formed by wrapping such an imperative in a lambda expression that
takes () as an argument. Two consecutive Algol-60 assignments would be
encoded in the lambda calculus as:

         Algol 60     |          Lambda Calculus
         x:=  2;      |   ( (\() -> K () (assignandhold x 2)) .
         x:=  -3;     |     (\() -> K () (assignandhold x (-3))) )  ()

By using a lambda with () as its parameter, () can be thought of as
the `state of the world' that is threaded throughout a program by
functional composition.
[/blockquote]

Peter Landin's paper is remarkable indeed:

  P. J. Landin. A correspondence between ALGOL 60 and Church's lambda
  notation. 
  Communications of the ACM, 8(2):89-101, February 1965. 
  (Part 2 in CACM Vol 8(2) 1965, pages 158-165.)
  http://portal.acm.org/citation.cfm?id=363749

First the reader will notice the `where' notation. Peter Landin even
anticipated the debate on `let' vs `where', saying
``The only consideration in choosing between `let' and `where'
will be the relative convenience of writing an auxiliary
definition before or after the expression it qualifies.''

The ()-passing trick is described in full on p100, with remarkable
clarity:
``Statements. Each statement is rendered as a 0-list-
transformer, i.e. a none-adic function producing the nullist
for its result. It achieves by side-effects a transformation of
the current state of evaluation. ...
Compound statements are considered as functional products (which we
indicate informally by infixed dots).''

A remark
``However, input/output devices can be modeled as named lists, with
special, rather restricted functions associated.  ... Writing is
modeled by a procedure that, operates on a list, and appends a new
final segment derived from other variables. (Alternatively, a purely
functional approach can be contrived by including the transformed list
among the results.)''

anticipated the State and Writer monads as well.

Another remark, in the section on Streams,
``This correspondence [laws of head/tail/cons] serves two related
purposes. It enables us to perform operations on lists (such as
generating them, mapping them, concatenating them) without using an
`extensive,' item-by-item representation of the intermediately
resulting lists; and it enables us to postpone the evaluation of the
expressions specifying the items of a list until they arc actually
needed. The second of these is what interests us here.'' 

and footnote 6, 
``It appears that in stream-transformers we have a functional analogue
of what Conway [12] calls "co-routines."

show that Peter Landin understood streams, on-demand evaluation and
even stream fusion.