[Haskell-cafe] On the purity of Haskell

Wed Dec 28 18:39:52 CET 2011

This is just my view on whether Haskell is pure, being offered up for 
criticism. I haven't seen this view explicitly articulated anywhere 
before, but it does seem to be implicit in a lot of explanations - in 
particular the description of Monads in SBCs "Tackling the Awkward 
Squad". I'm entirely focused on the IO monad here, but aware that it's 
just one concrete case of an abstraction.

Warning - it may look like trolling at various points. Please keep going 
to the end before making a judgement.

To make the context explicit, there are two apparently conflicting 
viewpoints on Haskell...

 1. The whole point of the IO monad is to support programming with
    side-effecting actions - ie impurity.
 2. The IO monad is just a monad - a generic type (IO actions), a couple
    of operators (primarily return and bind) and some rules - within a
    pure functional language. You can't create impurity by taking a
    subset of a pure language.

My view is that both of these are correct, each from a particular point 
of view. Furthermore, by essentially the same arguments, C is also both 
an impure language and a pure one.

See what I mean about the trolling thing? I'm actually quite serious 
about this, though - and by the end I think Haskell advocates will 
generally approve.

First assertion... Haskell is a pure functional language, but only from 
the compile-time point of view. The compiler manipulates and composes IO 
actions (among other things). The final resulting IO actions are finally 
swallowed by unsafePerformIO or returned from main. However, Haskell is 
an impure side-effecting language from the run-time point of view - when 
the composed actions are executed. Impurity doesn't magically spring 
from the ether - it results from the translation by the compiler of IO 
actions to executable code and the execution of that code.

In this sense, IO actions are directly equivalent to the AST nodes in a 
C compiler. A C compiler can be written in a purely functional way - in 
principle it's just a pure function that accepts a string (source code) 
and returns another string (executable code). I'm fudging issues like 
separate compilation and #include, but all of these can be resolved in 
principle in a pure functional way. Everything a C compiler does at 
compile time is therefore, in principle, purely functional.

In fact, in the implementation of Haskell compilers, IO actions almost 
certainly *are* ASTs. Obviously there's some interesting aspects to that 
such as all the partially evaluated and unevaluated functions. But even 
a partially evaluated function has a representation within a compiler 
that can be considered an AST node, and even AST nodes within a C 
compiler may represent partially evaluated functions.

Even the return and bind operators are there within the C compiler in a 
sense, similar to the do notation in Haskell. Values are converted into 
actions. Actions are sequenced. Though the more primitive form isn't 
directly available to the programmer, it could easily be explicitly 
present within the compiler.

What about variables? What about referential transparency?

Well, to a compiler writer (and equally for this argument) an identifier 
is not the same thing as the variable it references.

One way to model the situation is that for every function in a C 
program, all explicit parameters are implicitly within the IO monad. 
There is one implicit parameter too - a kind of IORef to the whole 
system memory. Identifiers have values which identify where the variable 
is within the big implicit IORef. So all the manipulation of identifiers 
and their reference-like values is purely functional. Actual handling of 
variables stored within the big implicit IORef is deferred until run-time.

So once you accept that there's an implicit big IORef parameter to every 
function, by the usual definition of referential transparency, C is as 
transparent as Haskell. The compile-time result of each function is 
completely determined by its (implicit and explicit) parameters - it's 
just that that result is typically a way to look up the run-time result 
within the big IORef later.

What's different about Haskell relative to C therefore...

 1. The style of the "AST" is different. It still amounts to the same
    thing in this argument, but the fact that most AST nodes are simply
    partially-evaluated functions has significant practical
    consequences, especially with laziness mixed in too. There's a deep
    connection between the compile-time and run-time models (contrast
    C++ templates).
 2. The IO monad is explicit in Haskell - side-effects are only
    permitted (even at run-time) where the programmer has explicitly
    opted to allow them.
 3. IORefs are explicit in Haskell - instead of always having one you
    can have none, one or many. This is relevant to an alternative
    definition of referential transparency. Politicians aren't
    considered transparent when they bury the relevant in a mass of the
    irrelevant, and even pure functions can be considered to lack
    transparency in that sense. Haskell allows (and encourages) you to
    focus in on the relevant - to reference an IORef Bool or an IORef
    Int rather than dealing with an IORef Everything.

That last sentence of the third point is my most recent eureka - not so 
long ago I posted a "Haskell is just using misleading definitions - it's 
no more transparent than C" rant, possibly on Stack Overflow. Wrong 
again :-(

So - what do you think?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20111228/18574617/attachment.htm>