[Haskell-beginners] Recursive Let

Tue Feb 10 10:57:56 EST 2009

On Mon, Feb 09, 2009 at 05:43:03PM -0800, Tom Poliquin wrote:
> 
> 
> 1) How does recursion happen?
> 
> Being an ex-Schemer I recalled that let x = 3
> get translated into something like
> 
>    (lambda (x) ... ) 3
> 
> So now there's a function involved. It's clear that b:d is
> 
>    (cons b d) 
> 
> another function. So with appropriate plumbing
> I could see the thing recurses but the Haskell viewpoint
> eludes me.

In Haskell, thinking of let expressions as being translated into
lambda expressions is not very helpful, precisely because of the
special ability to have recursive let bindings.  In Haskell it is
perfectly OK to have something like

  let x = 1:y
      y = 0:x
  in  ...

but it is not clear how this would get translated into function
application (which would come first?).  It's better just to think of
let as a special, primitive construct.

> 
> 2) How do things get started?
> 
> It seems that Haskell could figure out that
> 'd' is a list. If an uninitialized (undefined?) 
> list contains [] then things are reasonably
> straightforward ..

An "uninitialized" list does not contain [].  I think perhaps your use
of the terms "uninitialized/undefined" might betray an imperative
mindset (?); in any case these are not the right words to use, since
they give an incorrect picture of what is going on.

Here's what actually happens: as a first example, let's consider the
let-binding

  let x = 1:x
  in ...

What happens here is that x is bound to a closure or 'thunk'
(suspended computation) which, *when needed*, will evaluate to 1:x.
This is laziness at work.  x is not 'uninitialized'; it is perfectly
well initialized to this thunk.  It is definitely not equal to []; at
this point the runtime doesn't know anything about x other than the
fact that it is a list, and that there is a thunk which may be
evaluated if and when the value of x is actually needed.  If you don't
use x anywhere in the ..., nothing will ever happen and the thunk will
eventually get garbage collected (in fact, it's likely that the
compiler would optimize x away and it would never be in memory at
all).

But suppose the value of x is needed (that is, some code
pattern-matches on x). Then the thunk will be evaluated just far
enough to allow the pattern-match: in particular, x will now be a list
with first element 1, and remainder of the list... another thunk.  At
this point x is not [1]; it is [1:...] where the ... represents
another thunk which will evaluate to x.  This process repeats as more
elements of x are needed, with each thunk evaluating just far enough
to allow the necessary pattern-matches, and suspending the remainder
of computation in another thunk.

Now, let's look at your example:

  let (c,d) = (d,b:d)

Of course, c and d will both be initialized to thunks which will
evaluate to lists as needed.  It's important to realize that from a
semantic point of view, b, c, and d never change.  They always
represent the same, unchanging values, as defined by this let
statement; it's just that operationally, only part of those values may
be actually evaluated, as needed.

If b has the value 7, then d = 7:d, which means d = 7:7:d, which means
d = 7:7:7:d, and so on, and the runtime will expand d out as far as
needed (in this case, 10 elements). You really can think of d as
*being* an infinite list of 7's, only part of which is evaluated.

> This is all fine. The only thing that subconsciously
> nags me is that we appear to 'return' each time  which 
> would imply that we would leave the closure causing 'd' 
> to be undefined again.

One way to think about it is that we only 'return' once: when
tracePlus 7 is evaluated, it returns, once and for all, a value 'c',
the evaluation of which happens to be suspended; it will be evaluated
further as needed.  Of course, if you want, you can indeed think of
execution jumping back and forth between main and tracePlus as c is
evaluated bit by bit, but I think this view is unhelpful, as it
suggests procedure calls in an imperative language, where local
variables such as 'd' would indeed go out of scope each time tracePlus
'exited'.  Instead, think of tracePlus as returning a closure, which
packages up the values of b, c, and d; it is this closure which is
then evaluated incrementally as execution continues.

> Back to the undefined list ... if it's *not* an
> [] and has something to do with bottom ( _|_ )
> then my eyes glaze over and I have to go watch a
> Woody Allen movie to reset.

A truly _undefined_ list is indeed _|_, and not []. [] is quite
defined; it is the list with no elements.  _|_ just means 'the
completely undefined value'.  However, as I hope I have made clear
above, the c and d in your example are neither [] nor undefined, so
this is immaterial.

There is much more to say on the topic, and doubtless there are places
where I've told some small fibs because the truth is more
complicated.  But hopefully this helps provide a useful way to
understand things.

-Brent