[Haskell-cafe] Disadvantages of de Bruijn indicies?

Fri May 11 17:40:38 EDT 2007

Neil Mitchell wrote:

> Hi,
> 
> de Bruijn indicies look quite nice, and seem to eliminate a lot of
> complexity when dealing with free variables:
> http://en.wikipedia.org/wiki/De_Bruijn_index
> 
> So I was wondering, are they suitable for use in a compiler? If so,
> what are their disadvantages/advantages? Is there any particular
> reason that GHC (as an example) doesn't use them in its Core?
> 
> I'm trying to decide if I should use them in my compilers data type -
> and would like some recommendations before I start.
> 
> Thanks

The bigest advantage of De Bruijn indices are that:
  - alpha conversion is not needed and comparison is always modulo renaming.
  - inlining is very easy
I think the largest disadvantage (asside from readability) is that you 
lose the ability to use a Map as a symbol table for local information 
(since the names change with each level).

Another mayor disadvantage is that it is not immediatly clear how to use 
De Bruijn numbers with recursive let bindings and case branches, since a 
single node binds multiple variables. In my toy theorem prover code I 
use pairs of numbers to solve this, the first is the normal De Bruijn 
index, the second gives an index into the list of declarations.

I believe the Epigram compiler uses De Bruijn numbers for its core 
language. Have a look at "I am not a number: I am a free variable" by 
Conor McBride and James McKinna, http://www.e-pig.org/downloads/notanum.pdf

Their approach is based on a Scope datatype:
  > data Scope a = Name :. a
Which is used to 'remember' the names of bound variables. This will 
allow you to largely restore the readability. Another advantage is that 
code dealing with scoping can be localized.

One final thing I noticed is that a lot of the advantages of De Bruijn 
numbers disappear if expressions you operate on can have free variables. 
Say you want to beta reduce (a b), but b contains free variables (in the 
form of De Bruin indices), now you can no longer just substitute b in a; 
you have to increment all the indices of the free variables for each 
scope you enter in a, so these variables are not accedentially captured 
by that scope.

Twan