[Haskell-cafe] I love purity, but it's killing me.

Tom Hawkins tomahawkins at gmail.com
Fri Feb 8 01:33:35 EST 2008


I've been programming with Haskell for a few years and love it.  One
of my favorite applications of Haskell is using for domain specific
languages.  However, after designing a handful of DSLs, I continue to
hit what appears to be a fundamental hurdle -- or at least I have yet
to find an adequate solution.

My DSLs invariably define a datatype to capture expressions; something
like this:

data Expression
  = Add Expression Expression
  | Sub Expression Expression
  | Variable String
  | Constant Int
  deriving Eq

Using the datatype Expression, it is easy to mass a collections of
functions to help assemble complex expressions, which leads to very
concise programs in the DSL.

The problem comes when I want to generate efficient code from an
Expression (ie. to C or some other target language).  The method I use
invovles converting the tree of subexpressions into an acyclic graphic
to eliminate common subexpressions.  The nodes are then topologically
ordered and assigned an instruction, or statement for each node.  For
example:

let a = Add (Constant 10) (Variable "i1")
    b = Sub (Variable "i2") (Constant 2)
    c = Add a b

would compile to a C program that may look like this:

  a = 10 + i1;
  b = i2 - 2;
  c = a + b;

The process of converting an expression tree to a graph uses either Eq
or Ord (either derived or a custom instance) to search and build a set
of unique nodes to be ordered for execution.  In this case "a", then
"b", then "c".  The problem is expressions often have shared,
equivalent subnodes, which dramatically grows the size of the tree.
For example:

let d = Add c c
    e = Add d d    -- "e" now as 16 leaf nodes.

As these trees grow in size, the equality comparison in graph
construction quickly becomes the bottleneck for DSL compilation.
What's worse, the phase transition from tractable to intractable is
very sharp.  In one of my DSL programs, I made a seemingly small
change, and compilation time went from milliseconds to
not-in-a-million-years.

Prior to Haskell, I wrote a few DSLs in OCaml.  I didn't have this
problem in OCaml because each "let" expression was mutable, and I
could use the physical equality operator to perform fast comparisons.
Unfortunately, I have grown to love Haskell's type system and its lack
of side effects, and could never go back.

Is there anything that can be done to dramatically speed up
comparisons, or is there a better approach I can take to extract
common subexpressions?  I should point out I have an opportunity to
get Haskell on a real industrial application.  But if I can't solve
this problem, I may have to resort to far less eloquent languages.
:-(

Thanks for any and all help.

-Tom


More information about the Haskell-Cafe mailing list