[Haskell-cafe] I love purity, but it's killing me.

Tue May 26 19:49:12 EDT 2009

Hi Tom,

I've been working on another code-generating graphics compiler, generating
GPU code.  As always, I run into the problem of efficient common
subexpression elimination.  In Pan, Vertigo & Pajama, I used lazy
memoization, using stable pointers and weak references, to avoid the
worst-case-exponential behavior you mention below.  I'm now using a
bottom-up CSE method that's slower and more complicated than I'm going for.

What's your latest wisdom about CSE in DSELs?

Thanks,  - Conal

On Thu, Feb 7, 2008 at 11:33 PM, Tom Hawkins <tomahawkins at gmail.com> wrote:

> I've been programming with Haskell for a few years and love it.  One
> of my favorite applications of Haskell is using for domain specific
> languages.  However, after designing a handful of DSLs, I continue to
> hit what appears to be a fundamental hurdle -- or at least I have yet
> to find an adequate solution.
>
> My DSLs invariably define a datatype to capture expressions; something
> like this:
>
> data Expression
>  = Add Expression Expression
>  | Sub Expression Expression
>  | Variable String
>  | Constant Int
>  deriving Eq
>
> Using the datatype Expression, it is easy to mass a collections of
> functions to help assemble complex expressions, which leads to very
> concise programs in the DSL.
>
> The problem comes when I want to generate efficient code from an
> Expression (ie. to C or some other target language).  The method I use
> invovles converting the tree of subexpressions into an acyclic graphic
> to eliminate common subexpressions.  The nodes are then topologically
> ordered and assigned an instruction, or statement for each node.  For
> example:
>
> let a = Add (Constant 10) (Variable "i1")
>    b = Sub (Variable "i2") (Constant 2)
>    c = Add a b
>
> would compile to a C program that may look like this:
>
>  a = 10 + i1;
>  b = i2 - 2;
>  c = a + b;
>
> The process of converting an expression tree to a graph uses either Eq
> or Ord (either derived or a custom instance) to search and build a set
> of unique nodes to be ordered for execution.  In this case "a", then
> "b", then "c".  The problem is expressions often have shared,
> equivalent subnodes, which dramatically grows the size of the tree.
> For example:
>
> let d = Add c c
>    e = Add d d    -- "e" now as 16 leaf nodes.
>
> As these trees grow in size, the equality comparison in graph
> construction quickly becomes the bottleneck for DSL compilation.
> What's worse, the phase transition from tractable to intractable is
> very sharp.  In one of my DSL programs, I made a seemingly small
> change, and compilation time went from milliseconds to
> not-in-a-million-years.
>
> Prior to Haskell, I wrote a few DSLs in OCaml.  I didn't have this
> problem in OCaml because each "let" expression was mutable, and I
> could use the physical equality operator to perform fast comparisons.
> Unfortunately, I have grown to love Haskell's type system and its lack
> of side effects, and could never go back.
>
> Is there anything that can be done to dramatically speed up
> comparisons, or is there a better approach I can take to extract
> common subexpressions?  I should point out I have an opportunity to
> get Haskell on a real industrial application.  But if I can't solve
> this problem, I may have to resort to far less eloquent languages.
> :-(
>
> Thanks for any and all help.
>
> -Tom
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20090526/410cd57c/attachment.html