[Haskell-cafe] Haskell performance (again)!

Sun Oct 8 12:58:57 EDT 2006

This email actually turned out much longer than I expected, but I hope
it sparks interesting (and hopefully, thorough!) discussion on points
that weren't touched on by previous threads on this topic. What
follows describes my journey thus far exploring what I see (as a
newcomer to Haskell) as a major problem with Haskell - the ease with
which users can write inefficient code, and have no idea that they
did. Some of this has to do with laziness, some of it with the
functional programming in general. My goal is to achieve (as a link
further down says) *good* performance, not blazing performance. I.e.,
I want to avoid what should really be no-brainers such as
little-omega(1) space utilization for simple folds.

The first version of a simple program I wrote was reasonably "clean."
(Throughout this email, "clean" is supposed to mean some combination
of: clear, compact, modular, elegant, etc.) It's a polynomial adder,
which takes in lists of (coefficient, degree) tuples, combining terms
of the same degree and maintaining a sorted order of least-to-greatest
degree:

type Poly = [(Int,Int)]

addPoly1 :: Poly -> Poly -> Poly
addPoly1 p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t)
    | p1d == p2d = (p1c + p2c, p1d) : addPoly1 p1t p2t
    | p1d < p2d = p1h : addPoly1 p1t p2
    | p1d > p2d = p2h : addPoly1 p1 p2t
addPoly1 p1 [] = p1
addPoly1 [] p2 = p2
addPoly1 [] [] = []

But this doesn't use tail recursion/accumulation (which seems to me
like a complete hack that is a mandatory price to pay for using FP),
so I rewrote it:

addPoly :: Poly -> Poly -> Poly
addPoly p1 p2 =
    let addPoly' p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t) result
            | p1d == p2d = addPoly' p1t p2t ((p1c + p2c, p1d):result)
            | p1d > p2d = addPoly' p1 p2t (p2h:result)
            | p1d < p2d = addPoly' p1t p2 (p1h:result)
        addPoly' (p1:p1s) [] result = addPoly' p1s [] (p1:result)
        addPoly' [] (p2:p2s) result = addPoly' [] p2s (p2:result)
        addPoly' [] [] result = reverse result
    in addPoly' p1 p2 []

But laziness will cause this to occupy Theta(n)-space of cons-ing
thunks. (See appendix for a simpler example.) My third iteration will
become even uglier because I will have to incorporate strictness and
things like $! or seq. And there's probably a whole host of other
issues that I haven't even thought of or don't know about (boxing,
space leaks, etc.).

>From #haskell, I got a general piece of advice:

Oct 07 22:37:20 <Cale>	Actually, a lot of the time, code gets messy
when people optimise because they take the wrong road to optimisation
Oct 07 22:37:31 <Cale>	There are two ways to optimise a piece of Haskell code
Oct 07 22:37:43 <Cale>	You can make it stricter, forcing evaluation to
occur sooner
Oct 07 22:38:01 <Cale>	Or you can make it lazier, ensuring that things
are not demanded until actually needed
Oct 07 22:38:09 <Korollary>	@wiki performance
Oct 07 22:38:09 <lambdabot>	http://www.haskell.org/haskellwiki/performance
Oct 07 22:38:13 <Cale>	the latter route actually tends to result in cleaner code

Of course, to want to learn route 2 was a no-brainer. I read through
that wiki page on laziness and various other resources, but was
disappointed in what I found:

http://www.haskell.org/haskellwiki/Performance/Laziness only discusses
the aspect of laziness where you only evaluate what you need, which by
now has been repeatedly beaten into my head and sounds obvious. Aren't
there other advantages to laziness? Or at least, am I not fully
appreciating the "obvious" work-avoiding part of laziness? For
instance, the example seems to allude to sharing (between the even and
odd functions), which I think ties in strongly with laziness. I was
hoping for more in-depth insights on how to take advantage of laziness
to write cleaner AND more efficient code. (A loose analogy exists in
computer architecture - the smaller the chip, the less power it can
consume/the faster it can clock.)

http://en.wikibooks.org/wiki/Haskell/Laziness_revisited does slightly
better but still just scratches the surface - it gives an example of a
clean (compact and modular) yet efficient piece of code
(isSubstringOf), then completely neglects explaining it (e.g., exactly
why/how does it run more efficiently?).

http://users.aber.ac.uk/afc/stricthaskell.html seems to suggest that
strictness is the way to go. Please say it ain't so!

http://www.algorithm.com.au/mt/haskell/haskells_performance.html says
that between code optimized for performance in Haskell and in another
language (like O'Caml), "which is more declarative, high-level, and
most importantly, which one doesn't require an expert in the language
to write" is an unfortunate (for Haskell fans) no-brainer:

"The problem then is you have to know why something is slow, and
there's where Haskell can get complex. Is there a space leak? (Do you
even know what a space leak is?) Is your array strict instead of lazy?
If it's strict, is it boxed instead of unboxed? (Do you even know what
unboxing is?) Can you use unsafeRead instead of readArray? (Did you
know that unsafeRead exists? I sure didn't!)"

So, how do I take advantage of laziness to write clean *and*
performant code? I've been seeking resources with concrete examples,
explanations, or guides on how to "systematically" do this. Is this
even possible? Or am I deluding myself, and strictness is the way to
go? If so, has there been any work/research to address the issues
pointed out in the last blog entry? And do most (experienced) Haskell
users sacrifice cleanliness for speed, or speed for cleanliness?

I feel that an understanding of how Haskell compilers work would help
immensely in writing more performant Haskell code (or just better
Haskell code in general), as well as understand what optimizations are
reasonable to expect from the compiler. (I have a university class of
background on the design/implementation of simple
lexers/parsers/compilers/interpreters/code block graph analyses for
imperative OO languages. However, i'd love to learn more about these
things for a language like Haskell, including details on lazy
evaluation, pattern-matching, overloading (why it's slow), currying,
PFA, type classes, monads, etc. Preferrably something "lighter weight"
than a bunch of research papers. Any pointers?)

BTW, please also feel free to give me feedback/refinements on my code
snippets. :)

Thanks a ton for any enlightenment!

Yang

Aside: http://www.matthewmorgan.net/blog/archives/2004/10/07/haskell-performance.
This post brings up a most reasonable point (venturing off onto
tangent now) - space leaks are a much more serious concern than
performance, and they're a difficult problem even for experienced
Haskell users (http://www.haskell.org/pipermail/haskell-cafe/2003-December/005587.html).