[Haskell-cafe] Re: Haskell performance (again)!

Sun Oct 8 15:53:01 EDT 2006

Hello,

admittedly, there is a lack of material on lazy evaluation and
performance. IMHO, the current wiki(book) and other articles are
somewhat inadequate which stems from the fact that current rumor is
"strictness is fast" and "arrays must be unboxed" or so. I don't agree
with this, so I post some remarks about performance in general and with
laziness in particular.

My first remark about performance is unfair and amounts to discuss the
issue away:
* The programmers viewpoint about performance in Haskell should be a
lazy one, i.e. you think about the performance of your code only when
its forced by someone else. If no one complains, do not even waste a
second thinking about it.

So

> type Poly = [(Int,Int)]
>
> addPoly1 :: Poly -> Poly -> Poly
> addPoly1 p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t)
>    | p1d == p2d = (p1c + p2c, p1d) : addPoly1 p1t p2t
>    | p1d < p2d = p1h : addPoly1 p1t p2
>    | p1d > p2d = p2h : addPoly1 p1 p2t
> addPoly1 p1 [] = p1
> addPoly1 [] p2 = p2
> addPoly1 [] [] = []

is the right thing to do. Point. With that viewpoint, your sorrows will
fade away :) Mine do.

In my experience, clean code by far outweighs any performance issues as
it allows one to reduce the complexity of the programming task until it
actually becomes tractable. As example, Darcs' patch mechanism is almost
impossible to code in a non-functional language. And what about those
open source / freeware game web sites where one reads "at some point my
code base became absolutely unmaintainable and I had to abandon this
project" ?

Linked to that is the advent of scripting languages like perl, python,
tcl. Why do people use these interpreted languages as there are the
coffins of performance? IMHO, there also already is the trend to buy
medium abstraction (Java, Objective-C) by deliberately selling away
performance (Java programs are so lame, Objective-C can be speed up 10%
by optimizing the msg_send() assembly) and i think: Haskell has a much
better benefit-to-cost ratio then the others.

The second remark is:
* On machine level, Laziness is an overhead, but it's only a constant
factor. *Constant factor*!

There is a story about this in S. Skiena's "The Algorithm Design Manual"
where some guy eats all the CPU of a supercomputer for a stupid task
which he programmed using an asymptotically mediocre algorithm. Ouch.

This already applies to your polynomials: are you sure [(Int,Int)] is
the right data structure to fulfill your performance requirements on the
asymptotic running time of your intended operation? Or should you use
  Data.Map Int Int
which grants O(log n) random access (by index i) to every coefficient
a_i as opposed to the list case where you can only fetch a_i in O(n) time?
The answer is that your original representation is quite suitable as the
common operations +,*,diff,evaluation etc. for polynomials have no "a
priori" knowledge about which coefficients are 0 and which are not, they
have to discover it anyway. Only internally * needs a finite map to
collect common degrees. You can even write a function sum ::
[Polynomial] -> Polynomial which uses a finite map to collect degrees
and define + and * in terms of it.

By the way, I don't know any imperative hobby programmer for which
mutable arrays are not the one and for all collection data structure.
Haskell offers a much easier access to data structures like binary
search trees with appropriate polymorphism. IMHO, Haskell is even the
first language to have a binary search tree library which offers
satisfying generality and ease of use. This makes your program faster!

Only the third remark starts to be about performance in Haskell itself:
* Haskell is a lazy language. Why to make this your enemy, why to swim
against the currents by making your programs more strict?

As I remember, Cale once said on IRC how the relative strength of
strictness / laziness applies on functions which operate on the
different data sizes:
small -> small    doesn't really matter
                  (minor lazy overhead but we have
                   strictness analysis)
large -> small
  every part of the large thing is needed in calculation
                  strictness wins
large -> small
  only parts are needed
                  lazyness wins
large -> large    depends on large input like above
                  but roughly same otherwise
small -> large    roughly same

Only in the case where strictness wins, you have to insert a seq or two.
 As a rule of thumb, all recursive data structures like lists and trees
are large. Int's are small. The point is that when you are given an Int,
the first look at it will reveal all information about it. In case of a
list, a first look only reveals the first element.

What was the primary reason for laziness, anyway? The famous paper "Why
functional programming matters" by John Hughes gives the answer: you can
code programs in a more modular way (and this without biting performance
penalties). Laziness is what makes compositions like large -> large ->
small (large thing only partly needed) work.

> But this doesn't use tail recursion/accumulation (which seems to me
> like a complete hack that is a mandatory price to pay for using FP)

I never payed it :) You actually fell into the trap by rewriting your
code: performance got worse! Because of laziness, your first version can
deliver the first coefficient of the sum without calculation the others.
Your tail recursive version must calculate the hole sum before returning
anything. This is because the list constructor (:) is lazy in its second
argument. addPoly1 falls into the (large -> large) category.

Of course, if you have to sum Int's over a list (which is large -> small
and all list elements are needed), then it is wise to be strict and a
seq will come in handy (foldl' (+) 0 in Haskell). On the other hand, if
you are 'or'ing Boolean values over a list ((any) in Haskell), then
laziness is the right thing, why?

Your second version only adds overhead by introducing additional
parameters and so on, it won't get better than the minor laziness
overhead. Test both functions with "hugs +s" and be disappointed about
the number of reductions/cells used.

Concerning "overhead" in general, the simplest version will always do
and the compiler will figure out the rest with automatic strictness
analysis, inlining, unboxing, tail recursion etc. To the programmer,
tail recursion is of no importance in a lazy language.

And my last remark, again about data structures:
* (mutable) arrays and hash tables don't fit into a functional language.
Don't use them. Use Data.Map for random access.

That being said, static arrays whose entries do not change (you only
have a "single shot") can be very handy for Dynamic Programming. In this
case, it is crucial that they are "boxed", i.e. the entries are
calculated lazily.

If need any other data structure,
  http://haskell.org/ghc/docs/latest/html/libraries/
  http://haskell.org/haskellwiki/Libraries_and_tools/Data_structures
  http://www.eecs.tufts.edu/~rdocki01/edison.html

For (in)finite maps, tries are *the* natural data structure for
functional languages, see also
  Ralf Hinze. Generalizing generalized tries. Journal of Functional
  Programming, 10(4):327-351, July 2000
  http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz
If you are in a true need for speed, they'll outperform everything else.

Should you need a custom data structure ("appendable priority search
queue interval region ... tree sequence"), you can build one from the
all-rounding Finger Tree, see also
  Finger Trees: A Simple General-purpose Data Structure
  Ralf Hinze and Ross Paterson.
  in Journal of Functional Programming16:2 (2006), pages 197-217
  http://www.soi.city.ac.uk/~ross/papers/FingerTree.pdf

Regards,
apfelmus