[Haskell-cafe] Reasoning about performance

Bob Ippolito bob at redivi.com
Wed Sep 4 01:43:06 CEST 2013

Haskell's non-strict evaluation can often lead to unexpected results when
doing tail recursion if you're used to strict functional programming
languages. In order to get the desired behavior you will need to force the
accumulator (with something like Data.List's foldl', $!,
seq, BangPatterns, etc.).
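As a small illustration (my own sketch, not code from this thread): a sum
written with plain foldl accumulates a chain of thunks, while foldl' or a
bang pattern forces the accumulator at every step and runs in constant
space.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Lazy accumulator: builds a chain of (+) thunks before anything is added.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- foldl' forces the accumulator at each step, so no thunk chain builds up.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- The same idea as an explicit tail-recursive loop with a bang pattern.
sumBang :: [Int] -> Int
sumBang = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs

main :: IO ()
main = print (sumStrict [1..1000000], sumBang [1..1000000])
```

All three produce the same answer; the difference only shows up in memory
behavior on large inputs.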

One issue with the tail-recursive version is that you're essentially
sequencing it twice: forcing the result forces the whole spine of the
list, and then deepseq walks it again. The overhead of the deepseq call
on a list of that size is 2.2 seconds on my machine, so you're basically
paying that cost twice with the tail-recursive version. allPairs2 only
forces what is needed, so deepseq isn't traversing any data structures
that are already forced.
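To make "forcing the spine" concrete, here's a sketch (again my own, not
from the thread): length forces only the spine of a list, while deepseq
additionally forces every element, which is strictly more work.

```haskell
import Control.DeepSeq (deepseq)
import Control.Exception (SomeException, evaluate, try)

main :: IO ()
main = do
  -- A list whose spine is fully defined but whose elements are bottoms.
  let xs = [undefined, undefined] :: [Int]
  -- Forcing the spine alone succeeds: length never touches the elements.
  print (length xs)
  -- Forcing to normal form also evaluates the elements, so this throws.
  r <- try (evaluate (xs `deepseq` ())) :: IO (Either SomeException ())
  putStrLn (either (const "deepseq forced the elements")
                   (const "no error") r)
```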

I think what's happening with allPairs2 is that GHC is throwing away the
list during the traversal, since the result of the computation isn't
used at all. I get very different timings (still fast, but no longer orders
of magnitude different) for allPairs2 if the result is kept around. You
could probably confirm this with the heap profiler.

h> deepseq (allPairs2 [1..10000]) True
(2.47 secs, 4403724176 bytes)
h> let x = allPairs2 [1..10000]
(0.00 secs, 510488 bytes)
h> deepseq x True
(10.47 secs, 4401625192 bytes)
h> deepseq x True
(2.21 secs, 1031864 bytes)

I'm no expert, but I think that allPairs2 is likely as good as you're going
to get with straightforward Haskell using lists. It doesn't build up a lot
of unevaluated computation, and if you only need to see the first n
elements it's not going to have to process the entire input. allPairs1 has
most of the good properties of allPairs2 but is a bit worse because of the
concatenation, though concatenating in this way isn't a disaster like it
would be in a strict functional programming language.
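One way to see "only forces what is needed" (a sketch using the allPairs2
definition quoted below, reproduced here so it's self-contained): because
the result is produced lazily, taking a prefix works even on an infinite
input list.

```haskell
-- allPairs2 restated from the quoted message below.
allPairs2 :: [a] -> [(a, a)]
allPairs2 []     = []
allPairs2 (x:xs) = go x xs xs
  where
    go y (z:zs) rest   = (y, z) : go y zs rest
    go _ []     (r:rs) = go r rs rs
    go _ []     []     = []

main :: IO ()
main = print (take 3 (allPairs2 [1..]))  -- prints [(1,2),(1,3),(1,4)]
```

A strict implementation would have to build the entire (quadratic) result
before returning anything, so this terminates only because of laziness.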


On Tue, Sep 3, 2013 at 3:28 PM, Scott Pakin <pakin at lanl.gov> wrote:

> I'm a Haskell beginner, and I'm baffled trying to reason about code
> performance, at least with GHC.  For a program I'm writing I needed to
> find all pairs of elements of a list.  That is, given the list "ABCD"
> I wanted to wind up with the list
> [('A','B'),('A','C'),('A','D'),('B','C'),('B','D'),('C','D')], where
> the order of elements in the resulting list is immaterial, but I don't
> want both ('A', 'B') and ('B', 'A'), for example.
> My first attempt looked like this:
>     allPairs1 :: [a] -> [(a, a)]
>     allPairs1 []     = []
>     allPairs1 (x:xs) = map (\a  -> (x, a)) xs ++ allPairs1 xs
> Benchmarking with ghci's ":set +s" reveals the following performance
> (median of three runs shown):
>     $ ghci
>     GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
>     Loading package ghc-prim ... linking ... done.
>     Loading package integer-gmp ... linking ... done.
>     Loading package base ... linking ... done.
>     Prelude> :!ghc -c -O2 allpairs.hs
>     Prelude> :load allpairs
>     Ok, modules loaded: AllPairs.
>     Prelude AllPairs> :m +Control.DeepSeq
>     Prelude Control.DeepSeq AllPairs> :show modules
>     AllPairs         ( allpairs.hs, allpairs.o )
>     Prelude Control.DeepSeq AllPairs> :set +s
>     Prelude Control.DeepSeq AllPairs> deepseq (allPairs1 [1..10000]) True
>     True
>     (4.85 secs, 4004173984 bytes)
> A colleague suggested getting rid of the list append as follows:
>     allPairs2 :: [a] -> [(a, a)]
>     allPairs2 [] = []
>     allPairs2 (x:xs) = allPairs2' x xs xs
>       where allPairs2' x []     []     = []
>             allPairs2' x (y:ys) zs     = (x,y) : allPairs2' x ys zs
>             allPairs2' x []     (z:zs) = allPairs2' z zs zs
> Not surprisingly, that runs faster:
>     Prelude Control.DeepSeq AllPairs> deepseq (allPairs2 [1..10000]) True
>     True
>     (2.14 secs, 4403686376 bytes)
> I then figured I could do even better by rewriting the code
> tail-recursively:
>     allPairs3 :: [a] -> [(a, a)]
>     allPairs3 [] = []
>     allPairs3 (x:xs) = allPairs3' x xs xs []
>       where allPairs3' :: a -> [a] -> [a] -> [(a, a)] -> [(a, a)]
>             allPairs3' h (x:xs) ys     result = allPairs3' h xs ys ((h, x):result)
>             allPairs3' _ []     (y:ys) result = allPairs3' y ys ys result
>             allPairs3' _ []     []     result = result
> This takes half the memory as the above (yay!) but runs substantially
> *slower* (and appears to be thrashing my computer's memory system):
>     Prelude Control.DeepSeq AllPairs> deepseq (allPairs3 [1..10000]) True
>     True
>     (10.58 secs, 2403820152 bytes)
> What gives?  Why does my tail-recursive implementation run even slower
> and presumably produce more garbage than my initial, naive
> implementation?  Is there some optimization GHC is performing for
> allPairs1 and allPairs2 that it can't apply to allPairs3?
> Similarly, how should I, as a newcomer who wants to write efficient
> Haskell code, reason about whether a code change is likely to improve
> rather than degrade performance?  Guidelines like "per-iteration
> appends = bad" and "tail recursion = good" are great, but the
> preceding data indicate that something subtler is taking precedence
> over those guidelines with respect to code performance.
> Thanks in advance,
> -- Scott
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
