[Haskell-cafe] Reasoning about performance
carter.schonwald at gmail.com
Wed Sep 4 02:02:54 CEST 2013
It's also worth adding that ghci does a lot less optimization than ghc.
Likewise, the best tool for performance benchmarking is the excellent
criterion library. To repeat: use compiled code for benchmarking, and use
criterion. Ghci is not as performance-tuned as compiled code, except when
ghci is linking in code that's already been compiled.
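A concrete, base-only sketch of that advice (criterion's defaultMain/bench/nf
machinery is still the right tool for real numbers, since it handles clock
resolution and repeated runs; this only shows the shape of a fair
measurement: compiled code, fully forced result). The allPairs2 here is the
one from later in this thread, with the base cases written out:

```haskell
-- Minimal timing sketch; use criterion for trustworthy numbers.
import System.CPUTime (getCPUTime)

allPairs2 :: [a] -> [(a, a)]
allPairs2 []     = []
allPairs2 (x:xs) = go x xs xs
  where
    go _ []     []     = []
    go h (y:ys) zs     = (h, y) : go h ys zs
    go _ []     (z:zs) = go z zs zs

main :: IO ()
main = do
  start <- getCPUTime
  -- Summing both components of every pair forces the entire result,
  -- so we time the real work rather than building one unevaluated thunk.
  print (sum [ a + b | (a, b) <- allPairs2 [1 .. 1000 :: Int] ])
  end <- getCPUTime
  putStrLn ("CPU time (ps): " ++ show (end - start))
```

Compile with ghc -O2 and run the resulting binary; numbers from ghci's
interpreter are not representative of compiled performance.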
On Tuesday, September 3, 2013, Bob Ippolito wrote:
> Haskell's non-strict evaluation can often lead to unexpected results when
> doing tail recursion if you're used to strict functional programming
> languages. In order to get the desired behavior you will need to force the
> accumulator (with something like Data.List's foldl', $!,
> seq, BangPatterns, etc.).
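> For instance (the function names here are made up for illustration), the
> same accumulator loop with and without forcing:

```haskell
{-# LANGUAGE BangPatterns #-}
-- Sketch of the accumulator-forcing options mentioned above.
import Data.List (foldl')

-- Lazy accumulator: builds a chain of thunks ((0 + 1) + 2) + ... that is
-- only collapsed at the very end; on large inputs this costs memory.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where go acc []     = acc
        go acc (x:xs) = go (acc + x) xs

-- Strict accumulator: the bang pattern forces acc at every step.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where go !acc []     = acc
        go !acc (x:xs) = go (acc + x) xs

-- Same effect with foldl', which forces the accumulator for us.
sumFold :: [Int] -> Int
sumFold = foldl' (+) 0
```

> (seq or $! in the recursive call works equally well; the point is only
> that something forces the accumulator before the next iteration.)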
> One issue with the tail-recursive version is that you're basically
> sequencing it twice: forcing the result forces the whole spine of the
> list, and then you're walking it again with deepseq. The overhead of
> the deepseq call with that size list on my machine is 2.2 seconds, so
> you're basically paying that cost twice with a tail recursive version.
> allPairs2 only forces what is needed, so deepseq isn't traversing any data
> structures that are already forced.
> I think what's happening with allPairs2 is that GHC is throwing away
> the list during the traversal since the result of the computation isn't
> used at all. I get very different timings (still fast, but no longer orders
> of magnitude difference) for allPairs2 if the result is still around. You
> could probably figure this out with the heap profiler.
> h> deepseq (allPairs2 [1..10000]) True
> (2.47 secs, 4403724176 bytes)
> h> let x = allPairs2 [1..10000]
> (0.00 secs, 510488 bytes)
> h> deepseq x True
> (10.47 secs, 4401625192 bytes)
> h> deepseq x True
> (2.21 secs, 1031864 bytes)
> I'm no expert, but I think that allPairs2 is likely as good as you're
> going to get with straightforward Haskell using lists. It doesn't build up
> a lot of unevaluated computation, and if you only need to see the first n
> elements it's not going to have to process the entire input. allPairs1 has
> most of the good properties of allPairs2 but is a bit worse because of the
> concatenation, though concatenating in this way isn't a disaster like it
> would be in a strict functional programming language.
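> To make the "only forces what is needed" point concrete, here is
> allPairs2 as it appears later in this thread (base cases written out);
> because it emits pairs incrementally, a consumer that wants only a
> prefix does bounded work, even on an infinite input:

```haskell
-- allPairs2 from this thread, producing pairs lazily.
allPairs2 :: [a] -> [(a, a)]
allPairs2 []     = []
allPairs2 (x:xs) = go x xs xs
  where
    go _ []     []     = []
    go h (y:ys) zs     = (h, y) : go h ys zs
    go _ []     (z:zs) = go z zs zs

main :: IO ()
main = print (take 3 (allPairs2 [1 ..]))  -- prints [(1,2),(1,3),(1,4)]
```

> An accumulator-based version like allPairs3 can't do this: nothing is
> available until the entire input has been consumed.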
> > Scott wrote:
>> I'm a Haskell beginner, and I'm baffled trying to reason about code
>> performance, at least with GHC. For a program I'm writing I needed to
>> find all pairs of elements of a list. That is, given the list "ABCD"
>> I wanted to wind up with the list
>> [('A','B'),('A','C'),('A','D'),('B','C'),('B','D'),('C','D')], where
>> the order of elements in the resulting list is immaterial, but I don't
>> want both ('A', 'B') and ('B', 'A'), for example.
>> My first attempt looked like this:
>> allPairs1 :: [a] -> [(a, a)]
>> allPairs1 [] = []
>> allPairs1 (x:xs) = map (\a -> (x, a)) xs ++ allPairs1 xs
>> Benchmarking with ghci's ":set +s" reveals the following performance
>> (median of three runs shown):
>> $ ghci
>> GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
>> Loading package ghc-prim ... linking ... done.
>> Loading package integer-gmp ... linking ... done.
>> Loading package base ... linking ... done.
>> Prelude> :!ghc -c -O2 allpairs.hs
>> Prelude> :load allpairs
>> Ok, modules loaded: AllPairs.
>> Prelude AllPairs> :m +Control.DeepSeq
>> Prelude Control.DeepSeq AllPairs> :show modules
>> AllPairs ( allpairs.hs, allpairs.o )
>> Prelude Control.DeepSeq AllPairs> :set +s
>> Prelude Control.DeepSeq AllPairs> deepseq (allPairs1 [1..10000]) True
>> (4.85 secs, 4004173984 bytes)
>> A colleague suggested getting rid of the list append as follows:
>> allPairs2 :: [a] -> [(a, a)]
>> allPairs2 [] = []
>> allPairs2 (x:xs) = allPairs2' x xs xs
>> where allPairs2' x [] [] = []
>> allPairs2' x (y:ys) zs = (x,y) : allPairs2' x ys zs
>> allPairs2' x [] (z:zs) = allPairs2' z zs zs
>> Not surprisingly, that runs faster:
>> Prelude Control.DeepSeq AllPairs> deepseq (allPairs2 [1..10000]) True
>> (2.14 secs, 4403686376 bytes)
>> I then figured I could do even better by rewriting the code
>> allPairs3 :: [a] -> [(a, a)]
>> allPairs3 [] = []
>> allPairs3 (x:xs) = allPairs3' x xs xs []
>> where allPairs3' :: a -> [a] -> [a] -> [(a, a)] -> [(a, a)]
>> allPairs3' h (x:xs) ys result = allPairs3' h xs ys ((h, x) : result)
>> allPairs3' _ [] (y:ys) result = allPairs3' y ys ys result
>> allPairs3' _ [] [] result = result
>> This takes half the memory as the above (yay!) but runs substantially
>> *slower* (and appears to be thrashing my computer's memory system):
>> Prelude Control.DeepSeq AllPairs> deepseq (allPairs3 [1..10000]) True
>> (10.58 secs, 2403820152 bytes)
>> What gives? Why does my tail-recursive implementation run even slower
>> and presumably produce more garbage than my initial, naive
>> implementation? Is there some optimization GHC is performing for
>> allPairs1 and allPairs2 that it can't apply to allPairs3?
>> Similarly, how should I, as a newcomer who wants to write efficient
>> Haskell code, reason about whether a code change is likely to improve
>> rather than degrade performance? Guidelines like "per-iteration
>> appends = bad" and "tail recursion = good" are great, but the
>> preceding data indicate that something subtler is taking precedence
>> over those guidelines with respect to code performance.
>> Thanks in advance,
>> -- Scott
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org