[Haskell-cafe] optimization of recursive functions

Thu Feb 14 04:53:19 CET 2013

On 2/13/13 3:55 AM, Andrew Polonsky wrote:
> Hello,
>
> Is there any difference in efficiency between these two functions, when
> compiled with all optimizations?
>
> map f [] = []
> map f (a:as) = f a : map f as
>
> and
>
> map f x = map' x where
>     map' [] = []
>     map' (a:as) = f a : map' as

As Alp says, there is some difference since the second definition closes 
over f. Whether that closure is an optimization or a pessimization is a 
bit fragile. When there are multiple things being closed over, then the 
closure is faster than passing things to the recursion. But, at least in 
the past, there was a transition point where passing arguments is faster 
when there's only one of them; I haven't run benchmarks on newer 
versions of GHC to see if that's still true.

The second point is whether these definitions get inlined at their use 
sites or not. If they get inlined, then the second has a definite 
advantage since we will know f statically and can use a direct call 
rather than an indirect call (unless the static f is a locally bound 
variable and therefore still unknown). Being able to make indirect calls 
direct is one of the major benefits of inlining. However, in new 
versions of GHC, it bases decisions about whether to inline or not on 
how many arguments the definition has. Thus, you'd get more inlining if 
you used the definition:

     map f = map'
         where
         map' [] = []
         map' (a:as) = f a : map' as

since this allows inlining when map is only applied to f, rather than 
applied to both f and x. Sometimes this extra inlining is good (for the 
reasons mentioned above), but sometimes it's bad because it can lead to 
code bloat. Remember, the major bottleneck of computing these days is 
access to disk and memory. So increasing the size of the compiled file 
can lead to slowdown because you end up blowing the instruction cache 
and needing to load from memory more often. (The exact details of when 
this result occurs are, of course, extremely dependent on the particular 
program and what's causing the code bloat.)

In short, the worker/wrapper transform often improves performance, but 
for this specific example it's hard to say which is faster. Just 
benchmark it! Criterion is an excellent tool for answering questions 
like these, though it doesn't quite get at the inlining details. I 
wouldn't worry too much about it unless (a) profiling indicated that 
this function was a problem, or (b) I had a large client program that I 
could test performance with. Of course, if you're concerned about 
performance, another thing to look at here is list-fusion which will 
allow the compiler to remove intermediate lists.

-- 
Live well,
~wren