# [Haskell-cafe] Re: ANN: HLint 1.2

Max Bolingbroke batterseapower at hotmail.com
Tue Jan 13 09:21:58 EST 2009

```2009/1/13 Simon Marlow <marlowsd at gmail.com>:
>> GHC should indeed be doing so. I'm working (on and off) to work out
>> some suitable heuristics and put the transformation into ghc -O2.
>> There are a few wrinkles that still need sorting out, but preliminary
>> indications are that it decreases the runtime of our standard
>> benchmark suite, nofib, by 12% or so.
>
> !!!
>
> That's a surprising result - have you looked closely at the places where the
> transformation is having a big effect to see what's going on?

Yes, it is rather better than I expected :-)

The main gains seem to come from specialising higher-order functions
on particular arguments, like we saw earlier in this thread. There
seem to be a number of suitable functions in the standard library that
aren't written in static-argument-transformed (SATed) style.

Another gain comes from the nofib program "atom", which has a function
a lot like this:

f x y z = (x, y, z) : f x y z

Once this is SATed it becomes a much better function:

f = let f' = (x, y, z) : f'
in f'

Which decreases runtime of atom by 97% :-)

The catch is that things written in this style can actually be worse
than their lambda-lifted brethren. This happens for 3 main reasons:

1) SATed functions tend to have case liberation applied to them
instead of constructor specialisation. Case liberation kind of sucks
in comparison to constructor specialisation, so bad things happen
(increased allocations and code size)
2) Carrying around a single variable in the SAT closure just adds
indirection to the generated code with no benefits, so it's better to
remove that indirection (by lambda lifting) just before going to STG -
but be careful not to change the unfolding!
3) More SATing means more expressions are loop invariant. This is
usually a good thing, but if you float a loop-invariant out of a "cold
branch" of a recursive function (a branch that is actually only
entered once dynamically) then you end up eagerly allocating a closure
for the loop-invariant thing which may never be entered. This is sort
of a bug in our current float-out pass.

Like I said, I'm working on improving the situation with 1 and 2,
which need to be resolved to iron out some of the bad cases in nofib.
I need to find time to take a look at this though.

Cheers,
Max
```