[Haskell-cafe] GHC predictability
Andrew Coppin
andrewcoppin at btinternet.com
Mon May 12 15:01:53 EDT 2008
Don Stewart wrote:
> jeff.polakow:
>
>> Hello,
>>
>> One frequent criticism of Haskell (and by extension GHC) is that it has
>> unpredictable performance and memory consumption. I personally do not find
>> this to be the case. I suspect that most programmer confusion is rooted in
>> shaky knowledge of lazy evaluation; and I have been able to fix, with
>> relative ease, the various performance problems I've run into. However I
>> am not doing any sort of performance critical computing (I care about
>> minutes or seconds, but not about milliseconds).
>>
>> I would like to know what others think about this. Is GHC predictable? Is
>> a thorough knowledge of lazy evaluation good enough to write efficient
>> (whatever that means to you) code? Or is intimate knowledge of GHC's
>> innards necessary?
>>
>> thanks,
>> Jeff
>>
>> PS I am conflating Haskell and GHC because I use GHC (with its extensions)
>> and it produces (to my knowledge) the fastest code.
>>
>
> This has been my experience to. I'm not even sure where
> "unpredicatiblity" would even come in, other than though not
> understanding the demand patterns of the code.
>
> It's relatively easy to look at the Core to get a precise understanding
> of the runtime behaviour.
>
> I've also not found the GC unpredicatble either.
>
I offer up the following example:
mean xs = sum xs / length xs
Now try, say, "mean [1.. 1e9]", and watch GHC eat several GB of RAM. (!!)
If we now rearrange this to
mean = (\(s,n) -> s / n) . foldr (\x (s,n) -> let s' = s+x; n' = n+1
in s' `seq` n' `seq` (s', n')) (0,0)
and run the same example, and watch it run in constant space.
Of course, the first version is clearly readable, while the second one
is almost utterly incomprehensible, especially to a beginner. (It's even
more fun that you need all those seq calls in there to make it work
properly.)
The sad fact is that if you just write something in Haskell in a nice,
declarative style, then roughly 20% of the time you get good
performance, and 80% of the time you get laughably poor performance. For
example, I sat down and spent the best part of a day writing an MD5
implementation. Eventually I got it so that all the test vectors work
right. (Stupid little-endian nonsense... mutter mutter...) When I tried
it on a file containing more than 1 MB of data... ooooohhhh dear... I
gave up after waiting several minutes for an operation that the C
implementation can do in milliseconds. I'm sure there's some way of
fixing this, but... the source code is pretty damn large, and very messy
as it is. I shudder to think what you'd need to do to it to speed it up.
Of course, the first step in any serious attempt at performance
improvement is to actually profile the code to figure out where the time
is being spent. Laziness is *not* your friend here. I've more or less
given up trying to comprehend the numbers I get back from the GHC
profiles, because they apparently defy logic. I'm sure there's a reason
to the madness somewhere, but... for nontrivial programs, it's just too
hard to figure out what's going on.
Probably the best part is that almost any nontrivial program you write
spends 60% or more of its time doing GC rather than actual work. Good
luck with the heap profiler. It's even more mysterious than the time
profiles. ;-)
In short, as a fairly new Haskell programmer, I find it completely
impossibly to write code that doesn't crawl along at a snail's pace.
Even when I manage to make it faster, I usually have no clue why. (E.g.,
adding a seq to a mergesort made it 10x faster. Why? Changing from
strict ByteString to lazy ByteString made one program 100x faster. Why?)
Note that I'm not *blaming* GHC for this - I think it's just inherantly
very hard to predict performance in a lazy language. (Remember,
deterministic isn't the same as predictable - see Chaos Theory for why.)
I wish it wasn't - becuase I really *really* want to write all my
complex compute-bounded programs in Haskell, because it makes algorithms
so beautifully easy to express. But when you're trying to implement
something that takes hours to run even in C...
More information about the Haskell-Cafe
mailing list