[Haskell-cafe] GHC predictability

Don Stewart dons at galois.com
Tue May 13 02:20:22 EDT 2008


gale:
> Andrew Coppin wrote:
> >  I offer up the following example:
> >
> >   mean xs = sum xs / length xs
> >
> >  Now try, say, "mean [1.. 1e9]", and watch GHC eat several GB of RAM. (!!)
> >
> >  If we now rearrange this to
> >
> >   mean = (\(s,n) -> s / n) . foldr (\x (s,n) -> let s' = s+x; n' = n+1 in s'
> > `seq` n' `seq` (s', n')) (0,0)
> >
> >  and run the same example, and watch it run in constant space.
> >
> >  Of course, the first version is clearly readable, while the second one is
> > almost utterly incomprehensible, especially to a beginner. (It's even more
> > fun that you need all those seq calls in there to make it work properly.)
> 
> You can write it like this:
> 
> mean = uncurry (/) . foldl' (\(s,n) x -> ((,) $! s+x) $! n+1) (0,0)
> 
> I don't think that's so bad. And for real-life examples, you almost
> never need the ($!)'s or seq's - your function will do some kind
> of pattern matching that will force the arguments. So really, all
> you need to remember is: if you're repeating a fast calculation across
> a big list, use foldl'. And insertWith', if you're storing the result in
> a Data.Map. That's about it.
> 
> >  The sad fact is that if you just write something in Haskell in a nice,
> > declarative style, then roughly 20% of the time you get good performance,
> > and 80% of the time you get laughably poor performance.
> 
> I don't know why you think that. I've written a wide variety of functions
> over the past few years. I find that when performance isn't good enough,
> it's because of the algorithm, not because of laziness. Laziness
> works for me, not against me.
> 
> Of course, it depends what you mean by "good performance". I have
> never needed shootout-like performance. But to get that, you need
> some messy optimization in any language.
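
Spelling gale's foldl' suggestion out as a complete program, it would
look something like this (a sketch; only Data.List's foldl' and the
($!) applications are involved):

    import Data.List (foldl')

    -- Both components are forced with ($!) before the pair is rebuilt,
    -- so no thunks accumulate as the fold walks the list.
    mean :: [Double] -> Double
    mean = uncurry (/) . foldl' (\(s,n) x -> ((,) $! s+x) $! n+1) (0,0)

    main :: IO ()
    main = print (mean [1 .. 1e9])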

We can actually get great performance here,

    {-# LANGUAGE TypeOperators #-}

    import Data.Array.Vector
    import Text.Printf

    mean :: UArr Double -> Double
    mean arr = b / fromIntegral a
      where
        k (n :*: s) x = (n+1) :*: (s+x)
        a :*: b = foldlU k (0 :*: 0) arr :: (Int :*: Double)

    main = printf "%f\n" . mean $ enumFromToFracU 1 1e9

Compiled with ghc -O2:

    $ time ./A
    500000000.067109
    ./A  3.69s user 0.00s system 99% cpu 3.692 total

Versus on lists:

    import Data.List
    import Text.Printf
    import Data.Array.Vector

    mean :: [Double] -> Double
    mean arr = b / fromIntegral a
      where
        k (n :*: s) x = (n+1) :*: (s+x)
        (a :*: b) = foldl' k (0 :*: 0) arr :: (Int :*: Double)

    main = printf "%f\n" . mean $ [1 .. 1e9]    

    $ time ./A     
    500000000.067109
    ./A  66.08s user 1.53s system 99% cpu 1:07.61 total

Note the use of strict pairs: they're key to ensuring the accumulators end
up in registers. The performance difference here is due to foldl' (and all
left folds) not fusing under normal build/foldr fusion.
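
If you'd rather not import Data.Array.Vector just for the :*: pair (the
list version above uses it only for that), a hand-rolled strict pair does
the same job. A sketch, compiled with -O2:

    import Data.List (foldl')

    -- Strict, unpacked fields: forcing the constructor forces both
    -- accumulators, so foldl' keeps them fully evaluated at each step.
    data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Double

    mean :: [Double] -> Double
    mean xs = s / fromIntegral n
      where
        P n s = foldl' k (P 0 0) xs
        k (P m t) x = P (m+1) (t+x)

    main :: IO ()
    main = print (mean [1 .. 1e9])

It runs in constant space, but it still builds the list, so don't expect
it to match the vector version, where the enumeration fuses with the fold.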

The vector version runs at about the same speed as unoptimised C.

-- Don

