[Haskell-cafe] Re: speed: ghc vs gcc

Don Stewart dons at galois.com
Fri Feb 20 16:19:49 EST 2009


claus.reinke:
> Concrete examples always help, thanks.
>
> In simple enough situations, one can roll one's own loop unrolling;),
> somewhat like shown below (worker/wrapper split to bring the function
> parameter representing the loop body into scope, then template haskell  
> to unroll its applications syntactically, then optimization by 
> transformation
> to get rid of the extra code). It is all rather more complicated than one
> would like it to be, what with TH scoping restrictions and all, but 
> perhaps a library of self-unrolling loop combinators along these lines 
> might help, as a workaround until ghc does its own unrolling.
>
> Claus
>
> {-# LANGUAGE TemplateHaskell #-}
> module Apply where
> import Language.Haskell.TH.Syntax
> apply i bound | i<bound   = [| \f x -> $(apply (i+1) bound) f (f i x) |]
>              | otherwise = [| \f x -> x |]
>
> {-# LANGUAGE CPP #-}
> {-# LANGUAGE TemplateHaskell #-}
> {-# LANGUAGE BangPatterns #-}
> {-# OPTIONS_GHC -DN=8 -ddump-splices #-}
> module Main(main) where
> import Apply
> main = print $ loopW 1 (10^9) body 0
>
> {-# INLINE loopW #-}
> loopW :: Int -> Int -> (Int -> Int -> Int) -> Int -> Int
> loopW i max body acc = loop i acc
>  where
>  loop :: Int -> Int -> Int
>  loop !i !acc | i+N<=max  = loop (i+N) ($(apply (0::Int) N) (\j acc->body (i+j) acc) acc)
>  {-
>  loop !i !acc | i+8<=max  = loop (i+8) ( body (i+7)
>                                        $ body (i+6)
>                                        $ body (i+5)
>                                        $ body (i+4)
>                                        $ body (i+3)
>                                        $ body (i+2)
>                                        $ body (i+1)
>                                        $ body i acc)
>  -}
>  loop !i !acc | i<=max    = loop (i+1) (body i acc)
>               | otherwise = acc
>
> body :: Int -> Int -> Int
> body !i !acc = i+acc
>

Great thinking! This is EXTREMELY COOL!

        Main.hs:15:42-57: Splicing expression
            let
              apply = apply
              $dOrd = GHC.Base.$f1
              $dNum = GHC.Num.$f6
              $dLift = Language.Haskell.TH.Syntax.$f18
            in apply (0 :: Int) 8
          ======>
            \ f[a1KU] x[a1KV]
                -> \ f[a1KW] x[a1KX]
                       -> \ f[a1KY] x[a1KZ]
                              -> \ f[a1L0] x[a1L1]
                                     -> \ f[a1L2] x[a1L3]
                                            -> \ f[a1L4] x[a1L5]
                                                   -> \ f[a1L6] x[a1L7]
                                                          -> \ f[a1L8] x[a1L9]
                                                                 -> \ f[a1La] x[a1Lb] -> x[a1Lb]
                                                                      f[a1L8] (f[a1L8] 7 x[a1L9])
                                                               f[a1L6] (f[a1L6] 6 x[a1L7])
                                                        f[a1L4] (f[a1L4] 5 x[a1L5])
                                                 f[a1L2] (f[a1L2] 4 x[a1L3])
                                          f[a1L0] (f[a1L0] 3 x[a1L1])
                                   f[a1KY] (f[a1KY] 2 x[a1KZ])
                            f[a1KW] (f[a1KW] 1 x[a1KX])
                     f[a1KU] (f[a1KU] 0 x[a1KV])
        In the second argument of `loop', namely
            `($(apply (0 :: Int) 8) (\ j acc -> body (i + j) acc) acc)'
        In the expression:
            loop
              (i + 8) ($(apply (0 :: Int) 8) (\ j acc -> body (i + j) acc) acc)
        In the definition of `loop':
            loop !i !acc
                   | i + 8 <= max
                   = loop
                       (i + 8) ($(apply (0 :: Int) 8) (\ j acc -> body (i + j) acc) acc)

So, that's the fastest yet:

    $ time ./Main
    500000000500000000
    ./Main  0.61s user 0.00s system 98% cpu 0.623 total

And within 2x the best GCC was doing,

     gcc -O3 -funroll-loops      0.318

If we unroll even further...

    $ ghc -O2 -fvia-C -optc-O3 -D64 Main.hs

    $ time ./Main
    500000000500000000
    ./Main  0.08s user 0.00s system 94% cpu 0.088 total

Very very nice, Claus!

Now I'm wondering if we can do this via rewrite rules....

-- Don


More information about the Haskell-Cafe mailing list