[Haskell-cafe] Re: speed: ghc vs gcc

Claus Reinke claus.reinke at talk21.com
Fri Feb 20 15:15:59 EST 2009

Concrete examples always help, thanks.

Turning this into a ticket with associated test will:

- enable others to find and repeat the test when this thread is long gone,
    to see whether any other ghc changes have helped in any way
- enable documentation of what exactly the issue is (why is it slow?)
- enable others to vote for having this issue addressed
- help to keep the various performance issues separate (I seem to
    recall that you and others had found some other infelicities in 
    ghc-generated code, and lots of other useful bits a pieces over
    the years, not all of which have been resolved or made into tickets?)

Without ticket, such examples will be snowed under here in no
time. With ticket, it will take a little longer!-)

> afaik, ghc can be compared with 20-years old C compilers. it uses
> registers for performing tight loops but has very simple register
> allocation procedure. also it doesn't unroll loops

I've occasionally run into situations where it would have been nice
to have loop unrolling, or more generally, partial unfolding of recursive 
functions. But I've got the feeling that this isn't the whole story here,
or is it?

In simple enough situations, one can roll one's own loop unrolling;),
somewhat like shown below (worker/wrapper split to bring the function
parameter representing the loop body into scope, then template haskell 
to unroll its applications syntactically, then optimization by transformation
to get rid of the extra code). It is all rather more complicated than one
would like it to be, what with TH scoping restrictions and all, but perhaps 
a library of self-unrolling loop combinators along these lines might help, as 
a workaround until ghc does its own unrolling.


{-# LANGUAGE TemplateHaskell #-}
module Apply where
import Language.Haskell.TH.Syntax
apply i bound | i<bound   = [| \f x -> $(apply (i+1) bound) f (f i x) |]
              | otherwise = [| \f x -> x |]

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -DN=8 -ddump-splices #-}
module Main(main) where
import Apply
main = print $ loopW 1 (10^9) body 0

{-# INLINE loopW #-}
loopW :: Int -> Int -> (Int -> Int -> Int) -> Int -> Int
loopW i max body acc = loop i acc
  loop :: Int -> Int -> Int
  loop !i !acc | i+N<=max  = loop (i+N) ($(apply (0::Int) N) (\j acc->body (i+j) acc) acc)
  loop !i !acc | i+8<=max  = loop (i+8) ( body (i+7)
                                        $ body (i+6)
                                        $ body (i+5)
                                        $ body (i+4)
                                        $ body (i+3)
                                        $ body (i+2)
                                        $ body (i+1)
                                        $ body i acc)
  loop !i !acc | i<=max    = loop (i+1) (body i acc)
               | otherwise = acc

body :: Int -> Int -> Int
body !i !acc = i+acc

More information about the Haskell-Cafe mailing list