[Haskell-cafe] not enough fusion?

Mon Jun 25 02:04:47 CEST 2012

Dear all, 

while doing some benchmarking (*)
I noticed that function  s1  is considerably faster than  s2
(but I wanted  s2  because it looks more natural)
(for n = 10000,  s1 takes 20 s, s2 takes 13 s; compiled by ghc-7.4.2 -O2)

s1 :: Int -> Int    
s1 n = sum $ do
        x <- [ 0 .. n-1 ]
        return $ sum $ do
            y <- [ 0 .. n-1 ]
            return $ gcd x y

s2 :: Int -> Int       
s2 n = sum $ do 
      x <- [ 0 .. n-1 ]
      y <- [ 0 .. n-1 ]
      return $ gcd x y

I was expecting that in both programs, 
all lists will be fused away (are they?)
so the code generator essentially can produce straightforward
assembly code (no allocations, no closures, etc.)

For reference, I also wrote the equivalent imperative program
(two nested loops, one accumulator for the sum)
(with the straightforward recursive gcd) 
and runtimes are (for same input as above)

C/gcc: 7.3 s , Java: 7.7 s, C#/Mono: 8.7 s

So, they sort of agree with each other, but disagree with ghc.
Where does the factor 2 come from? Lists? Laziness? 
Does  ghc  turn the tail recursion (in gcd) into a loop? (gcc does).
(I am looking at  -ddump-asm  but can't quite see through it.)

(*) benchmarking to show that today's compilers are clever enough 
such that the choice of paradigm/language does not really matter
for this kind of low-level programming.