[Haskell-cafe] not enough fusion?

Dmitry Olshansky olshanskydr at gmail.com
Mon Jun 25 12:55:03 CEST 2012


In my test it works ~20% faster than s2 and ~20% slower than s1.
Did you use -O2 flag?

2012/6/25 Lorenzo Bolla <lbolla at gmail.com>

> I wonder why this performs really badly, though (I would expect it to be
> the same as s2):
>
> s3 :: Int -> Int
> s3 n = sum [gcd x y | x <- [ 0 .. n-1 ], y <- [ 0 .. n-1 ]]
>
> From the links posted by Dmitry, it might be that the code generated is
> made of 2 recursive calls: in fact, what I observe is a "stack space
> overflow" error on runtime...
>
> L.
>
>
>
>
> On Mon, Jun 25, 2012 at 10:09 AM, Dmitry Olshansky <olshanskydr at gmail.com>wrote:
>
>> s1 ~ sum $ map (sum . flip map [0..n] . gcd) [0..n]
>> s2 ~ sum $ concatMap (flip map [0..n] . gcd) [0..n]
>>
>> There are some posts from Joachim Breitner investigated fusion for
>> concatMap:
>>
>> http://www.haskell.org/pipermail/haskell-cafe/2011-December/thread.html#97227
>>
>>
>>
>> 2012/6/25 Johannes Waldmann <waldmann at imn.htwk-leipzig.de>
>>
>>> Dear all,
>>>
>>> while doing some benchmarking (*)
>>> I noticed that function  s1  is considerably faster than  s2
>>> (but I wanted  s2  because it looks more natural)
>>> (for n = 10000,  s1 takes 20 s, s2 takes 13 s; compiled by ghc-7.4.2 -O2)
>>>
>>> s1 :: Int -> Int
>>> s1 n = sum $ do
>>>        x <- [ 0 .. n-1 ]
>>>        return $ sum $ do
>>>            y <- [ 0 .. n-1 ]
>>>            return $ gcd x y
>>>
>>> s2 :: Int -> Int
>>> s2 n = sum $ do
>>>      x <- [ 0 .. n-1 ]
>>>      y <- [ 0 .. n-1 ]
>>>      return $ gcd x y
>>>
>>> I was expecting that in both programs,
>>> all lists will be fused away (are they?)
>>> so the code generator essentially can produce straightforward
>>> assembly code (no allocations, no closures, etc.)
>>>
>>>
>>> For reference, I also wrote the equivalent imperative program
>>> (two nested loops, one accumulator for the sum)
>>> (with the straightforward recursive gcd)
>>> and runtimes are (for same input as above)
>>>
>>> C/gcc: 7.3 s , Java: 7.7 s, C#/Mono: 8.7 s
>>>
>>>
>>> So, they sort of agree with each other, but disagree with ghc.
>>> Where does the factor 2 come from? Lists? Laziness?
>>> Does  ghc  turn the tail recursion (in gcd) into a loop? (gcc does).
>>> (I am looking at  -ddump-asm  but can't quite see through it.)
>>>
>>>
>>> (*) benchmarking to show that today's compilers are clever enough
>>> such that the choice of paradigm/language does not really matter
>>> for this kind of low-level programming.
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>
>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120625/ff6e0c08/attachment.htm>


More information about the Haskell-Cafe mailing list