[Haskell-cafe] Epic failure

Don Stewart dons at galois.com
Sun Feb 15 17:09:38 EST 2009


> OK, what did I do wrong here?

When making a request for help on a compiler issue, you failed to include key
information to make it possible to reproduce your problem, and what you did
include was broken or incorrect.

The three programs that submitted don't do even do the same thing.

Let's look into this further.

* Program 1:

>       module Main () where
>
>       import Data.List
>
>        top = 10 ^ 8 :: Int
>
>        main = do
>         let numbers = [1 .. top]
>         print $ foldl' (+) 0 numbers
>
>        -- Runtime is 20 seconds.

Well, let's see if we can reproduce this:

    $ ghc -O2 A.hs --make
    [1 of 1] Compiling Main             ( A.hs, A.o )
    Linking A ...

    $ time ./A
    5000000050000000
    ./A  1.54s user 0.01s system 98% cpu 1.571 total

Nope. OK, so this seems like user error. Without more info about how you
conducted your experiment, the results are meaningless.

My guess is that you compiled it without optimisations?
Nope, not that,

    $ ghc -O0 --make A.hs
    $ time ./A                            
    5000000050000000
    ./A  2.65s user 0.01s system 99% cpu 2.667 tota

So even with all optimisations disabled, it is still an order of magnitude
faster than the number you presented. Resolution: invalid. Not reproducible.


* Program 2

>       #include <stdio.h>
>
>       int main(void)
>        {
>         int total = 0;
>         int n;
>
>         for (n = 1, n < 100000000; n++)
>         total += n;
>
>         printf("%d", n);
>        }
>
>      // Runtime is 0.8 seconds.

Ok. Let's try this then, a C program:

    $ gcc t.c 
    t.c: In function ‘main’:
    t.c:8: error: expected ‘;’ before ‘)’ token

Ah, an incorrect C program. Correcting the OP's typo:

    $ time ./a.out
    100000000
    ./a.out  0.41s user 0.00s system 100% cpu 0.416 total

So its actually a different program. Is this supposed to print 'total'?
This program seems wrong in a number of other ways too.

Resolution: non-sequitor


>     Program 3
>
>        module Main () where
>
>        import Data.List
>
>        top = 10 ^ 8 :: Int
>
>        kernel i o = if i < top then o `seq` kernel (i+1) (o+i) else o
>
>        main = do
>         print $ kernel 1 0

>   -- Runtime is 0.5 seconds.
>   Clearly these two nearly identical Haskell programs should have exactly
>   the same runtime. Instead, one is 40x slower. Clearly something has gone
>   catastrophically wrong here. The whole point of GHC's extensive optimiser
>   passes are to turn the first example into the second example - but
>   something somewhere hasn't worked. Any suggestions?


Ok, another program. Let's try this.

    $ ghc -O2 B.hs --make
    [1 of 1] Compiling Main             ( B.hs, B.o )
    Linking B ...

    $ time ./B
    4999999950000000
    ./B  0.18s user 0.00s system 98% cpu 0.186 total

Oh, this is produces yet another result. In 0.186 seconds.


So, going back to the original question, what did you do wrong?

If you're seeking input for a technical error relating to performance, you
should have, but failed to:

    * Use programs that implement the same algorithm
    * Indicate which compiler versions/optimisations/architecture you're on
    * State what you expected the results to be.

Besides the technical aspects, your presentation categorises the post somwhere between internet
crank and internet troll, as:

    * You used an inflammatory title, which doesn't inspire trust.

    * You jumped to conclusions on the fundamental nature of a technology,
      striking at its reasons for existing, without considering to recheck your
      assumptions.
 
So yes, epic fail. And after three years of this, I'm not hopeful.
But perhaps you can take some lessons from this post to improve your next one?


------------------------------------------------------------------------

Now, assuming good faith, and you were just very confused about what you were
measuring, or how to measure it, or how to ask for help in a technical forum,
here's some more fun: let's write your 1st program, and see if GHC can
transform it into your 3rd program.

    $ ghc --version
    The Glorious Glasgow Haskell Compilation System, version 6.10.1

    $ uname -msr
    Linux 2.6.28-ARCH x86_64

    $ gcc --version
    gcc (GCC) 4.3.3

    $ ghc-pkg list uvector
        uvector-0.1.0.3

Here's a a program written with combinators in a high level style (and using a
library written in a high level way):

    import Data.Array.Vector
    main = print . sumU . enumFromToU 0 $ (10^8 :: Int)

Compiling it,

    $ ghc -O2 --make C.hs

Which yields the following core,

    $wfold_s15D :: Int# -> Int# -> Int#
    $wfold_s15D =
      \ (ww1_s15a :: Int#) (ww2_s15e :: Int#) ->
        case ># ww2_s15e ww_s154 of wild_a12I {
          False ->
            $wfold_s15D
              (+# ww1_s15a ww2_s15e) (+# ww2_s15e 1);
          True -> ww1_s15a

Because GHC knows how to optimise loops of these forms.

And the resulting assembly is pretty nice,

    s16o_info:
      .Lc17g:
      cmpq 6(%rbx),%rdi
      jg .Lc17j
      leaq 1(%rdi),%rax
      addq %rdi,%rsi
      movq %rax,%rdi
      jmp s16o_info

Running it,

    $ time ./C
    5000000050000000
    ./C  0.17s user 0.00s system 99% cpu 0.175 total


We can also try the C backend, though we probably don't expect much change here.

    $ ghc -O2 -fvia-C -optc-O3 --make C.hs

Yielding an inner loop of,

    s16o_info:
      cmpq        6(%rbx), %rdi
      jg  .L2
      addq        %rdi, %rsi
      leaq        1(%rdi), %rdi
      jmp s16o_info

A little different in the inner loop if we run the code though GCC. Interesting!

    $ time ./C
    5000000050000000
    ./C  0.15s user 0.00s system 99% cpu 0.155 total

Some small difference.

-- Don


More information about the Haskell-Cafe mailing list