[Haskell-cafe] Re: Very fast loops. Now!

Donald Bruce Stewart dons at cse.unsw.edu.au
Mon Feb 12 16:18:25 EST 2007


ewilligers:
> Eric Willigers wrote:
> >Do the two programs implement the same algorithm? The C program updates 
> >x and y in sequence. The Haskell program updates x and y in parallel and 
> >can be easier for the compiler to optimize.
> 
> 
> Hi Don,
> 
> Expressing this in other words, do we want the new y to be based on the 
> new x or on the old x?
> 
> I extracted the code you posted to CS.c and HP.hs (C sequential, Hashell 
> parallel).
> 
> I made the following minor changes to form CP.c and HS.hs (C parallel, 
> Hashell sequential):-
> 
>     double xn;
>     for (; i<=1000000000; i++) {
>         xn = x*y/3.0;
>         y = x*9.0;
>         x = xn;
>     }
> 
>     | otherwise       = go xs (xs*9) (i+1)
>                         where xs = x*y/3
> 
> 
> Tested on a 2.8 GHz Pentium 4, running XP SP2 and cygwin, using the 
> compiler options from your post. Each program was run once.
> 
> $ uname -a
> CYGWIN_NT-5.1 nemo 1.5.21(0.156/4/2) 2006-07-30 14:21 i686 Cygwin
> 
> $ gcc --version
> gcc (GCC) 3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)
> 
> $ ghc --version
> The Glorious Glasgow Haskell Compilation System, version 6.6
> 
> 
> $ gcc -O3 -ffast-math -mfpmath=sse -msse2 -std=c99 CP.c -o CP
> 
> $ time ./CP
> 3.333333
> 
> real    0m10.560s
> user    0m10.546s
> sys     0m0.015s
> 
> 
> $ gcc -O3 -ffast-math -mfpmath=sse -msse2 -std=c99 CS.c -o CS
> 
> $ time ./CS
> 3.333333
> 
> real    0m10.788s
> user    0m10.718s
> sys     0m0.015s
> 
> 
> $ ghc -O -fexcess-precision -fbang-patterns -optc-O3 -optc-ffast-math 
> -optc-mfpmath=sse -optc-msse2 HP.hs -o HP
> 
> $ time ./HP
> 3.3333333333333335
> 
> real    1m8.550s
> user    0m0.015s
> sys     0m0.031s
> 
> 
> $ ghc -O -fexcess-precision -fbang-patterns -optc-O3 -optc-ffast-math 
> -optc-mfpmath=sse -optc-msse2 HS.hs -o HS
> 
> $ time ./HS
> 3.3333333333333335
> 
> real    1m9.425s
> user    0m0.015s
> sys     0m0.046s


-fexcess-precision *must* be provided as a pragma, 

    {-# OPTIONS -fexcess-precision #-}
        
    import Text.Printf

    main = go (1/3) 3 1

    go :: Double -> Double -> Int -> IO ()
    go !x !y !i
        | i == 1000000000 = printf "%.6f\n" (x+y)
        | otherwise       = go xn (xn*9) (i+1)
                               where xn = x*y/3

C source:

    #include <stdio.h>

    int main()
    {
        double x = 1.0/3.0;
        double y = 3.0;
        int i    = 1;
        double xn;

        for (; i<1000000000; i++) {
             xn = x*y/3.0;
             y = x*9.0;
             x = xn;
        }

        printf("%f\n", x+y);
        return 0;
    }



------------------------------------------------------------------------

$ ghc -O2 -optc-O3 -optc-ffast-math -optc-mfpmath=sse -optc-msse2 -fbang-patterns HP.hs -o hp
$ time ./hp                                                                                  
3.333333
./hp  17.59s user 0.01s system 99% cpu 17.646 total

$ gcc -O3 -ffast-math -mfpmath=sse -msse2 -std=c99 t.c -o cc
$ time ./cc
3.333333
./cc  8.70s user 0.00s system 98% cpu 8.837 total


------------------------------------------------------------------------

Now, if we rewrite it to not use the temporary:

    go :: Double -> Double -> Int -> IO ()
    go !x !y !i
        | i == 1000000000 = printf "%.6f\n" (x+y)
        | otherwise       = go (x*y/3) (x*9) (i+1)


    for (; i<1000000000; i++) {
        x = x*y/3.0;
        y = x*9.0;
    }

------------------------------------------------------------------------

$ time ./hp
3.333333
./hp  9.95s user 0.00s system 99% cpu 9.965 total

$ time ./cc                                                 
3.333333
./cc  10.06s user 0.00s system 99% cpu 10.110 total

------------------------------------------------------------------------

    $ gcc --version
    gcc (GCC) 3.3.5 (propolice)

    $ ghc --version
    The Glorious Glasgow Haskell Compilation System, version 6.6

    $ dmesg| grep cpu0 | head
    cpu0: Intel(R) Pentium(R) M processor 1600MHz ("GenuineIntel" 686-class) 1.60 GHz

-- Don


More information about the Haskell-Cafe mailing list