[Haskell-cafe] Re: Very fast loops. Now!
Donald Bruce Stewart
dons at cse.unsw.edu.au
Mon Feb 12 16:18:25 EST 2007
ewilligers:
> Eric Willigers wrote:
> >Do the two programs implement the same algorithm? The C program updates
> >x and y in sequence. The Haskell program updates x and y in parallel and
> >can be easier for the compiler to optimize.
>
>
> Hi Don,
>
> Expressing this in other words, do we want the new y to be based on the
> new x or on the old x?
>
> I extracted the code you posted to CS.c and HP.hs (C sequential, Hashell
> parallel).
>
> I made the following minor changes to form CP.c and HS.hs (C parallel,
> Hashell sequential):-
>
> double xn;
> for (; i<=1000000000; i++) {
> xn = x*y/3.0;
> y = x*9.0;
> x = xn;
> }
>
> | otherwise = go xs (xs*9) (i+1)
> where xs = x*y/3
>
>
> Tested on a 2.8 GHz Pentium 4, running XP SP2 and cygwin, using the
> compiler options from your post. Each program was run once.
>
> $ uname -a
> CYGWIN_NT-5.1 nemo 1.5.21(0.156/4/2) 2006-07-30 14:21 i686 Cygwin
>
> $ gcc --version
> gcc (GCC) 3.4.4 (cygming special) (gdc 0.12, using dmd 0.125)
>
> $ ghc --version
> The Glorious Glasgow Haskell Compilation System, version 6.6
>
>
> $ gcc -O3 -ffast-math -mfpmath=sse -msse2 -std=c99 CP.c -o CP
>
> $ time ./CP
> 3.333333
>
> real 0m10.560s
> user 0m10.546s
> sys 0m0.015s
>
>
> $ gcc -O3 -ffast-math -mfpmath=sse -msse2 -std=c99 CS.c -o CS
>
> $ time ./CS
> 3.333333
>
> real 0m10.788s
> user 0m10.718s
> sys 0m0.015s
>
>
> $ ghc -O -fexcess-precision -fbang-patterns -optc-O3 -optc-ffast-math
> -optc-mfpmath=sse -optc-msse2 HP.hs -o HP
>
> $ time ./HP
> 3.3333333333333335
>
> real 1m8.550s
> user 0m0.015s
> sys 0m0.031s
>
>
> $ ghc -O -fexcess-precision -fbang-patterns -optc-O3 -optc-ffast-math
> -optc-mfpmath=sse -optc-msse2 HS.hs -o HS
>
> $ time ./HS
> 3.3333333333333335
>
> real 1m9.425s
> user 0m0.015s
> sys 0m0.046s
-fexcess-precision *must* be provided as a pragma,
{-# OPTIONS -fexcess-precision #-}
import Text.Printf
main = go (1/3) 3 1
go :: Double -> Double -> Int -> IO ()
go !x !y !i
| i == 1000000000 = printf "%.6f\n" (x+y)
| otherwise = go xn (xn*9) (i+1)
where xn = x*y/3
C source:
#include <stdio.h>
int main()
{
double x = 1.0/3.0;
double y = 3.0;
int i = 1;
double xn;
for (; i<1000000000; i++) {
xn = x*y/3.0;
y = x*9.0;
x = xn;
}
printf("%f\n", x+y);
return 0;
}
------------------------------------------------------------------------
$ ghc -O2 -optc-O3 -optc-ffast-math -optc-mfpmath=sse -optc-msse2 -fbang-patterns HP.hs -o hp
$ time ./hp
3.333333
./hp 17.59s user 0.01s system 99% cpu 17.646 total
$ gcc -O3 -ffast-math -mfpmath=sse -msse2 -std=c99 t.c -o cc
$ time ./cc
3.333333
./cc 8.70s user 0.00s system 98% cpu 8.837 total
------------------------------------------------------------------------
Now, if we rewrite it to not use the temporary:
go :: Double -> Double -> Int -> IO ()
go !x !y !i
| i == 1000000000 = printf "%.6f\n" (x+y)
| otherwise = go (x*y/3) (x*9) (i+1)
for (; i<1000000000; i++) {
x = x*y/3.0;
y = x*9.0;
}
------------------------------------------------------------------------
$ time ./hp
3.333333
./hp 9.95s user 0.00s system 99% cpu 9.965 total
$ time ./cc
3.333333
./cc 10.06s user 0.00s system 99% cpu 10.110 total
------------------------------------------------------------------------
$ gcc --version
gcc (GCC) 3.3.5 (propolice)
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.6
$ dmesg| grep cpu0 | head
cpu0: Intel(R) Pentium(R) M processor 1600MHz ("GenuineIntel" 686-class) 1.60 GHz
-- Don
More information about the Haskell-Cafe
mailing list