[Haskell-cafe] Haskell performance question
Don Stewart
dons at galois.com
Thu Nov 8 17:44:34 EST 2007
xj2106:
> Don Stewart <dons at galois.com> writes:
>
> > Can you start by retrying with flags from the spectral-norm benchmark:
> >
> > http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=ghc&id=0
> >
> > The interaction with gcc here is quite important, so forcing -fvia-C
> > will matter.
>
> Clearly things have changed since the release of ghc-6.8.1. I tried them
> on my laptop, and here are the results for N=3000.
>
>
> C++ g++
> =======
>
> real 0m4.553s
> user 0m4.551s
> sys 0m0.002s
>
> changed one option: -march=nocona
>
>
> Haskell GHC
> ===========
>
> real 0m34.392s
> user 0m34.316s
> sys 0m0.074s
>
> I used `unsafePerformIO' with `INLINE', because I don't know
> where `inlinePerformIO' is now. And also the `-optc-march'
> is changed to `nocona'.
Using unsafePerformIO here would break some crucial inlining
(inlinePerformIO is the same trick used in Data.ByteString, by the way).
You can find inlinePerformIO in Data.ByteString.Internal.
Comparing the two, n=5500, ghc 6.8:
$ ghc -O -fglasgow-exts -fbang-patterns -optc-O3 \
      -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math \
      spec.hs -o spec_hs --make
With inlinePerformIO:
$ time ./spec_hs 5500
1.274224153
./spec_hs 5500 26.32s user 0.00s system 99% cpu 26.406 total
As expected, and comparable to the shootout result for the same N.
With unsafePerformIO, the whole thing falls apart:
$ time ./spec_hs 5500
^Cspec_hs: interrupted
./spec_hs 5500 124.86s user 0.11s system 99% cpu 2:05.04 total
I gave up after 2 minutes. This FFI peek/poke code, which acts as an ST
monad under a pure interface, relies on inlinePerformIO.
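To illustrate the pattern, here is a minimal sketch (not the actual spec.hs) of raw peek code wrapped in a pure interface. It assumes inlinePerformIO is defined locally in the way it has been written in Data.ByteString.Internal, so the sketch does not depend on the installed bytestring version; unsafeIndex and sumSquares are hypothetical names used only for this example. The trick is only sound for reads from a buffer that is alive and no longer being written to.

```haskell
{-# LANGUAGE BangPatterns, MagicHash, UnboxedTuples #-}
module Main (main) where

import Control.Exception (evaluate)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff)
import GHC.Exts (realWorld#)
import GHC.IO (IO (..))

-- Local definition (an assumption of this sketch): run the IO action
-- against a fresh state token, with no locking and no NOINLINE barrier,
-- so GHC is free to inline the action into the caller's loop.
inlinePerformIO :: IO a -> a
inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r
{-# INLINE inlinePerformIO #-}

-- Hypothetical helper: a pure-looking read from a raw buffer.
-- Safe only while the buffer is alive and is not written to again.
unsafeIndex :: Ptr Double -> Int -> Double
unsafeIndex p i = inlinePerformIO (peekElemOff p i)
{-# INLINE unsafeIndex #-}

-- Hypothetical inner loop: sum of squares over the first n elements,
-- written as a strict accumulator loop.
sumSquares :: Ptr Double -> Int -> Double
sumSquares p n = go 0 0
  where
    go !acc i
      | i >= n    = acc
      | otherwise = let x = unsafeIndex p i
                    in go (acc + x * x) (i + 1)

main :: IO ()
main = do
  -- Force the result while the temporary array is still allocated.
  r <- withArrayLen [1, 2, 3, 4 :: Double] $ \n p ->
         evaluate (sumSquares p n)
  print r
```

Swapping inlinePerformIO for unsafePerformIO in unsafeIndex gives the same answer, but each element read then goes through a call that GHC will not inline, which is the kind of overhead that hurts in a tight loop.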
And the C++ program, just for comparison:
$ g++ -c -pipe -O3 -fomit-frame-pointer -march=pentium4 -mfpmath=sse \
      -msse2 spec.c
$ g++ spec.o -o spec-cpp
$ time ./spec-cpp 5500
1.274224153
./spec-cpp 5500 18.81s user 0.00s system 99% cpu 18.816 total
So we remain competitive after changing to 6.8 (26.3s vs 18.8s, about 1.4x).
Again, optimised low-level array code is within 2x of optimised C/C++.
-- Don