[Haskell-cafe] Haskell performance question

Thu Nov 8 17:44:34 EST 2007

xj2106:
> Don Stewart <dons at galois.com> writes:
> 
> > Can you start by retrying with flags from the spectral-norm benchmark:
> >
> >     http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=ghc&id=0
> >
> > The interaction with gcc here is quite important, so forcing -fvia-C
> > will matter.
> 
> Clearly things have changed since the release of ghc-6.8.1. I tried them
> on my laptop, and here are the results for N=3000.
> 
> 
> C++ g++
> =======
> 
> real    0m4.553s
> user    0m4.551s
> sys     0m0.002s
> 
> changed one option: -march=nocona
> 
> 
> Haskell GHC
> ===========
> 
> real 0m34.392s
> user 0m34.316s
> sys 0m0.074s
> 
> I used `unsafePerformIO' with `INLINE', because I don't know
> where `inlinePerformIO' is now. I also changed `-optc-march'
> to `nocona'.

Using unsafePerformIO here would break some crucial inlining
(the same trick is used in Data.ByteString, by the way).

You can find inlinePerformIO in Data.ByteString.Internal.
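For reference, here is a minimal sketch of the idiom. It assumes GHC's internal `IO` newtype (exported from `GHC.IO`) and `realWorld#` (from `GHC.Exts`), and defines a local `inlinePerformIO` with the same shape as the bytestring one instead of importing it. The helpers `at`, `atSlow` and `sumWith` are hypothetical names for illustration only.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff)
import GHC.Exts (realWorld#)
import GHC.IO (IO (..))
import System.IO.Unsafe (unsafePerformIO)

-- Local copy of the inlinePerformIO idiom (assumption: mirrors the
-- bytestring definition). It hands the action the realWorld# token
-- directly, with none of unsafePerformIO's guards against duplicated
-- evaluation, so GHC can inline it into the surrounding loop. Only
-- sound for reads of memory that does not change while in use.
inlinePerformIO :: IO a -> a
inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r
{-# INLINE inlinePerformIO #-}

-- Hypothetical pure read of element i via inlinePerformIO.
at :: Ptr Double -> Int -> Double
at p i = inlinePerformIO (peekElemOff p i)
{-# INLINE at #-}

-- The same read through unsafePerformIO, for comparison.
atSlow :: Ptr Double -> Int -> Double
atSlow p i = unsafePerformIO (peekElemOff p i)

-- Sum the first n elements using the given pure reader.
sumWith :: (Ptr Double -> Int -> Double) -> Ptr Double -> Int -> Double
sumWith rd p n = go 0 0
  where
    go i acc
      | i < n = go (i + 1) (acc + rd p i)
      | otherwise = acc

main :: IO ()
main = withArray [1 .. 10 :: Double] $ \p -> do
  -- Both results are printed inside withArray, before the buffer is freed.
  print (sumWith at p 10)
  print (sumWith atSlow p 10)
```

Both readers compute the same value; the difference the post describes is in how well GHC can optimise the loop around them, which you would see by timing larger inputs.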

Comparing the two with n=5500 under ghc 6.8:

    $ ghc -O -fglasgow-exts -fbang-patterns -optc-O3 \
        -optc-march=pentium4 -optc-mfpmath=sse -optc-msse2 -optc-ffast-math \
        spec.hs -o spec_hs --make

With inlinePerformIO:

    $ time ./spec_hs 5500
    1.274224153
    ./spec_hs 5500  26.32s user 0.00s system 99% cpu 26.406 total

That is as expected, and comparable to the shootout result for the same N.
With unsafePerformIO, the whole thing falls apart:

    $ time ./spec_hs 5500
    ^Cspec_hs: interrupted
    ./spec_hs 5500  124.86s user 0.11s system 99% cpu 2:05.04 total

I gave up after 2 minutes. This FFI peek/poke code, which acts as an ST
monad behind a pure interface, relies on inlinePerformIO.
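A rough sketch of that style applied to the spectral-norm kernel, under stated assumptions: this is not the shootout program itself, and `evalA`, `mulAv`, `mulAtAv`, `spectralNorm` and `at` are hypothetical names. Pure reads inside the inner loop go through a local `inlinePerformIO` (defined as in GHC's internal `IO` representation); writes and the final dot products stay in `IO` so they are properly sequenced.

```haskell
{-# LANGUAGE BangPatterns, MagicHash, UnboxedTuples #-}
module Main (main) where

import Control.Monad (forM_)
import Foreign.Marshal.Array (allocaArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import GHC.Exts (realWorld#)
import GHC.IO (IO (..))
import System.Environment (getArgs)
import Text.Printf (printf)

-- Local copy of the inlinePerformIO idiom (assumption, see lead-in).
inlinePerformIO :: IO a -> a
inlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r
{-# INLINE inlinePerformIO #-}

-- Pure read from a buffer that is not written while being read.
at :: Ptr Double -> Int -> Double
at p i = inlinePerformIO (peekElemOff p i)
{-# INLINE at #-}

-- Element (i, j) of the infinite matrix A, 0-indexed.
evalA :: Int -> Int -> Double
evalA i j = 1 / fromIntegral ((i + j) * (i + j + 1) `div` 2 + i + 1)
{-# INLINE evalA #-}

-- out := A * v, or A^T * v when transposed is True. The inner loop
-- is pure and reads v through 'at'; each result is poked in IO.
mulAv :: Bool -> Int -> Ptr Double -> Ptr Double -> IO ()
mulAv transposed n v out = forM_ [0 .. n - 1] $ \i ->
  let go !j !s
        | j < n = go (j + 1) (s + a i j * at v j)
        | otherwise = s
   in pokeElemOff out i (go 0 0)
  where
    a i j = if transposed then evalA j i else evalA i j

-- out := A^T * (A * v), using tmp as scratch space.
mulAtAv :: Int -> Ptr Double -> Ptr Double -> Ptr Double -> IO ()
mulAtAv n v tmp out = mulAv False n v tmp >> mulAv True n tmp out

-- Spectral norm of the leading n x n block of A, via 10 power iterations.
spectralNorm :: Int -> IO Double
spectralNorm n =
  allocaArray n $ \u -> allocaArray n $ \v -> allocaArray n $ \tmp -> do
    forM_ [0 .. n - 1] $ \i -> pokeElemOff u i 1
    forM_ [1 .. 10 :: Int] $ \_ -> do
      mulAtAv n u tmp v
      mulAtAv n v tmp u
    -- Final dot products are read in IO so they run after the loop.
    let dots !i !uv !vv
          | i < n = do
              x <- peekElemOff u i
              y <- peekElemOff v i
              dots (i + 1) (uv + x * y) (vv + y * y)
          | otherwise = return (sqrt (uv / vv))
    dots 0 0 0

main :: IO ()
main = do
  args <- getArgs
  let n = case args of
        (s : _) -> read s
        [] -> 100
  printf "%.9f\n" =<< spectralNorm n
```

Run it as `./spec_hs 5500` to match the timings above; with no argument this sketch defaults to a small N=100. Relying on `inlinePerformIO` for the reads is exactly the trade-off discussed here: it lets GHC fuse the reads into the loop, at the cost of correctness depending on the buffer not changing while those pure values are demanded.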

And the C++ program, just for comparison:

    $ g++ -c -pipe -O3 -fomit-frame-pointer -march=pentium4 -mfpmath=sse \
        -msse2 spec.c
    $ g++ spec.o -o spec-cpp

    $ time ./spec-cpp 5500
    1.274224153
    ./spec-cpp 5500  18.81s user 0.00s system 99% cpu 18.816 total

So we remain competitive after changing to 6.8.

Again, optimised low-level array code is within 2x of optimised C/C++
(here roughly 1.4x: 26.32s versus 18.81s user time).

-- Don