[Haskell-cafe] Can Haskell outperform C++?

Ryan Newton rrnewton at gmail.com
Mon May 14 16:26:35 CEST 2012


>
> Well, if it's "in many ways the same as C", then again it's probably not
> idiomatic Haskell.


It's just the recursive equation for Mandelbrot fractals.  I should have
been more precise: I didn't mean that the source is literally the *same* as
the C source (there's no for loop, no mutable variables), rather that it
presents the compiler backend with the same advantages a C backend would
have with the equivalent loop.  That is, there's no control-flow
obfuscation (dictionaries, calls to unknown targets), no problems with
laziness, and the data representations are completely known.

{-# LANGUAGE BangPatterns #-}        -- for the !z strictness annotation
import Data.Complex (Complex, magnitude)

mandel :: Int -> Complex Double -> Int
mandel max_depth c = loop 0 0
  where
   loop i !z
    | i == max_depth      = i
    | magnitude z >= 2.0  = i
    | otherwise           = loop (i+1) (z*z + c)


It's also a loop that is readily recognized as a loop.  Now, to my
knowledge, GHC doesn't explicitly recognize loops in any stage of the
compiler (so as to perform loop optimizations), something which other
functional compilers sometimes do.

But, anyway, it turns out that my example above *is easily transformed from
a bad GHC performance story into a good one*.  If you'll bear with me, I'll
show how below.
   First, Manuel makes a good point about the LLVM backend.  My "6X"
anecdote was from a while ago and didn't use LLVM [1].  I redid it just now
with 7.4.1+LLVM; results below.  (The table should read correctly in a
fixed-width font, but you can also see the data in the spreadsheet here:
https://docs.google.com/spreadsheet/ccc?key=0AvzAHqQmHo87dHU0T0lCb1I4MFJmM2s4RnNlamJlNkE)

                     Time (ms)   Compiled File Size   Compile+Runtime (ms)
GHC 7.4.1 O0         2444        1241K
GHC 7.4.1 O2          925        1132K                1561
GHC 7.4.1 O2 llvm     931        1133K
GHC 7.0.4 O2 via-C    684         974K

So LLVM didn't help [1].  And in fact the deprecated via-C backend did the
best!  Compare with GCC:

G++ O0             300         9K                   531
G++ O3             110         7K                   347
G++ O3 recursive   116         9K

Uh oh, the "6X" gap I mentioned is now closer to 9X.  And, in fact,
performing a mini "language shootout" with the above code reveals that GHC
is doing worse than not only OCaml but also Chez Scheme, in spite of the
latter's dynamic type checks and necessarily boxed representation of
complex numbers:

Chez Scheme 8.4    284         2.7K (not standalone)  372
OCaml              166         180K                   301

At least Python does worse!

Python 2.6         1973        NA                   1973

*So here's the catch:*  If you check the Core and STG, GHC 7.4 is actually
compiling the above loop very well.  This microbenchmark turns into just a
"magnitude" microbenchmark.  The culprit is the implementation of
Data.Complex, which uses an EXTREMELY expensive method to avoid
overflow [2]:
https://github.com/ghc/packages-base/blob/master/Data/Complex.hs#L115
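For context, the library's strategy (as I recall it; this is a simplified
rendering for illustration, not the verbatim base source) is to rescale
both components by a power of two before squaring, then scale the result
back up:

```haskell
import Data.Complex (Complex((:+)))

-- Overflow-avoiding magnitude in the style of base's Data.Complex:
-- shift both components down by 2^(-k) so the squares cannot overflow,
-- take the sqrt, then shift back up by 2^k.  scaleFloat/exponent are
-- exact but comparatively slow, which is the cost complained about above.
magnitudeScaled :: Complex Double -> Double
magnitudeScaled (x :+ y) =
  scaleFloat k (sqrt (sqr (scaleFloat mk x) + sqr (scaleFloat mk y)))
  where
    k       = max (exponent x) (exponent y)
    mk      = negate k
    sqr z   = z * z

main :: IO ()
main = print (magnitudeScaled (3 :+ 4))   -- prints 5.0
```

The rescaling is exact (powers of two), but it costs several calls into
the floating-point environment per magnitude computation.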

Since OCaml did well above, I took a look at their standard library's
implementation, on line 51 here:
http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/trunk/stdlib/complex.ml?revision=11156&view=markup
They use a nice little math trick (the extra division) that is also
mentioned on Wikipedia.  By implementing the same trick in Haskell,
replacing the standard "magnitude" function
(https://github.com/rrnewton/MandelMicrobench/blob/97c3275ad94cbce57a688817332b42f7c32c15b4/mandel_test2.hs),
we get the following results.
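Transplanted to Haskell, the trick looks something like this (the name
magnitudeFast is mine; the actual patched function in the linked repo may
differ in details):

```haskell
import Data.Complex (Complex((:+)))

-- OCaml-style overflow avoidance: divide the smaller component by the
-- larger one, so the squared term is at most 1 and cannot overflow.
-- Costs one extra division, but avoids scaleFloat/exponent entirely.
magnitudeFast :: Complex Double -> Double
magnitudeFast (x :+ y)
  | r == 0    = i
  | i == 0    = r
  | r >= i    = let q = i / r in r * sqrt (1 + q * q)
  | otherwise = let q = r / i in i * sqrt (1 + q * q)
  where
    r = abs x
    i = abs y

main :: IO ()
main = print (magnitudeFast (3 :+ 4))   -- prints 5.0
```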

GHC 7.4.1, no overflow avoidance          39    1127K    674
GHC 7.4.1, OCaml-style overflow avoidance 74    1127K

Wow, now not only is the Haskell version faster than OCaml/Scheme, *but it
is 48% faster than C++*, which is fitting, given the title of this email!
 Of course, this probably means that C++'s abs is also doing something
overly expensive for overflow avoidance (or that its representation of
complex numbers is not fully unpacked and stored in registers).  I haven't
tracked it down yet.

But in any case, I'll be submitting a library patch.  *The moral, I think,
is that community members could do a great deal to help "Haskell"
performance by simply microbenchmarking and optimizing library routines in
base!*

Cheers,
  -Ryan

P.S. You can check out the above benchmarks from here:
 https://github.com/rrnewton/MandelMicrobench

[1] P.P.S. Most concerning to me about Haskell/C++ comparisons are David
Peixotto's findings that LLVM optimizations are not very effective on
Haskell-generated LLVM compared with typical clang-generated LLVM.

[2]  P.P.P.S. It turns out there was already a ticket (
http://hackage.haskell.org/trac/ghc/ticket/2450) regarding magnitude's
performance.  But it still performs badly, even after a small refactoring.