[Haskell-cafe] fast Eucl. dist. - Haskell vs C

Patrick Perry patperry at stanford.edu
Tue May 19 18:02:53 EDT 2009


Hi Kenneth,

I wrote a benchmark similar to yours using the haskell blas library I  
wrote (latest version is on github at http://github.com/patperry/blas/tree/master 
, an older version is on hackage).

The pure Haskell implementation is not very good, and the problem  
seems to be repeated boxing/unboxing of doubles.  It may be possible  
to avoid this by writing a custom "fold" function for Vectors, but I  
haven't tried it.  Using a vectorized subtraction function along with  
the BLAS1 "dnrm2" function, I was able to get performance about  
2.8-5.4x worse than C.  This isn't great, but it's not terrible,  
either.  I didn't have the performance problems you ran in to using  
the FFI.  My benchmark includes:

distVectors1: a native implementation using foldl' and zipWith
distVectors2: a vectorize implementation, defined as @distVectors2 x y  
= norm2 (x-y)@
distVectors3: a custom loop in haskell
distVectors4: an FFI call to C

All 4 functions use the "Vector" type, provided by the "blas" package.

I compiled with "-O -fexcess-preceision" and ran timings for vectors  
of size 10, 100, and 1000.  Here are the results:

Macintosh-62:~/Code/blas/examples (euclidean)$ ./Euclidean 10
n: 10
* distVectors1: .....................
   1.366ns per iteration / 732029.06 per second.
* distVectors2: .....................
   1.286ns per iteration / 777734.02 per second.
* distVectors3: ......................
   0.950ns per iteration / 1052847.93 per second.
* distVectors4: .......................
   0.452ns per iteration / 2213752.56 per second.
Macintosh-62:~/Code/blas/examples (euclidean)$ ./Euclidean 100
n: 100
* distVectors1: ..................
   12.496ns per iteration / 80025.79 per second.
* distVectors2: ....................
   2.149ns per iteration / 465311.61 per second.
* distVectors3: ..................
   8.728ns per iteration / 114572.53 per second.
* distVectors4: ......................
   0.597ns per iteration / 1675765.65 per second.
Macintosh-62:~/Code/blas/examples (euclidean)$ ./Euclidean 1000
n: 1000
* distVectors1: ..............
   125.920ns per iteration / 7941.52 per second.
* distVectors2: ..................
   10.677ns per iteration / 93661.99 per second.
* distVectors3: ...............
   85.533ns per iteration / 11691.42 per second.
* distVectors4: ....................
   1.962ns per iteration / 509668.17 per second.

The "distVectors2" function is computing the norm in a more  
numerically-stable way.  It is also allocating a temporary array to  
store x-y.

I haven't looked over your benchmarking code too thoroughly, but parts  
of it seem suspect to me.  Namely, you are using "mapM" instead of  
"mapM_".  Also, it seems like you are including GC times in your  
benchmarks.

Attached is my code.  You will need the latest "blas" from github to  
compile it.  You also might want to look into hmatrix.  I have very  
little time for haskell code right now (I'm trying to finish my Ph.D.  
in the next month).  This summer I plan to work more on this stuff,  
and I'll probably make a release in July.


Patrick

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Euclidean.hs
Type: application/octet-stream
Size: 1938 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090519/f17de0c6/Euclidean.obj
-------------- next part --------------


-------------- next part --------------
A non-text attachment was scrubbed...
Name: dist.c
Type: application/octet-stream
Size: 248 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20090519/f17de0c6/dist.obj
-------------- next part --------------



More information about the Haskell-Cafe mailing list