[Haskell-cafe] blas bindings, why are they so much slower the C?

Don Stewart dons at galois.com
Sat Jun 28 01:51:09 EDT 2008


aeyakovenko:
> i get the same crappy performance with:
> 
> $ cat htestdot.hs
> {-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
> -fglasgow-exts -fbang-patterns -lcblas#-}
> module Main where
> 
> import Data.Vector.Dense.IO
> import Control.Monad
> 
> main = do
>    let size = 10
>    let times = 10*1000*1000
>    v1::IOVector Int Double <- newListVector size $ replicate size 0.1
>    v2::IOVector Int Double <- newListVector size $ replicate size 0.1
>    replicateM_ times $ v1 `getDot` v2

replicateM_ is using a list underneath for control as well,

    replicateM n x    = sequence (replicate n x)

Try writing a simple recursive loop, as Dan suggested. No list node
forcing overhead, so in a very tight loop you'll just want the index in
a register.

See here for more examples of tight register loops,

    http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast

In general, if you're chasing C performance for a loop, your best bet is
to write a loop first. Then later see if you can get the same kind of
code from higher order, lazy, monadic functions.

-- Don


More information about the Haskell-Cafe mailing list