[Haskell-cafe] blas bindings, why are they so much slower the C?
Don Stewart
dons at galois.com
Sat Jun 28 01:51:09 EDT 2008
aeyakovenko:
> i get the same crappy performance with:
>
> $ cat htestdot.hs
> {-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
> -fglasgow-exts -fbang-patterns -lcblas#-}
> module Main where
>
> import Data.Vector.Dense.IO
> import Control.Monad
>
> main = do
> let size = 10
> let times = 10*1000*1000
> v1::IOVector Int Double <- newListVector size $ replicate size 0.1
> v2::IOVector Int Double <- newListVector size $ replicate size 0.1
> replicateM_ times $ v1 `getDot` v2
replicateM_ is using a list underneath for control as well,
replicateM n x = sequence (replicate n x)
Try writing a simple recursive loop, as Dan suggested. No list node
forcing overhead, so in a very tight loop you'll just want the index in
a register.
See here for more examples of tight register loops,
http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast
In general, if you're chasing C performance for a loop, your best bet is
to write a loop first. Then later see if you can get the same kind of
code from higher order, lazy, monadic functions.
-- Don
More information about the Haskell-Cafe
mailing list