[Haskell-cafe] blas bindings, why are they so much slower the C?
Anatoly Yakovenko
aeyakovenko at gmail.com
Sat Jun 28 01:29:22 EDT 2008
i get the same crappy performance with:
$ cat htestdot.hs
{-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
-fglasgow-exts -fbang-patterns -lcblas#-}
module Main where
import Data.Vector.Dense.IO
import Control.Monad
main = do
let size = 10
let times = 10*1000*1000
v1::IOVector Int Double <- newListVector size $ replicate size 0.1
v2::IOVector Int Double <- newListVector size $ replicate size 0.1
replicateM_ times $ v1 `getDot` v2
On Fri, Jun 27, 2008 at 7:41 PM, Dan Doel <dan.doel at gmail.com> wrote:
> On Friday 27 June 2008, Anatoly Yakovenko wrote:
>> $ cat htestdot.hs
>> {-# OPTIONS_GHC -O2 -fexcess-precision -funbox-strict-fields
>> -fglasgow-exts -fbang-patterns -lcblas#-}
>> module Main where
>>
>> import Data.Vector.Dense.IO
>> import Control.Monad
>>
>> main = do
>> let size = 10
>> let times = 10*1000*1000
>> v1::IOVector Int Double <- newListVector size $ replicate size 0.1
>> v2::IOVector Int Double <- newListVector size $ replicate size 0.1
>> sum <- foldM (\ ii zz -> do
>> rv <- v1 `getDot` v2
>> return $ zz + rv
>> ) 0.0 [0..times]
>> print $ sum
>
> Hackage is down for the time being, so I can't install blas and look at the
> core for your program. However, there are still some reasons why this code
> would be slow.
>
> For instance, a brief experiment seems to indicate that foldM is not a good
> consumer in the foldr/build sense, so no deforestation occurs. Your program
> is iterating over a 10-million element lazy list. That's going to add
> overhead. I wrote a simple test program which just adds 0.1 in each
> iteration:
>
> ---- snip ----
>
> {-# LANGUAGE BangPatterns #-}
>
> module Main (main) where
>
> import Control.Monad
>
> main = do
> let times = 10*1000*1000
> sum <- foldM (\_ zz -> return $ zz + 0.1) 0.0 [0..times]
> -- sum <- foo 0 times 0.0
> print $ sum
>
> foo :: Int -> Int -> Double -> IO Double
> foo k m !zz
> | k <= m = foo (k+1) m (zz + 0.1)
> | otherwise = return zz
>
> ---- snip ----
>
> With foldM, it takes 2.5 seconds on my machine. If you comment that line, and
> use foo instead, it takes around .1 seconds. So that's a factor of what, 250?
> That loop allows for a lot more unboxing, which allows much better code to be
> generated.
>
> When Hackage comes back online, I'll take a look at your code, and see if I
> can make it run faster, but you might want to try it yourself in the time
> being. Strictifying the addition of the accumulator is probably a good idea,
> for instance.
>
> Cheers,
> -- Dan
>
More information about the Haskell-Cafe
mailing list