[Haskell-cafe] "sum" in hmatrix and blas?
Xiao-Yong Jin
xj2106 at columbia.edu
Sun Jun 8 19:40:55 EDT 2008
Alberto Ruiz <aruiz at um.es> writes:
> My experience is that Haskell allocation time is very fast and usually
> negligible in most non trivial matrix computations.
>
> A good thing about
>
> sum v = constant 1 (dim v) <.> v
>
> is that a constant vector is efficiently created internally (not from
> an intermediate Haskell list), and the inner product will be computed
> by the possibly optimized blas version available in your machine.
>
> You can also write simple definitions like the next one for the
> average of the rows of a matrix as a simple vector-matrix product:
>
> mean m = w <> m
> where w = constant k r
> r = rows m
> k = 1 / fromIntegral r
>
> I prefer high level "index free" matrix operations to C-like
> code. Definitions are clearer, have less bugs, and are also more
> efficient if you use optimized numerical libs.
>
> In any case, many algorithms can be solved by the recursive approach
> described by Tomas.
After reading don's blog, I tried to make a test with both
methods. The code is very simple, as following
module Main where
import System
import Numeric.LinearAlgebra
vsum1 :: Vector Double -> Double
vsum1 v = constant 1 (dim v) <.> v
vsum2 :: Vector Double -> Double
vsum2 v = go (d - 1) 0
where
d = dim v
go :: Int -> Double -> Double
go 0 s = s + (v @> 0)
go !j !s = go (j - 1) (s + (v @> j))
mean :: Vector Double -> Double
mean v = vsum v / fromIntegral (dim v)
where vsum = vsum1
main :: IO ()
main = do
fn:nrow:ncol:_ <- getArgs
print . mean . flatten =<< fromFile fn (read nrow, read ncol)
Compile it with "-O2 -optc-O2", and run with a data set of
length 5000000. The results are
with "vsum1":
80,077,984 bytes allocated in the heap
2,208 bytes copied during GC (scavenged)
64 bytes copied during GC (not scavenged)
40,894,464 bytes maximum residency (2 sample(s))
%GC time 0.0% (0.0% elapsed)
Alloc rate 35,235,448 bytes per MUT second
./vsum1 huge 5000000 1 2.25s user 0.09s system 99% cpu 2.348 total
This is reasonable, exactly two copies of vector with size
of 40MB.
with "vsum2":
560,743,120 bytes allocated in the heap
19,160 bytes copied during GC (scavenged)
15,920 bytes copied during GC (not scavenged)
40,919,040 bytes maximum residency (2 sample(s))
%GC time 0.3% (0.3% elapsed)
Alloc rate 222,110,261 bytes per MUT second
./mean2 huge 5000000 1 2.53s user 0.06s system 99% cpu 2.598 total
This is strange. A lot of extra usage of heap? Probably
because '@>' is not efficient? So it looks like the
inner-product approach wins with a fairly margin.
--
c/* __o/*
<\ * (__
*/\ <
More information about the Haskell-Cafe
mailing list