[Haskell-cafe] Re: [Haskell] Matrix multiplication
Don Stewart
dons at galois.com
Wed Apr 30 02:04:00 EDT 2008
> droundy
> On Wed, Apr 23, 2008 at 05:24:20PM +0100, Sebastian Sylvan wrote:
> > On Wed, Apr 23, 2008 at 5:01 PM, Tillmann Vogt <Tillmann.Vogt <at> rwth-aachen.de>
> > wrote:
> >
> > > Hi,
> > >
> > > I am currently experimenting with parallelizing C-programs. I have
> > > therefore written a matrix vector multiplication example that needs 13
> > > seconds to run (5 seconds with OpenMP). Because I like Haskell I did the
> > > same in this language, but it takes about 134 seconds. Why is it so slow?
> > > Does someone have an idea?
> > >
> > Yes, in the C version you use unboxed arrays, in the Haskell version you use
> > a linked list of linked lists. Naturally the latter will take up more space,
> > require more work to index, and will thrash the cache quite a bit.
>
> In fact, I'm impressed that Haskell can come within a factor of 10, given
> that it's using such a challenging data type! (I wonder if this may be due
> to inefficiencies in the C code, although I haven't looked at it.)
I had a look at this code, using the (unboxed, strict, fused) data parallel
arrays library (http://darcs.haskell.org/packages/ndp).
We get rather nice code. The original C program (modified, to sum the result,
rather than just filling the array and throwing it out):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define M  4000
#define N  4000
#define IT 100

double a[M], b[M][N], c[N];

int main(int argc, char *argv[])
{
    double d;
    double sum;
    int i, j, l;
    time_t start, end;

    printf("Initializing matrix B and vector C\n");
    for (j = 0; j < N; j++) c[j] = 2.0;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            b[i][j] = j;

    printf("Executing %d matrix mult. for M = %d N = %d\n", IT, M, N);
    time(&start);
    sum = 0;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += b[i][j] * c[j];
    time(&end);

    printf("result: %.2lf\n", sum);
    d = difftime(end, start);
    printf("calculation time: %.2lf seconds\n", d);
    return 0;
}
And then rewritten in a data parallel style (note: there's no actual parallelism here, just the nice interface to STUArrays, with stream fusion):
import Data.Array.Parallel.Unlifted
import Text.Printf

m  = 4000
n  = 4000
it = 100

main = printf "result: %2f\n" . sumU . zipWithU (*) (concatSU b) $ repeatU m c
  where
    -- c[N]
    c = replicateU n (2.0 :: Double)

    -- b[M][N]
    b = replicateCU m t
      where t = mapU fromIntegral (enumFromToU 0 (n-1)) :: UArr Double
GHC finds 4 fusion sites when compiling the data-parallel program, and it runs in:
$ time ./a
result: 63984000000.0
./a 0.10s user 0.00s system 112% cpu 0.089 total
While the C program,
$ time ./a.out
Initializing matrix B and vector C
Executing 100 matrix mult. for M = 4000 N = 4000
result: 63984000000.00
calculation time: 0.00 seconds
./a.out 0.14s user 0.09s system 92% cpu 0.248 total
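As a quick sanity check on that result figure: every row of b is [0..3999] and c is 4000 copies of 2.0, so each row contributes the same dot product, and the total is m times that. A plain-Prelude version of the same sum (slow, but obviously correct):

```haskell
-- Plain-list check of the 63984000000 result. Each row's dot product
-- is sum [2*j | j <- [0..3999]] = 15996000, and there are 4000 rows.
main :: IO ()
main = print total  -- prints 6.3984e10
  where
    m      = 4000 :: Int
    n      = 4000 :: Int
    c      = replicate n (2.0 :: Double)
    row    = map fromIntegral [0 .. n - 1] :: [Double]
    rowSum = sum (zipWith (*) row c)       -- 15996000.0
    total  = fromIntegral m * rowSum       -- 63984000000.0
</imports>
```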
GHC rocks :) NDP programming encourages a nice combinator style too,
which is interesting (and Haskell-ish).
The compiler is able to fuse most of the arrays (and Roman reports this
should do rather better with GHC head).
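A sketch of what that fusion buys here: sumU (zipWithU (*) xs ys) can collapse into a single strict accumulator loop with no intermediate array, roughly like this (an illustration of the idea, not GHC's actual output):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Roughly what a fused sum-of-products loop looks like: the elements
-- are produced and consumed one at a time, with a strict accumulator,
-- and no intermediate array is ever built.
dotLoop :: Int -> (Int -> Double) -> (Int -> Double) -> Double
dotLoop len f g = go 0 0
  where
    go !i !acc
      | i >= len  = acc
      | otherwise = go (i + 1) (acc + f i * g i)

main :: IO ()
main = print (dotLoop 4000 fromIntegral (const 2.0))  -- prints 1.5996e7
```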
Anyway, we have a few new (fast!) array options on the horizon, and they're looking promising.
For more info, check out the data parallel arrays page,
http://haskell.org/haskellwiki/GHC/Data_Parallel_Haskell
-- Don