[Haskell] Matrix multiplication
Tillmann Vogt
Tillmann.Vogt at rwth-aachen.de
Sat May 3 14:57:04 EDT 2008
Hi,
Thanks for all your nice replies. I did this matrix-multiplication
experiment for a seminar on multithreading where I have to give a talk
on Unified Parallel C. At first I thought I should not mention Haskell
as an alternative because of the speed. But now I might do some slides
about the advantages (and disadvantages?) of side-effect-free languages,
maybe ndp. In my opinion these C extensions are not a nice solution.
Unified Parallel C parallelizes only for-loops and distributes the
workload by uniformly cutting an array into pieces and then setting an
"affinity" so that a CPU works on that data. The trick they are really
proud of is that the compiler knows in this way where to put the data in
a NUMA system (Non-Uniform Memory Access). I am not really sure whether
this language extension can cope with programs where pieces need
considerably different calculation times.
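
For comparison, the same chunked work distribution can be sketched in
Haskell with the parallel package (a rough, untested sketch, and it
assumes the Control.Parallel.Strategies API of that era):

import Control.Parallel.Strategies (using, parListChunk, rnf)

-- matrix-vector multiplication with the rows evaluated in parallel chunks
-- of 100 rows each; the GHC runtime schedules the sparked chunks onto free
-- CPUs, which helps when pieces need considerably different amounts of work
mulPar :: [[Double]] -> [Double] -> [Double]
mulPar rows v = map (sum . zipWith (*) v) rows `using` parListChunk 100 rnf
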
I forgot to mention that I used GHC 6.8.2, and sorry for the stupid
example (I had to take something that fits on a presentation slide).
Cheers, Tillmann
Don Stewart wrote:
> The other thing here is that he's using unboxed, nested arrays in C,
> while using naive lists in Haskell.
>
> To actually compare them, you'd need to use nested STUArrays.
>
> Hopefully we'll have a library for these soon (as a result of the ndp
> lib). Otherwise, use one of the matrix libraries (hmatrix/gslhaskell).
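>
> (As a rough, untested sketch of that unboxed idea: a flat UArray indexed
> by (row, col) can stand in for truly nested arrays, since unboxed arrays
> don't nest directly.)
>
> import Data.Array.Unboxed
>
> n :: Int
> n = 4000
>
> -- one flat, unboxed array indexed by (row, col) instead of nested lists;
> -- listArray only consumes as many elements as the index range needs
> matrix :: UArray (Int, Int) Double
> matrix = listArray ((0,0), (n-1,n-1)) (cycle (map fromIntegral [0 .. n-1]))
>
> vec :: UArray Int Double
> vec = listArray (0, n-1) (replicate n 2.0)
>
> mulMV :: UArray (Int, Int) Double -> UArray Int Double -> [Double]
> mulMV m v = [ sum [ m ! (i,j) * v ! j | j <- [0 .. n-1] ] | i <- [0 .. n-1] ]
>
> main :: IO ()
> main = print (sum (mulMV matrix vec))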
>
> For non-nested arrays, we can do rather well with:
>
> import Data.Array.Vector
>
> n :: Int
> n = 4000
>
> main = print (sumU (zipWithU (*) a b))
>   where
>     a = replicateU n (2::Double)
>     b = mapU realToFrac $ enumFromToU 0 (n-1)
>
> Which compiles to some nicely fused unboxed loops.
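> (Those fusion rules only fire with optimisation enabled, i.e. ghc -O or
> -O2.)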
>
> The trick is to get this working with nested arrays.
> The ndp library looks like our best bet here:
>
> darcs.haskell.org/packages/ndp
>
>
> -- Don
>
> tim:
>> On Thu, 24 Apr 2008 04:01:50 Tillmann Vogt wrote:
>>> Hi,
>>>
>>> I am currently experimenting with parallelizing C programs. I have
>>> therefore written a matrix-vector multiplication example that needs 13
>>> seconds to run (5 seconds with OpenMP). Because I like Haskell, I did
>>> the same in this language, but it takes about 134 seconds. Why is it so
>>> slow? Does someone have an idea?
>>>
>>>
>>> module Main where
>>>
>>> main = do putStrLn (show (stupid_mul 100))
>>>           putStrLn "100 multiplications done"
>>>
>>> -- without the "it" argument to s_mul, only one multiplication is executed
>>> stupid_mul 0  = []
>>> stupid_mul it = s_mul it : stupid_mul (it-1)
>>>
>>> s_mul it = mul (replicate 4000 [0..3999]) (replicate 4000 2)
>>>
>>> mul :: [[Double]] -> [Double] -> [Double]
>>> mul [] _ = []
>>> mul (b:bs) c | sp == 0   = sp : mul bs c  -- always False; forces evaluation
>>>              | otherwise = mul bs c
>>>   where sp = scalar b c
>>>
>>> scalar :: [Double] -> [Double] -> Double
>>> scalar _ [] = 0
>>> scalar [] _ = 0
>>> scalar (v:vs) (w:ws) = (v*w) + scalar vs ws
>>>
>>>
>>>
>>> and here is the C program:
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <time.h>
>>>
>>> #define M  4000
>>> #define N  4000
>>> #define IT 100
>>>
>>> double a[M], b[M][N], c[N];
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>     double d;
>>>     int i, j, l;
>>>     time_t start, end;
>>>
>>>     printf("Initializing matrix B and vector C\n");
>>>     for (j = 0; j < N; j++) c[j] = 2.0;
>>>     for (i = 0; i < M; i++) for (j = 0; j < N; j++) b[i][j] = j;
>>>
>>>     printf("Executing %d matrix mult. for M = %d N = %d\n", IT, M, N);
>>>     time(&start);
>>>
>>>     for (l = 0; l < IT; l++)
>>>     {
>>>         #pragma omp parallel for default(none) shared(a,b,c) private(i,j,l)
>>>         for (i = 0; i < M; i++)
>>>         {
>>>             a[i] = 0.0;
>>>             for (j = 0; j < N; j++) a[i] += b[i][j] * c[j];
>>>         }
>>>     }
>>>     time(&end);
>>>
>>>     d = difftime(end, start);
>>>     printf("calculation time: %.2lf seconds\n", d);
>>>     return 0;
>>> }
>> You haven't even told us which compilers you're using, so it's pretty
>> difficult to say. I can't even get your code to compile: there are typos
>> in it, so you've obviously altered it since compiling it yourself.
>>
>> While this program may be wrong, I've dashed off this attempt that takes about
>> 1.3 seconds on my not-too-powerful machine:
>>
>> module Main where
>>
>> import Control.Monad
>>
>> rows = 4000
>> cols = 4000
>> iterations = 100
>>
>> main = do
>>     let
>>       vector :: [Double]
>>       vector = replicate cols 2.0
>>       matrix :: [[Double]]
>>       matrix = replicate rows (map fromIntegral [0..cols-1])
>>       a = map (sum . zipWith (*) vector) matrix
>>     replicateM_ iterations (putStrLn (show a))
>>
>> Those who understand how Haskell programs are executed will now be
>> screaming "cheating!". This program, when optimised by GHC (which I use),
>> will only actually do the calculation once and print it 100 times. That
>> is, after all, the same output you asked for. It may even be taking more
>> shortcuts, using identities around map and replicate, but I'm not sure.
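>>
>> One way to defeat that sharing (a small, untested sketch) is to make each
>> iteration depend on the loop index, so that GHC cannot reuse a single
>> result:
>>
>> module Main where
>>
>> import Control.Monad (forM_)
>>
>> rows, cols, iterations :: Int
>> rows = 4000
>> cols = 4000
>> iterations = 100
>>
>> matrix :: [[Double]]
>> matrix = replicate rows (map fromIntegral [0 .. cols-1])
>>
>> -- each pass multiplies by a different vector, so no result is shared
>> main :: IO ()
>> main = forM_ [1 .. iterations] $ \i -> do
>>     let vec = replicate cols (fromIntegral i) :: [Double]
>>     print (sum (map (sum . zipWith (*) vec) matrix))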
>>
>> When mapping imperative languages to functional ones, a little
>> understanding of how the program is executed goes a long way. The
>> performance of your programs will benefit immensely if you know how they
>> will be run. I'm new to Haskell, but I have already realised that
>> performance can change by orders of magnitude when possible optimisations
>> are made more visible to the compiler by how things are set out.
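>>
>> For instance (just an illustration, not taken from the program above), a
>> dot product written with a strict left fold avoids building a long chain
>> of unevaluated thunks:
>>
>> import Data.List (foldl')
>>
>> -- foldl' forces the accumulator at every step, so the sum runs in
>> -- constant space instead of piling up thousands of nested (+) thunks
>> dot :: [Double] -> [Double] -> Double
>> dot xs ys = foldl' (+) 0 (zipWith (*) xs ys)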
>>
>> P.S. I would really recommend making more use of higher-level functions
>> such as map. They make code much more readable, and the most common ones
>> also receive special optimisations from many compilers.
>>
>> Cheers,
>>
>> Tim