haskell array access

Derek Elkins ddarius@hotpop.com
Thu, 26 Jun 2003 19:46:45 -0400


On Thu, 26 Jun 2003 17:48:17 -0400 (EDT)
Lex Stein <stein@eecs.harvard.edu> wrote:

> 
> Hi, I'm measuring Haskell, Ocaml, and C, small array access
> performance and the results with Haskell (GHC 5.02.2) are so bad that
> I think I must be doing something wrong. Indeed, I am hardly a
> seasoned Haskell programmer. My results are:
> 
> ghc -O      61.60s
> gcc -O3      0.20s
> Ocamlopt     1.11s (tail call)
> Ocamlopt     0.82s (for loop)
> 
> Why is the ghc generated code so slow? The source for all the tests is
> below. Any insight will be greatly appreciated (I'm running on Debian
> Linux w/ 1GHz x86 CPU and 768MB RAM).
> 
> The test cycles through an array of 16 elements, reading 100*10^6
> elements.
> 
> Thanks!
> Lex
> 
> Haskell (ghc5) test source:
> 
> import Array
> k = Array.array (0,15) [(i, i+1) | i <- [0 .. 15] ]
> acc 0 = 0
> acc n = seq (k Array.! (n `mod` 16)) (acc (n-1))
> main = do
>         print (acc 100000000)
> 
> C (gcc) test source:
> 
> int main () {
>         int i, k;
>         int a [16];
>         for (i=0; i<=100000000; i++) {
>                 k = a [i % 16];
>         }
> }

well, to begin with, it's obviously already biased toward C.  The C
version doesn't initialize the array and doesn't display any output. 
Using MinGW gcc 3.2 I, like Sven, get a .S file that's just an empty
loop(why the loop was even
kept...?).

Before you make a benchmark like this you may want to look at at least
the basic parts of the following, which has Sven's advice among (many)
other things.
http://haskell.cs.yale.edu/ghc/docs/latest/html/users_guide/faster.html

Anyways,
int main(void){
    int i, k;
    int a [16];
    for (i=0; i<=100000000; i++) {
        k = a [i % 16];
    }
    printf("%d\n",k);
    return 0;
}
compiled with MinGW 3.2 -O/-O2/-O3 runs in 1.8s using time.

int main(void){
    int i, k;
    int a [16];
    for (i=0; i<=100000000; i++) {
        k = a [i % 16];
    }
    printf("%d\n",k);
    return 0;
}

The following (compiled with -O2 -fglasgow-exts and GHC 6.1 i.e. a CVS
version) beats the C version taking only .8s.  They don't do the same
thing (the Haskell version "does" more, but prints something different).
 Looking at the core/stg/c/asm (the asm and C was quite messy compared
with other times), it certainly doesn't appear to be optimizing this to
print 0, and it does appear (though I'm less sure here) to be doing
everything as it should, but it's really hard to read the assembly,
since we don't use the value we lookup I can easily see gcc dropping it
altogether, though the C GHC spits out is pretty unusual for gcc, so it
may not.

module Main (main) where
import Data.Array.Unboxed (UArray, array)
import Data.Array.Base (unsafeAt) -- if C doesn't, why should we?
import GHC.Exts

k :: UArray Int Int
k = array (0,15) [(i, i+1) | i <- [0 .. 15] ]

acc :: Int# -> Int#
acc 0# = 0#
acc n  = (k `unsafeAt` (I# (n `remInt#` 16#))) `seq` acc (n -# 1#)

main :: IO ()
main = print (I# (acc 100000000#))

However, I'm am somewhat surprised that I seemingly had to unbox by hand
to get quite close (in fact, better) than C. I expected the following
version to be comparable to the C version (and it is orders of magnitude
faster than the original version which was over 5min when I killed it
(though I think I compiled it with just -O), this version taking 14s).

module Main (main) where
import Data.Array.Unboxed (UArray, array)
import Data.Array.Base (unsafeAt) -- if C doesn't, why should we?

k :: UArray Int Int
k = array (0,15) [(i, i+1) | i <- [0 .. 15] ]

acc :: Int -> Int
acc 0 = 0
acc n  = (k `unsafeAt` (n `mod` 16)) `seq` acc (n - 1)

main :: IO ()
main = print (acc 100000000)