haskell array access
Simon Peyton-Jones
simonpj@microsoft.com
Fri, 27 Jun 2003 09:36:36 +0100
Several comments
1. In Haskell "mod" is a lot more expensive than "rem", because it
involves careful jiggery pokery with the sign of the result. That's
why your boxed version was slower (nothing to do with boxing).
2. GHC does indeed optimise away the array access if you aren't
careful, and does in Derek's code. Below is a version that avoids doing
so, in the same way as the C version, by returning one of the values.
3. Derek correctly points out that gcc does not do array-bound checks.
GHC should eliminate them, but it's a non-trivial analysis and GHC does
not currently attempt it. You can always use unsafeAt, as Derek does.
4. The UArray stuff builds an array of unboxed Ints, like C. If you
use ordinary boxed arrays, the loop has to unbox the Int every time
around the loop, which makes it slower (2.6s)
With that done, we get
ghc -O2 -fliberate-case-threshold20 1.8s
gcc -O2 1.4s
Seems close enough to me; have not investigated further.
The "liberate-case" thing is a practically-undocumented GHC optimisation
that switches on with -O2. It specialises a loop that unboxes a free
variable (here the array k) so that the unboxing doesn't happen each
time round the loop. The threshold at which GHC thinks there is too
much code dup is set very low by default, so the flag increases it.
Simon
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
k :: UArray Int Int
k =3D array (0,15) [(i, i+1) | i <- [0 .. 15] ]
acc :: Int -> Int -> Int
acc r 0 =3D r
acc r n =3D r `seq` acc (k `unsafeAt` (n `rem` 16)) (n - 1)
main :: IO ()
main =3D print (acc 0 100000000)
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
| Anyways,
| int main(void){
| int i, k;
| int a [16];
| for (i=3D0; i<=3D100000000; i++) {
| k =3D a [i % 16];
| }
| printf("%d\n",k);
| return 0;
| }
| compiled with MinGW 3.2 -O/-O2/-O3 runs in 1.8s using time.
|=20
| int main(void){
| int i, k;
| int a [16];
| for (i=3D0; i<=3D100000000; i++) {
| k =3D a [i % 16];
| }
| printf("%d\n",k);
| return 0;
| }
|=20
| The following (compiled with -O2 -fglasgow-exts and GHC 6.1 i.e. a CVS
| version) beats the C version taking only .8s. They don't do the same
| thing (the Haskell version "does" more, but prints something
different).
| Looking at the core/stg/c/asm (the asm and C was quite messy compared
| with other times), it certainly doesn't appear to be optimizing this
to
| print 0, and it does appear (though I'm less sure here) to be doing
| everything as it should, but it's really hard to read the assembly,
| since we don't use the value we lookup I can easily see gcc dropping
it
| altogether, though the C GHC spits out is pretty unusual for gcc, so
it
| may not.
|=20
| module Main (main) where
| import Data.Array.Unboxed (UArray, array)
| import Data.Array.Base (unsafeAt) -- if C doesn't, why should we?
| import GHC.Exts
|=20
| k :: UArray Int Int
| k =3D array (0,15) [(i, i+1) | i <- [0 .. 15] ]
|=20
| acc :: Int# -> Int#
| acc 0# =3D 0#
| acc n =3D (k `unsafeAt` (I# (n `remInt#` 16#))) `seq` acc (n -# 1#)
|=20
| main :: IO ()
| main =3D print (I# (acc 100000000#))
|=20
| However, I'm am somewhat surprised that I seemingly had to unbox by
hand
| to get quite close (in fact, better) than C. I expected the following
| version to be comparable to the C version (and it is orders of
magnitude
| faster than the original version which was over 5min when I killed it
| (though I think I compiled it with just -O), this version taking 14s).
|=20
| module Main (main) where
| import Data.Array.Unboxed (UArray, array)
| import Data.Array.Base (unsafeAt) -- if C doesn't, why should we?
|=20
| k :: UArray Int Int
| k =3D array (0,15) [(i, i+1) | i <- [0 .. 15] ]
|=20
| acc :: Int -> Int
| acc 0 =3D 0
| acc n =3D (k `unsafeAt` (n `mod` 16)) `seq` acc (n - 1)
|=20
| main :: IO ()
| main =3D print (acc 100000000)
|=20
| _______________________________________________
| Haskell-Cafe mailing list
| Haskell-Cafe@haskell.org
| http://www.haskell.org/mailman/listinfo/haskell-cafe