newtypes and optimization
Scott Dillard
sedillard at gmail.com
Wed Dec 12 13:02:15 EST 2007
Hi,
I have a statically-sized-list library that I use for linear algebra stuff.
It's got a vector type something like this:
> data V a b = V a b
so that a 3D vector is
> type Vec3 a = V a (V a (V a ()))
and I have type classes for operations on these things, like so:
> class VZipWith a b c u v w | {-lots of fundeps-} where
> vzipWith :: (a -> b -> c) -> u -> v -> w
>
> instance VZipWith a b c (V a ()) (V b ()) (V c ()) where
> vzipWith f (V x ()) (V y ()) = V (f x y) ()
>
> instance
> VZipWith a b c (V a u) (V b v) (V c w)
> => VZipWith a b c (V a (V a u)) (V b (V b v)) (V c (V c w))
> where
> vzipWith f (V x u) (V y v) = V (f x y) (vzipWith f u v)
so that vector addition is
> vadd = vzipWith (+)
I put strictness annotations and INLINE pragmas all over the place, and GHC
does wonders with it. Using Storable instances something like the following,
> instance Storable a => Storable (V a ()) where
> sizeOf _ = sizeOf (undefined::a)
> peek p = peek (castPtr p) >>= \a -> return (V a ())
> --etc
>
> instance (Storable a, Storable v) => Storable (V a v) where
> sizeOf _ = sizeOf (undefined::a) + sizeOf (undefined::v)
> peek p =
> a <- peek (castPtr p)
> v <- peek (castPtr (p`plusPtr`sizeOf(undefined::a)))
> return (V a v)
GHC can turn a loop like this,
> forM_ [0..n] $ \i ->
> do a <- peekElemOff aptr i
> b <- peekElemOff bptr i
> pokeElemOff cptr i (vadd a b)
into something as fast as C, using no heap. You look at the core and its
nothing but readDoubleOffAddr#, +## and the like. I went so far as to generalize
this to matrices with things like vector-matrix and matrix-matrix
multiplication, determinants and all that and, when used in loops like above,
it's consistently as fast or even faster than C.
However when I do this:
> newtype Quaternion = Q (Vec4 Double)
Everything is ruined. Functions like peek and vadd are no longer inlined,
intermediate linked lists are created all over the place. The Quaternion
Storable instance looks like this
> instance Storable s => Storable (Quaternion s) where
> sizeOf _ = 4*sizeOf (undefined::s)
> peek p = peek (castPtr p :: Ptr (Vec4 s)) >>= \v -> return (Q v)
with strictness annotations and INLINEs for everything. I also tried automatic
newtype deriving, with no luck. Why does a newtype defeat so much of the
optimization?
Thanks,
Scott
More information about the Glasgow-haskell-users
mailing list