late optimization of typeclass callers
Christian Höner zu Siederdissen
choener at tbi.univie.ac.at
Thu Aug 26 05:36:04 EDT 2010
Hi,
I do have the problem that my code is not completely optimized. We begin
with
-- Ring.hs
class Ring a where
rmult :: a -> a -> a
zero :: a
-- PrimitiveArray.hs
class PrimArrayOps a b where
data PrimArray a b :: *
unsafeIndex :: PrimArray a b -> a -> b
-- PAInstances.hs
-- for any 'a' of Data.Primitive.Types by Roman, have unboxed arrays
instance (Prim a) => PrimArrayOps where
data PrimArray (Int,Int) a = PaIIxI {-# UNPACK #-} !(Int,Int) {-# UNPACK #-} !ByteArray
unsafeIndex (PaIIxI (mI,mJ) arr) (i,j) = {-# CORE "IIxIunsafeIndex" #-} case (i*(mJ+1)+j) of idx -> indexByteArray arr idx
-- RNAfoldFunctions.hs
-- VU.Unbox a, because of the vector package
import Data.Vector.Unboxed as VU
class (Ring a, VU.Unbox a, Prim a) => FoldFunctions a where
opt = VU.foldl' rmult zero $ base turnertables inp table table i j
base trnr inp m m1 i j = VU.zipWith rmult ms m1s where
cnt = j-i-2 -- TODO when to stop?
ms = VU.map (\ik -> m `unsafeIndex` ik) $ VU.generate cnt (\k -> (i,i+k))
m1s = VU.map (\kj -> m1 `unsafeIndex` kj) $ VU.generate cnt (\k -> (k+1+i,j))
{-# INLINE multibranchCloseBase #-}
If I now use this stuff...
-- [1]
instance Ring Int where
rmult = min
zero = 10000
instance FoldFunctions Int
module MyProgram where
main = do
let val = opt trnr inp myM myM1 15 78
I get this core [1]. If I do this
-- [2]
instance Ring Int where
rmult = min
zero = 10000
instance FoldFunctions Int
opt = VU.foldl' rmult zero $ base turnertables inp table table i j
module MyProgram where
main = do
let myM = PrimArray of (Int,Int) with Int values
...
let val = opt trnr inp myM myM1 15 78
Is there a way to get the program without an explicit FoldFunctions instance to
specialize to the same Core as the second? The runtime (Criterion was used) is
7.1us (or worse) for [1] and 2.7us for [2]. I could put nice INLINEs everywhere
but they do not help optimizing. These functions run O(n^2) times, with n
between 100 and 10000. The core shows that temporary arrays are even created,
filled, and then the fold run over them.
So basically, can I get code running through several class instances to be
optimized at the caller, where all instances are known? Otherwise I could live
with [2], but as the code will almost always be the same for FoldFunctions
instances, it would be really nice to be able to use the defaults that were
defined on (Ring a, VU.Unbox a, Prim a).
Thanks,
Christian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20100826/2925254c/attachment.bin
More information about the Glasgow-haskell-users
mailing list