late optimization of typeclass callers

Christian Höner zu Siederdissen choener at tbi.univie.ac.at
Thu Aug 26 05:36:04 EDT 2010


Hi,

I do have the problem that my code is not completely optimized. We begin
with

-- Ring.hs
class Ring a where
  rmult :: a -> a -> a
  zero :: a

-- PrimitiveArray.hs
class PrimArrayOps a b where
  data PrimArray a b :: *
  unsafeIndex :: PrimArray a b -> a -> b

-- PAInstances.hs
-- for any 'a' of Data.Primitive.Types by Roman, have unboxed arrays
instance (Prim a) => PrimArrayOps where
  data PrimArray (Int,Int) a = PaIIxI {-# UNPACK #-} !(Int,Int) {-# UNPACK #-} !ByteArray
  unsafeIndex (PaIIxI (mI,mJ) arr) (i,j) = {-# CORE "IIxIunsafeIndex" #-} case (i*(mJ+1)+j) of idx -> indexByteArray arr idx

-- RNAfoldFunctions.hs
-- VU.Unbox a, because of the vector package
import Data.Vector.Unboxed as VU
class (Ring a, VU.Unbox a, Prim a) => FoldFunctions a where
  opt = VU.foldl' rmult zero $ base turnertables inp table table i j

base trnr inp m m1 i j = VU.zipWith rmult ms m1s where
  cnt  = j-i-2 -- TODO when to stop?
  ms   = VU.map (\ik -> m  `unsafeIndex` ik) $ VU.generate cnt (\k -> (i,i+k))
  m1s  = VU.map (\kj -> m1 `unsafeIndex` kj) $ VU.generate cnt (\k -> (k+1+i,j))
{-# INLINE multibranchCloseBase #-}



If I now use this stuff...
-- [1]
instance Ring Int where
  rmult = min
  zero = 10000
instance FoldFunctions Int
module MyProgram where
main = do
  let val = opt trnr inp myM myM1 15 78

I get this core [1]. If I do this
-- [2]
instance Ring Int where
  rmult = min
  zero = 10000
instance FoldFunctions Int
  opt = VU.foldl' rmult zero $ base turnertables inp table table i j
module MyProgram where
main = do
  let myM = PrimArray of (Int,Int) with Int values
  ...
  let val = opt trnr inp myM myM1 15 78

Is there a way to get the program without an explicit FoldFunctions instance to
specialize to the same Core as the second? The runtime (Criterion was used) is
7.1us (or worse) for [1] and 2.7us for [2]. I could put nice INLINEs everywhere
but they do not help optimizing. These functions run O(n^2) times, with n
between 100 and 10000. The core shows that temporary arrays are even created,
filled, and then the fold run over them.

So basically, can I get code running through several class instances to be
optimized at the caller, where all instances are known? Otherwise I could live
with [2], but as the code will almost always be the same for FoldFunctions
instances, it would be really nice to be able to use the defaults that were
defined on (Ring a, VU.Unbox a, Prim a).

Thanks,
Christian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20100826/2925254c/attachment.bin


More information about the Glasgow-haskell-users mailing list