late optimization of typeclass callers
Simon Peyton-Jones
simonpj at microsoft.com
Fri Aug 27 12:00:27 EDT 2010
It's hard to read your code because the line breaks have been removed. Moreover I think your code depends on some unspecified Hackage package. And I couldn't find the enclosed Core dumps. Nor do you specify which version of GHC you are using.
Still I believe that the nub is this. You have a class with a default method declaration
class FoldFunctions a where
opt = <blah>
and you get different behaviour if you say
instance FoldFunctions Int
from
instance FoldFunctions Int where
opt = <blah>
That's surprising because the second case simply fills in the method with the identical code to the code in the class declaration.
Is that right?
If so, I think you might want to try the HEAD. Can you do that (use a binary snapshot). I think I fixed this default-method-inlining stuff so that there is no difference between the above two.
But I could have failed, or there could be something else going on in your example.
Simon
| -----Original Message-----
| From: glasgow-haskell-users-bounces at haskell.org [mailto:glasgow-haskell-
| users-bounces at haskell.org] On Behalf Of Christian Höner zu Siederdissen
| Sent: 26 August 2010 10:36
| To: Glasgow-Haskell-Users
| Subject: late optimization of typeclass callers
|
| Hi,
|
| I do have the problem that my code is not completely optimized. We begin with
|
| -- Ring.hs
| class Ring a where
| rmult :: a -> a -> a
| zero :: a
|
| -- PrimitiveArray.hs
| class PrimArrayOps a b where
| data PrimArray a b :: *
| unsafeIndex :: PrimArray a b -> a -> b
|
| -- PAInstances.hs
| -- for any 'a' of Data.Primitive.Types by Roman, have unboxed arrays instance
| (Prim a) => PrimArrayOps where
| data PrimArray (Int,Int) a = PaIIxI {-# UNPACK #-} !(Int,Int) {-# UNPACK #-
| } !ByteArray
| unsafeIndex (PaIIxI (mI,mJ) arr) (i,j) = {-# CORE "IIxIunsafeIndex" #-}
| case (i*(mJ+1)+j) of idx -> indexByteArray arr idx
|
| -- RNAfoldFunctions.hs
| -- VU.Unbox a, because of the vector package import Data.Vector.Unboxed as VU
| class (Ring a, VU.Unbox a, Prim a) => FoldFunctions a where
| opt = VU.foldl' rmult zero $ base turnertables inp table table i j
|
| base trnr inp m m1 i j = VU.zipWith rmult ms m1s where
| cnt = j-i-2 -- TODO when to stop?
| ms = VU.map (\ik -> m `unsafeIndex` ik) $ VU.generate cnt (\k ->
| (i,i+k))
| m1s = VU.map (\kj -> m1 `unsafeIndex` kj) $ VU.generate cnt (\k ->
| (k+1+i,j)) {-# INLINE multibranchCloseBase #-}
|
|
|
| If I now use this stuff...
| -- [1]
| instance Ring Int where
| rmult = min
| zero = 10000
| instance FoldFunctions Int
| module MyProgram where
| main = do
| let val = opt trnr inp myM myM1 15 78
|
| I get this core [1]. If I do this
| -- [2]
| instance Ring Int where
| rmult = min
| zero = 10000
| instance FoldFunctions Int
| opt = VU.foldl' rmult zero $ base turnertables inp table table i j module
| MyProgram where main = do
| let myM = PrimArray of (Int,Int) with Int values
| ...
| let val = opt trnr inp myM myM1 15 78
|
| Is there a way to get the program without an explicit FoldFunctions instance
| to specialize to the same Core as the second? The runtime (Criterion was
| used) is 7.1us (or worse) for [1] and 2.7us for [2]. I could put nice INLINEs
| everywhere but they do not help optimizing. These functions run O(n^2) times,
| with n between 100 and 10000. The core shows that temporary arrays are even
| created, filled, and then the fold run over them.
|
| So basically, can I get code running through several class instances to be
| optimized at the caller, where all instances are known? Otherwise I could
| live with [2], but as the code will almost always be the same for
| FoldFunctions instances, it would be really nice to be able to use the
| defaults that were defined on (Ring a, VU.Unbox a, Prim a).
|
| Thanks,
| Christian
More information about the Glasgow-haskell-users
mailing list