late optimization of typeclass callers

Simon Peyton-Jones simonpj at microsoft.com
Fri Aug 27 12:00:27 EDT 2010


It's hard to read your code because the line breaks have been removed.  Moreover I think your code depends on some unspecified Hackage package.  And I couldn't find the enclosed Core dumps.  Nor do you specify which version of GHC you are using.

Still I believe that the nub is this.  You have a class with a default method declaration

	class FoldFunctions a where
         opt = <blah>

and you get different behaviour if you say

	instance FoldFunctions Int
from
	instance FoldFunctions Int where
 	  opt = <blah>

That's surprising because the second case simply fills in the method with the identical code to the code in the class declaration.  

Is that right?

If so, I think you might want to try the HEAD.  Can you do that (use a binary snapshot).  I think I fixed this default-method-inlining stuff so that there is no difference between the above two. 

But I could have failed, or there could be something else going on in your example.

Simon


| -----Original Message-----
| From: glasgow-haskell-users-bounces at haskell.org [mailto:glasgow-haskell-
| users-bounces at haskell.org] On Behalf Of Christian Höner zu Siederdissen
| Sent: 26 August 2010 10:36
| To: Glasgow-Haskell-Users
| Subject: late optimization of typeclass callers
| 
| Hi,
| 
| I do have the problem that my code is not completely optimized. We begin with
| 
| -- Ring.hs
| class Ring a where
|   rmult :: a -> a -> a
|   zero :: a
| 
| -- PrimitiveArray.hs
| class PrimArrayOps a b where
|   data PrimArray a b :: *
|   unsafeIndex :: PrimArray a b -> a -> b
| 
| -- PAInstances.hs
| -- for any 'a' of Data.Primitive.Types by Roman, have unboxed arrays instance
| (Prim a) => PrimArrayOps where
|   data PrimArray (Int,Int) a = PaIIxI {-# UNPACK #-} !(Int,Int) {-# UNPACK #-
| } !ByteArray
|   unsafeIndex (PaIIxI (mI,mJ) arr) (i,j) = {-# CORE "IIxIunsafeIndex" #-}
| case (i*(mJ+1)+j) of idx -> indexByteArray arr idx
| 
| -- RNAfoldFunctions.hs
| -- VU.Unbox a, because of the vector package import Data.Vector.Unboxed as VU
| class (Ring a, VU.Unbox a, Prim a) => FoldFunctions a where
|   opt = VU.foldl' rmult zero $ base turnertables inp table table i j
| 
| base trnr inp m m1 i j = VU.zipWith rmult ms m1s where
|   cnt  = j-i-2 -- TODO when to stop?
|   ms   = VU.map (\ik -> m  `unsafeIndex` ik) $ VU.generate cnt (\k ->
| (i,i+k))
|   m1s  = VU.map (\kj -> m1 `unsafeIndex` kj) $ VU.generate cnt (\k ->
| (k+1+i,j)) {-# INLINE multibranchCloseBase #-}
| 
| 
| 
| If I now use this stuff...
| -- [1]
| instance Ring Int where
|   rmult = min
|   zero = 10000
| instance FoldFunctions Int
| module MyProgram where
| main = do
|   let val = opt trnr inp myM myM1 15 78
| 
| I get this core [1]. If I do this
| -- [2]
| instance Ring Int where
|   rmult = min
|   zero = 10000
| instance FoldFunctions Int
|   opt = VU.foldl' rmult zero $ base turnertables inp table table i j module
| MyProgram where main = do
|   let myM = PrimArray of (Int,Int) with Int values
|   ...
|   let val = opt trnr inp myM myM1 15 78
| 
| Is there a way to get the program without an explicit FoldFunctions instance
| to specialize to the same Core as the second? The runtime (Criterion was
| used) is 7.1us (or worse) for [1] and 2.7us for [2]. I could put nice INLINEs
| everywhere but they do not help optimizing. These functions run O(n^2) times,
| with n between 100 and 10000. The core shows that temporary arrays are even
| created, filled, and then the fold run over them.
| 
| So basically, can I get code running through several class instances to be
| optimized at the caller, where all instances are known? Otherwise I could
| live with [2], but as the code will almost always be the same for
| FoldFunctions instances, it would be really nice to be able to use the
| defaults that were defined on (Ring a, VU.Unbox a, Prim a).
| 
| Thanks,
| Christian


More information about the Glasgow-haskell-users mailing list