late optimization of typeclass callers

christian Hoener zu Siederdissen choener at tbi.univie.ac.at
Fri Aug 27 15:00:42 EDT 2010


Thanks that is *exactly* the problem I am having. The missing line breaks are kind of weird.

I will try the newest head (which will mean getting about 20 libs to compile...) and report back.

I left the core out intentionally because the vector library tends to produce ugly core when it can not optimize.

Should the problem *not* go away with HEAD (right now I am using 6.12.3) then I will create a self-contained set of files that reproduce the problem.

Thanks again,
Christian

----- Original message -----
> It's hard to read your code because the line breaks have been removed. 
> Moreover I think your code depends on some unspecified Hackage package. 
> And I couldn't find the enclosed Core dumps.    Nor do you specify which
> version of GHC you are using.
> 
> Still I believe that the nub is this.    You have a class with a default
> method declaration
> 
>     class FoldFunctions a where
>                    opt = <blah>
> 
> and you get different behaviour if you say
> 
>     instance FoldFunctions Int
> from
>     instance FoldFunctions Int where
>            opt = <blah>
> 
> That's surprising because the second case simply fills in the method
> with the identical code to the code in the class declaration.    
> 
> Is that right?
> 
> If so, I think you might want to try the HEAD.    Can you do that (use a
> binary snapshot).    I think I fixed this default-method-inlining stuff so
> that there is no difference between the above two. 
> 
> But I could have failed, or there could be something else going on in
> your example.
> 
> Simon
> 
> 
> | -----Original Message-----
> | From: glasgow-haskell-users-bounces at haskell.org
> [mailto:glasgow-haskell- | users-bounces at haskell.org] On Behalf Of
> Christian Höner zu Siederdissen | Sent: 26 August 2010 10:36
> | To: Glasgow-Haskell-Users
> | Subject: late optimization of typeclass callers
> | 
> | Hi,
> | 
> | I do have the problem that my code is not completely optimized. We
> begin with | 
> | -- Ring.hs
> | class Ring a where
> |      rmult :: a -> a -> a
> |      zero :: a
> | 
> | -- PrimitiveArray.hs
> | class PrimArrayOps a b where
> |      data PrimArray a b :: *
> |      unsafeIndex :: PrimArray a b -> a -> b
> | 
> | -- PAInstances.hs
> | -- for any 'a' of Data.Primitive.Types by Roman, have unboxed arrays
> instance | (Prim a) => PrimArrayOps where
> |      data PrimArray (Int,Int) a = PaIIxI {-# UNPACK #-} !(Int,Int) {-#
> UNPACK #- | } !ByteArray
> |      unsafeIndex (PaIIxI (mI,mJ) arr) (i,j) = {-# CORE "IIxIunsafeIndex"
> #-} | case (i*(mJ+1)+j) of idx -> indexByteArray arr idx
> | 
> | -- RNAfoldFunctions.hs
> | -- VU.Unbox a, because of the vector package import
> Data.Vector.Unboxed as VU | class (Ring a, VU.Unbox a, Prim a) =>
> FoldFunctions a where |      opt = VU.foldl' rmult zero $ base turnertables
> inp table table i j | 
> | base trnr inp m m1 i j = VU.zipWith rmult ms m1s where
> |      cnt    = j-i-2 -- TODO when to stop?
> |      ms      = VU.map (\ik -> m    `unsafeIndex` ik) $ VU.generate cnt (\k ->
> | (i,i+k))
> |      m1s    = VU.map (\kj -> m1 `unsafeIndex` kj) $ VU.generate cnt (\k ->
> | (k+1+i,j)) {-# INLINE multibranchCloseBase #-}
> | 
> | 
> | 
> | If I now use this stuff...
> | -- [1]
> | instance Ring Int where
> |      rmult = min
> |      zero = 10000
> | instance FoldFunctions Int
> | module MyProgram where
> | main = do
> |      let val = opt trnr inp myM myM1 15 78
> | 
> | I get this core [1]. If I do this
> | -- [2]
> | instance Ring Int where
> |      rmult = min
> |      zero = 10000
> | instance FoldFunctions Int
> |      opt = VU.foldl' rmult zero $ base turnertables inp table table i j
> module | MyProgram where main = do
> |      let myM = PrimArray of (Int,Int) with Int values
> |      ...
> |      let val = opt trnr inp myM myM1 15 78
> | 
> | Is there a way to get the program without an explicit FoldFunctions
> instance | to specialize to the same Core as the second? The runtime
> (Criterion was | used) is 7.1us (or worse) for [1] and 2.7us for [2]. I
> could put nice INLINEs | everywhere but they do not help optimizing.
> These functions run O(n^2) times, | with n between 100 and 10000. The
> core shows that temporary arrays are even | created, filled, and then
> the fold run over them. | 
> | So basically, can I get code running through several class instances
> to be | optimized at the caller, where all instances are known?
> Otherwise I could | live with [2], but as the code will almost always be
> the same for | FoldFunctions instances, it would be really nice to be
> able to use the | defaults that were defined on (Ring a, VU.Unbox a,
> Prim a). | 
> | Thanks,
> | Christian



More information about the Glasgow-haskell-users mailing list