cross module optimization issues

Fri Nov 28 09:46:33 EST 2008

The $f2 comes from the instance Monad (IterateeGM ...).
print_lines uses a specialised version of that instance, namely
        Monad (IterateeGM el IO)
The fact that print_lines uses it makes GHC generate a specialised version of the instance decl.

Even in the absence of print_lines you can generate the specialised instance thus:

instance Monad m => Monad (IterateeGM el m) where
    {-# SPECIALISE instance Monad (IterateeGM el IO) #-}
        ... methods...
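
For anyone who wants to try the pragmas in isolation, here is a minimal self-contained sketch.  Note the assumptions: StreamT is a hypothetical stand-in for IterateeGM (it is not the real type from IterateeM.hs), and next and firstTwo are made-up names used only for illustration.  It shows both SPECIALISE instance on the Monad instance and an ordinary SPECIALISE on a top-level overloaded function.

```haskell
-- Hypothetical stand-in for IterateeGM: a computation in an arbitrary
-- monad m that consumes a list of elements of type el.
newtype StreamT el m a = StreamT { runStreamT :: [el] -> m (a, [el]) }

instance Monad m => Functor (StreamT el m) where
    fmap f (StreamT g) = StreamT $ \s -> do
        (a, s') <- g s
        return (f a, s')

instance Monad m => Applicative (StreamT el m) where
    pure a = StreamT $ \s -> return (a, s)
    StreamT f <*> StreamT g = StreamT $ \s -> do
        (h, s')  <- f s
        (a, s'') <- g s'
        return (h a, s'')

instance Monad m => Monad (StreamT el m) where
    -- Ask GHC for a copy of this instance specialised to m = IO,
    -- whether or not anything in this module uses it at that type.
    {-# SPECIALISE instance Monad (StreamT el IO) #-}
    StreamT g >>= k = StreamT $ \s -> do
        (a, s') <- g s
        runStreamT (k a) s'

-- Take one element from the input, if there is one.
next :: Monad m => StreamT el m (Maybe el)
next = StreamT $ \s -> case s of
    []       -> return (Nothing, [])
    (x : xs) -> return (Just x, xs)

-- An overloaded function that goes through (>>=) and return.
firstTwo :: Monad m => StreamT el m (Maybe el, Maybe el)
firstTwo = do
    a <- next
    b <- next
    return (a, b)
-- Plain SPECIALISE for a top-level function, again at m = IO.
{-# SPECIALISE firstTwo :: StreamT el IO (Maybe el, Maybe el) #-}

main :: IO ()
main = do
    (pair, rest) <- runStreamT firstTwo "xyz"
    print pair
    print rest
```

If you compile this with -O2 and look at the interface with ghc --show-iface, I would expect "SPEC" rules similar in spirit to the ones Neil listed, though the exact names and output will differ between GHC versions.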

Does that help?

Simon

| -----Original Message-----
| From: John Lato [mailto:jwlato at gmail.com]
| Sent: 28 November 2008 12:07
| To: Simon Peyton-Jones
| Cc: Neil Mitchell; glasgow-haskell-users at haskell.org; Don Stewart
| Subject: Re: cross module optimization issues
|
| Neil, thank you very much for taking the time to look at this; I
| greatly appreciate it.
|
| One thing I don't understand is why the specializations are caused by
| print_lines.  I suppose the optimizer can infer something which it
| couldn't otherwise.
|
| If I read this properly, the functions being specialized are liftI,
| (>>=), return, and $f2.  One thing I'm not sure about is when INLINE
| provides the desired optimal behavior, as opposed to SPECIALIZE.  The
| monad functions are defined in the Monad instance, and thus aren't
| currently INLINE'd or SPECIALIZE'd.  However, if they are separate
| functions, would INLINE be sufficient?  Would that give the optimizer
| enough to work with to derive the specializations on its own?  I'll
| have some time to experiment with this myself tomorrow, but I'd
| appreciate some direction (rather than guessing blindly).
|
| What is "$f2"?  I've seen that appear before, but I'm not sure where
| it comes from.
|
| Thanks,
| John
|
| On Fri, Nov 28, 2008 at 10:31 AM, Simon Peyton-Jones
| <simonpj at microsoft.com> wrote:
| > The specialisations are indeed caused (indirectly) by the presence of print_lines.  If
| > print_lines is dead code (as it is when print_lines is not exported), then there are no calls
| > to the overloaded functions at these specialised types, and so you don't get the specialised
| > versions.  You can get specialised versions by a SPECIALISE pragma, or SPECIALISE INSTANCE.
| >
| > Does that make sense?
| >
| > Simon
| >
| > | -----Original Message-----
| > | From: Neil Mitchell [mailto:ndmitchell at gmail.com]
| > | Sent: 28 November 2008 09:48
| > | To: Simon Peyton-Jones
| > | Cc: John Lato; glasgow-haskell-users at haskell.org; Don Stewart
| > | Subject: Re: cross module optimization issues
| > |
| > | Hi
| > |
| > | I've talked to John a bit, and discussed test cases etc. I've tracked
| > | this down a little way.
| > |
| > | Given the attached file, compiling with SHORT_EXPORT_LIST makes the
| > | code go _slower_. By exporting the "print_lines" function the code
| > | doubles in speed. This runs against everything I was expecting, and
| > | that Simon has described.
| > |
| > | Taking a look at the .hi files for the two alternatives, there are two
| > | differences:
| > |
| > | 1) In the faster .hi file, the body of print_lines is exported. This
| > | is reasonable and expected.
| > |
| > | 2) In the faster .hi file, there are additional specialisations, which
| > | seemingly have little/nothing to do with print_lines, but are omitted
| > | if it is not exported:
| > |
| > | "SPEC >>= [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                          $dMonad :: GHC.Base.Monad GHC.IOBase.IO
| > |   Sound.IterateeM.>>= @ GHC.IOBase.IO @ el $dMonad
| > |   = Sound.IterateeM.a
| > |       `cast`
| > |     (forall el1 a b.
| > |      Sound.IterateeM.IterateeGM el1 GHC.IOBase.IO a
| > |      -> (a -> Sound.IterateeM.IterateeGM el1 GHC.IOBase.IO b)
| > |      -> trans
| > |             (sym ((GHC.IOBase.:CoIO)
| > |                       (Sound.IterateeM.IterateeG el1 GHC.IOBase.IO b)))
| > |             (sym ((Sound.IterateeM.:CoIterateeGM) el1 GHC.IOBase.IO b)))
| > |       @ el
| > | "SPEC Sound.IterateeM.$f2 [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                                          $dMonad :: GHC.Base.Monad GHC.IOBase.IO
| > |   Sound.IterateeM.$f2 @ GHC.IOBase.IO @ el $dMonad
| > |   = Sound.IterateeM.$s$f2 @ el
| > | "SPEC Sound.IterateeM.$f2 [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                                          $dMonad :: GHC.Base.Monad GHC.IOBase.IO
| > |   Sound.IterateeM.$f2 @ GHC.IOBase.IO @ el $dMonad
| > |   = Sound.IterateeM.$s$f21 @ el
| > | "SPEC Sound.IterateeM.liftI [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                                            @ a
| > |                                                            $dMonad :: GHC.Base.Monad GHC.IOBase.IO
| > |   Sound.IterateeM.liftI @ GHC.IOBase.IO @ el @ a $dMonad
| > |   = Sound.IterateeM.$sliftI @ el @ a
| > | "SPEC return [GHC.IOBase.IO]" ALWAYS forall @ el
| > |                                             $dMonad :: GHC.Base.Monad GHC.IOBase.IO
| > |   Sound.IterateeM.return @ GHC.IOBase.IO @ el $dMonad
| > |   = Sound.IterateeM.a7
| > |       `cast`
| > |     (forall el1 a.
| > |      a
| > |      -> trans
| > |             (sym ((GHC.IOBase.:CoIO)
| > |                       (Sound.IterateeM.IterateeG el1 GHC.IOBase.IO a)))
| > |             (sym ((Sound.IterateeM.:CoIterateeGM) el1 GHC.IOBase.IO a)))
| > |       @ el
| > |
| > | My guess is that these cause the slowdown - but is there any reason
| > | that print_lines not being exported should cause them to be omitted?
| > |
| > | All these tests were run on GHC 6.10.1 with -O2.
| > |
| > | Thanks
| > |
| > | Neil
| > |
| > |
| > | On Fri, Nov 21, 2008 at 10:33 AM, Simon Peyton-Jones
| > | <simonpj at microsoft.com> wrote:
| > | > | This project is based on Oleg's Iteratee code; I started using his
| > | > | IterateeM.hs and Enumerator.hs files and added my own stuff to
| > | > | Enumerator.hs (thanks Oleg, great work as always).  When I started
| > | > | cleaning up by moving my functions from Enumerator.hs to MyEnum.hs, my
| > | > | minimal test case increased from 19s to 43s.
| > | > |
| > | > | I've found two factors that contributed.  When I was cleaning up, I
| > | > | also removed a bunch of unused functions from IterateeM.hs (some of
| > | > | the test functions and functions specific to his running example of
| > | > | HTTP encoding).  When I added those functions back in, and added
| > | > | INLINE pragmas to the exported functions in MyEnum.hs, I got the
| > | > | performance back.
| > | > |
| > | > | In general I hadn't added export lists to the modules yet, so all
| > | > | functions should have been exported.
| > | >
| > | > I'm totally snowed under with backlog from my recent absence, so I can't look at this
| > | > myself, but if anyone else wants to I'd be happy to support with advice and suggestions.
| > | >
| > | > In general, having an explicit export list is good for performance. I typed an extra
| > | > section in the GHC performance resource http://haskell.org/haskellwiki/Performance/GHC
| > | > to explain why.  In general that page is where we should document user advice for
| > | > performance in GHC.
| > | >
| > | > I can't explain why *adding* unused functions would change performance though!
| > | >
| > | > Simon
| > | >
| > | >
| >