cross module optimization issues
John Lato
jwlato at gmail.com
Fri Nov 28 07:07:14 EST 2008
Neil, thank you very much for taking the time to look at this; I
greatly appreciate it.
One thing I don't understand is why the specializations are caused by
print_lines. I suppose the optimizer can infer something which it
couldn't otherwise.
If I read this properly, the functions being specialized are liftI,
(>>=), return, and $f2. One thing I'm not sure about is when INLINE
provides the desired optimal behavior, as opposed to SPECIALIZE. The
monad functions are defined in the Monad instance, and thus aren't
currently INLINE'd or SPECIALIZE'd. However, if they are separate
functions, would INLINE be sufficient? Would that give the optimizer
enough to work with to derive the specializations on its own? I'll
have some time to experiment with this myself tomorrow, but I'd
appreciate some direction (rather than guessing blindly).
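To make the question concrete, here is a minimal sketch of the three options I'm weighing. It is an illustration only, not the actual IterateeM code: Wrap, liftW, and bindW are hypothetical stand-ins for IterateeGM, liftI, and a standalone bind, and I'm assuming nothing about which variant the optimizer actually prefers.

```haskell
-- Hypothetical stand-in for IterateeGM: a newtype over an arbitrary monad m.
newtype Wrap m a = Wrap { unWrap :: m a }

instance Functor m => Functor (Wrap m) where
  fmap f (Wrap m) = Wrap (fmap f m)

instance Applicative m => Applicative (Wrap m) where
  pure = Wrap . pure
  Wrap f <*> Wrap x = Wrap (f <*> x)

-- Option 1: ask for a copy of the whole instance specialised to IO.
instance Monad m => Monad (Wrap m) where
  {-# SPECIALIZE instance Monad (Wrap IO) #-}
  Wrap m >>= k = Wrap (m >>= unWrap . k)

-- Option 2: ask for an IO-specialised copy of one overloaded function
-- (loosely analogous to liftI).
liftW :: Monad m => m a -> Wrap m a
liftW m = Wrap (m >>= return)
{-# SPECIALIZE liftW :: IO a -> Wrap IO a #-}

-- Option 3: a standalone bind marked INLINE, so that its body is available
-- at call sites and can be optimised there at whatever type is in use.
bindW :: Monad m => Wrap m a -> (a -> Wrap m b) -> Wrap m b
bindW (Wrap m) k = Wrap (m >>= unWrap . k)
{-# INLINE bindW #-}
```

All three should behave identically at run time; the pragmas only change what code the compiler generates and exposes, so the difference would show up in the .hi file and in timings rather than in results.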
What is "$f2"? I've seen that appear before, but I'm not sure where
it comes from.
Thanks,
John
On Fri, Nov 28, 2008 at 10:31 AM, Simon Peyton-Jones
<simonpj at microsoft.com> wrote:
> The specialisations are indeed caused (indirectly) by the presence of print_lines. If print_lines is dead code (as it is when print_lines is not exported), then there are no calls to the overloaded functions at these specialised types, and so you don't get the specialised versions. You can get specialised versions with a SPECIALISE pragma, or SPECIALISE INSTANCE.
>
> Does that make sense?
>
> Simon
>
> | -----Original Message-----
> | From: Neil Mitchell [mailto:ndmitchell at gmail.com]
> | Sent: 28 November 2008 09:48
> | To: Simon Peyton-Jones
> | Cc: John Lato; glasgow-haskell-users at haskell.org; Don Stewart
> | Subject: Re: cross module optimization issues
> |
> | Hi
> |
> | I've talked to John a bit, and discussed test cases etc. I've tracked
> | this down a little way.
> |
> | Given the attached file, compiling with SHORT_EXPORT_LIST makes the
> | code go _slower_. By exporting the "print_lines" function the code
> | doubles in speed. This runs against everything I was expecting, and
> | that Simon has described.
> |
> | Taking a look at the .hi files for the two alternatives, there are two
> | differences:
> |
> | 1) In the faster .hi file, the body of print_lines is exported. This
> | is reasonable and expected.
> |
> | 2) In the faster .hi file, there are additional specialisations, which
> | seemingly have little/nothing to do with print_lines, but are omitted
> | if it is not exported:
> |
> | "SPEC >>= [GHC.IOBase.IO]" ALWAYS forall @ el
> | $dMonad :: GHC.Base.Monad GHC.IOBase.IO
> | Sound.IterateeM.>>= @ GHC.IOBase.IO @ el $dMonad
> | = Sound.IterateeM.a
> | `cast`
> | (forall el1 a b.
> | Sound.IterateeM.IterateeGM el1 GHC.IOBase.IO a
> | -> (a -> Sound.IterateeM.IterateeGM el1 GHC.IOBase.IO b)
> | -> trans
> | (sym ((GHC.IOBase.:CoIO)
> | (Sound.IterateeM.IterateeG el1 GHC.IOBase.IO b)))
> | (sym ((Sound.IterateeM.:CoIterateeGM) el1 GHC.IOBase.IO b)))
> | @ el
> | "SPEC Sound.IterateeM.$f2 [GHC.IOBase.IO]" ALWAYS forall @ el
> | $dMonad ::
> | GHC.Base.Monad GHC.IOBase.IO
> | Sound.IterateeM.$f2 @ GHC.IOBase.IO @ el $dMonad
> | = Sound.IterateeM.$s$f2 @ el
> | "SPEC Sound.IterateeM.$f2 [GHC.IOBase.IO]" ALWAYS forall @ el
> | $dMonad ::
> | GHC.Base.Monad GHC.IOBase.IO
> | Sound.IterateeM.$f2 @ GHC.IOBase.IO @ el $dMonad
> | = Sound.IterateeM.$s$f21 @ el
> | "SPEC Sound.IterateeM.liftI [GHC.IOBase.IO]" ALWAYS forall @ el
> | @ a
> | $dMonad ::
> | GHC.Base.Monad GHC.IOBase.IO
> | Sound.IterateeM.liftI @ GHC.IOBase.IO @ el @ a $dMonad
> | = Sound.IterateeM.$sliftI @ el @ a
> | "SPEC return [GHC.IOBase.IO]" ALWAYS forall @ el
> | $dMonad :: GHC.Base.Monad
> | GHC.IOBase.IO
> | Sound.IterateeM.return @ GHC.IOBase.IO @ el $dMonad
> | = Sound.IterateeM.a7
> | `cast`
> | (forall el1 a.
> | a
> | -> trans
> | (sym ((GHC.IOBase.:CoIO)
> | (Sound.IterateeM.IterateeG el1 GHC.IOBase.IO a)))
> | (sym ((Sound.IterateeM.:CoIterateeGM) el1 GHC.IOBase.IO a)))
> | @ el
> |
> | My guess is that these cause the slowdown - but is there any reason
> | that print_lines not being exported should cause them to be omitted?
> |
> | All these tests were run on GHC 6.10.1 with -O2.
> |
> | Thanks
> |
> | Neil
> |
> |
> | On Fri, Nov 21, 2008 at 10:33 AM, Simon Peyton-Jones
> | <simonpj at microsoft.com> wrote:
> | > | This project is based on Oleg's Iteratee code; I started using his
> | > | IterateeM.hs and Enumerator.hs files and added my own stuff to
> | > | Enumerator.hs (thanks Oleg, great work as always). When I started
> | > | cleaning up by moving my functions from Enumerator.hs to MyEnum.hs, my
> | > | minimal test case increased from 19s to 43s.
> | > |
> | > | I've found two factors that contributed. When I was cleaning up, I
> | > | also removed a bunch of unused functions from IterateeM.hs (some of
> | > | the test functions and functions specific to his running example of
> | > | HTTP encoding). When I added those functions back in, and added
> | > | INLINE pragmas to the exported functions in MyEnum.hs, I got the
> | > | performance back.
> | > |
> | > | In general I hadn't added export lists to the modules yet, so all
> | > | functions should have been exported.
> | >
> | > I'm totally snowed under with backlog from my recent absence, so I can't look at this myself, but if anyone else wants to I'd be happy to support with advice and suggestions.
> | >
> | > In general, having an explicit export list is good for performance. I typed an extra section in the GHC performance resource http://haskell.org/haskellwiki/Performance/GHC to explain why. In general that page is where we should document user advice for performance in GHC.
> | >
> | > I can't explain why *adding* unused functions would change performance though!
> | >
> | > Simon
> | >
> | >
>