cross module optimization issues

John Lato jwlato at gmail.com
Fri Nov 21 04:54:18 EST 2008


On Wed, Nov 19, 2008 at 4:17 PM, Simon Peyton-Jones
<simonpj at microsoft.com> wrote:
> | I'm compiling with -O2 -Wall.  After looking at the Core output, I
> | think I've found the key difference.  A function that is bound in a
> | "where" statement is different between the monolithic and split
> | sources.  I have no idea why, though.  I'll experiment with a few
> | different things to see if I can get this resolved.
>
> In general, splitting code across modules should not make programs less efficient -- as Don says, GHC does quite aggressive cross-module inlining.
>
> There is one exception, though.  If a non-exported non-recursive function is called exactly once, then it is inlined *regardless of size*, because doing so does not cause code duplication.  But if it's exported and is large, then its inlining is not exposed -- and even if it were it might not be inlined, because doing so duplicates its code an unknown number of times.  You can change the threshold for (a) exposing and (b) using an inlining, with flags -funfolding-creation-threshold and -funfolding-use-threshold respectively.
>
> If you find there's something else going on then I'm all ears.
>
> Simon
>

I did finally find the changes that make a difference.  I think it's
safe to say that I have no idea what's actually going on, so I'll just
report my results and let others try to figure it out.

I tried raising the thresholds mentioned, up to
-funfolding-creation-threshold 200 -funfolding-use-threshold 100.
This didn't seem to make any performance difference (I didn't check
the Core output).
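
For anyone wanting to repeat the experiment per module rather than on
the command line, a minimal sketch follows.  It assumes the "=" form of
the flags that current GHC documents (the spelling accepted by a 2008
compiler may differ), and the function "step" is a made-up stand-in,
not code from this project.

```haskell
{-# OPTIONS_GHC -O2 -funfolding-creation-threshold=200 -funfolding-use-threshold=100 #-}
-- Per-module equivalent of passing the same flags to ghc directly.
-- The creation threshold governs which unfoldings get exposed;
-- the use threshold governs which exposed unfoldings get inlined.
module Main (main) where

-- Hypothetical stand-in for a function whose unfolding we want exposed.
step :: Int -> Int -> Int
step acc x = acc + x * x

main :: IO ()
main = print (foldl step 0 [1 .. 10 :: Int])
```

Comparing the output of -ddump-simpl with and without the pragma is
one way to check whether the thresholds changed anything.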

This project is based on Oleg's Iteratee code; I started using his
IterateeM.hs and Enumerator.hs files and added my own stuff to
Enumerator.hs (thanks Oleg, great work as always).  When I started
cleaning up by moving my functions from Enumerator.hs to MyEnum.hs, my
minimal test case increased from 19s to 43s.

I've found two factors that contributed.  First, when I was cleaning
up, I also removed a bunch of unused functions from IterateeM.hs (some
of the test functions and functions specific to his running example of
HTTP encoding).  Second, the exported functions in MyEnum.hs had no
INLINE pragmas.  When I added the removed functions back in, and added
INLINE pragmas to the exported functions in MyEnum.hs, I got the
performance back.
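
To illustrate the kind of pragma meant here, a small sketch follows.
The function "sumSquares" is hypothetical; the actual functions in
MyEnum.hs are not shown in this message.

```haskell
-- Hypothetical stand-in for a function moved into a separate module.
import Data.List (foldl')

-- INLINE makes the function's unfolding available to importing modules
-- regardless of its size, and makes GHC very keen to inline it at call
-- sites, so cross-module callers can be optimised as if the definition
-- were local.
{-# INLINE sumSquares #-}
sumSquares :: [Int] -> Int
sumSquares = foldl' (\acc x -> acc + x * x) 0
```

Without the pragma, whether a large exported function's unfolding is
exposed depends on the thresholds discussed above.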

In general I hadn't added export lists to the modules yet, so all
functions should have been exported.

So it seems that somehow the unused functions in IterateeM.hs are
affecting how the functions I care about get implemented (or
exported).  I did not expect that.  Next step for me is to see what
happens if I INLINE the functions I'm exporting and remove the others,
I suppose.

Thank you Simon and Don for your advice, especially since I'm pretty
far over my head at this point.

John

