Cross-module optimization issues

Sat Nov 22 13:55:11 EST 2008

jwlato:
> On Wed, Nov 19, 2008 at 4:17 PM, Simon Peyton-Jones
> <simonpj at microsoft.com> wrote:
> > | I'm compiling with -O2 -Wall.  After looking at the Core output, I
> > | think I've found the key difference.  A function that is bound in a
> > | "where" clause is different between the monolithic and split
> > | sources.  I have no idea why, though.  I'll experiment with a few
> > | different things to see if I can get this resolved.
> >
> > In general, splitting code across modules should not make programs less efficient -- as Don says, GHC does quite aggressive cross-module inlining.
> >
> > There is one exception, though.  If a non-exported, non-recursive function is called exactly once, then it is inlined *regardless of size*, because doing so does not cause code duplication.  But if it's exported and is large, then its inlining is not exposed -- and even if it were it might not be inlined, because doing so duplicates its code an unknown number of times.  You can change the threshold for (a) exposing and (b) using an inlining, with flags -funfolding-creation-threshold and -funfolding-use-threshold respectively.
> >
> > If you find there's something else going on then I'm all ears.
> >
> > Simon
> >
> 
> I did finally find the changes that make a difference.  I think it's
> safe to say that I have no idea what's actually going on, so I'll just
> report my results and let others try to figure it out.
> 
> I tried upping the thresholds mentioned, up to
> -funfolding-creation-threshold 200 -funfolding-use-threshold 100.
> This didn't seem to make any performance difference (I didn't check
> the core output).
> 
> This project is based on Oleg's Iteratee code; I started with his
> IterateeM.hs and Enumerator.hs files and added my own code to
> Enumerator.hs (thanks Oleg, great work as always).  When I started
> cleaning up by moving my functions from Enumerator.hs to MyEnum.hs, the
> run time of my minimal test case increased from 19s to 43s.
> 
> I've found two factors that contributed.  When I was cleaning up, I
> also removed a bunch of unused functions from IterateeM.hs (some of
> the test functions and functions specific to his running example of
> HTTP encoding).  When I added those functions back in, and added
> INLINE pragmas to the exported functions in MyEnum.hs, I got the
> performance back.
> 
> I hadn't added export lists to any of the modules yet, so all
> functions should have been exported.
> 
> So it seems that somehow the unused functions in IterateeM.hs are
> affecting how the functions I care about get implemented (or
> exported).  I did not expect that.  Next step for me is to see what
> happens if I INLINE the functions I'm exporting and remove the others,
> I suppose.
> 
> Thank you Simon and Don for your advice, especially since I'm pretty
> far over my head at this point.
> 

Since this is IO code, is it a -fno-state-hack scenario?
Simon wrote recently about when and why -fno-state-hack would be
needed, if you want to follow that up.
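For anyone following along, here is a minimal sketch of the two knobs discussed in this thread, under stated assumptions: `sumStrict` is a hypothetical stand-in, not code from the Iteratee project, and it sits in a single file only so the example runs on its own (in the real setup it would be exported from a module like MyEnum.hs). An INLINE pragma makes GHC keep the function's original right-hand side in the interface file and inline it at saturated call sites in importing modules regardless of size, which matches the effect of adding INLINE to the MyEnum.hs exports. The OPTIONS_GHC line turns off the state hack for this one module; whether that actually helps the iteratee code is untested.

```haskell
{-# OPTIONS_GHC -fno-state-hack #-}
-- Hypothetical single-file sketch. OPTIONS_GHC above disables the
-- "state hack" for this module only, i.e. the experiment suggested
-- in the reply; it changes optimisation, not the program's result.
module Main (main, sumStrict) where

import Data.List (foldl')

-- INLINE tells GHC to record the original right-hand side in the
-- interface file and inline it at saturated call sites in importing
-- modules, whatever its size (the unfolding thresholds do not apply).
{-# INLINE sumStrict #-}
sumStrict :: [Int] -> Int
sumStrict xs = foldl' (+) 0 xs

main :: IO ()
main = print (sumStrict [1 .. 100])  -- prints 5050
```

Compile with something like `ghc -O2`; the pragma only makes a difference once `sumStrict` is called from another module, since within its own module GHC already sees the full definition.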

-- Don