inter module optimizations
Donald Bruce Stewart
dons at cse.unsw.edu.au
Wed Mar 28 09:18:45 EDT 2007
fmohamed:
> I had posted some data on inter-module optimizations that I had
> calculated when splitting my program from one computational module to
> many different ones.
>
> Tim Chevalier suggested that my calculation could be interesting to the
> people here.
>
> So I made the effort of preparing the various versions of my code and
> redoing the analysis better.
> Unfortunately I had already begun renaming things without doing a darcs
> record, so in the split version some function names are different.
>
> I have a 21KB tar.bz archive; I did not know whether it is considered
> rude to send attachments, but if anyone is interested I can send
> the file.
>
> It mainly boils down to the non-inlining of some important
> functions on a newtype (
> type LatLocI = Word32
> newtype LatLoc = LatLoc LatLocI deriving (Eq,Ord)
> ), because specialization should not be an issue as I had already given
> specific signatures to my functions.
>
> Also worth noting is that profiling with -O2 compilation makes
> one think that inlining (or using a single module) makes the program
> slower, whereas the opposite is true. I think that the profiling
> overheads are incorrectly evaluated.
> I know that with -O2 one cannot expect profiling to be good, but it
> would be nice if it weren't so misleading.
>
> Here is some data (obtained with a script that is also in the tar.bz archive):
>
> ******** allInOne:
> original program, monolithic main computational module
> * timings of -O2 executable
> 7.67user 0.00system 0:07.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+894minor)pagefaults 0swaps
> * timings of the executable with profiling
> total time = 15.25 secs (305 ticks @ 50 ms)
> total alloc = 5,888,786,120 bytes (excludes profiling overheads)
> ******** splitModule NoReexport NoInline directives:
> split computational module, no export list for split modules
> * timings of -O2 executable
> 10.14user 0.01system 0:10.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+901minor)pagefaults 0swaps
> * timings of the executable with profiling
> total time = 11.85 secs (237 ticks @ 50 ms)
> total alloc = 5,888,780,912 bytes (excludes profiling overheads)
> ******** splitModule Reexport NoInline directives:
> computational module, no export list for split modules, old module
> reexport using export list
> * timings of -O2 executable
> 8.88user 0.00system 0:08.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+901minor)pagefaults 0swaps
> * timings of the executable with profiling
> total time = 12.20 secs (244 ticks @ 50 ms)
> total alloc = 5,888,780,912 bytes (excludes profiling overheads)
> ******** splitModule NoReexport Inline directives:
> split computational module, no export list for split modules, explicit
> inline directives
> * timings of -O2 executable
> 6.44user 0.01system 0:06.46elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+895minor)pagefaults 0swaps
> * timings of the executable with profiling
> total time = 18.80 secs (376 ticks @ 50 ms)
> total alloc = 5,374,883,312 bytes (excludes profiling overheads)
> *************
>
> Fawzi
To really understand what is going on, I suggest looking at the
-ddump-simpl output as you change the inlining settings. Then you'll see
how GHC is moving code about.
-- Don (who's spent the last 2 weeks playing the simplifier/inliner game)
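
As a rough illustration of the kind of experiment suggested above, here is a
minimal sketch. Only the LatLocI and LatLoc declarations come from the quoted
message; latX, latY, sumCoords and the 16-bit packing are hypothetical
stand-ins, not the actual functions from the program under discussion. The
INLINE pragmas ask GHC to expose each unfolding so that callers, including
callers in other modules once the code is split, can inline it.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.List (foldl')
import Data.Word (Word32)

-- The two declarations quoted in the original message.
type LatLocI = Word32
newtype LatLoc = LatLoc LatLocI deriving (Eq, Ord)

-- Hypothetical accessors (assumption): a location packs two 16-bit
-- coordinates into one Word32. The INLINE pragmas make GHC keep the
-- unfoldings available to call sites, also across module boundaries.
{-# INLINE latX #-}
latX :: LatLoc -> Word32
latX (LatLoc w) = w .&. 0xFFFF

{-# INLINE latY #-}
latY :: LatLoc -> Word32
latY (LatLoc w) = w `shiftR` 16

-- Hypothetical hot loop standing in for the computational module.
sumCoords :: [LatLoc] -> Word32
sumCoords = foldl' (\acc l -> acc + latX l + latY l) 0

main :: IO ()
main = print (sumCoords (map LatLoc [0 .. 1000000]))
```

Assuming the file is saved as Main.hs, something like
"ghc -O2 -fforce-recomp -ddump-simpl Main.hs > Main.simpl" writes the
simplified Core to a file. Comparing that output with and without the pragmas
(or with the accessors moved into a separate module) should show whether
latX and latY still appear as calls inside the loop or have been inlined away.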
More information about the Glasgow-haskell-users
mailing list