inter module optimizations

Donald Bruce Stewart dons at cse.unsw.edu.au
Wed Mar 28 09:18:45 EDT 2007


fmohamed:
> I had posted some data on inter-module optimizations that I had 
> calculated when splitting my program from one computational module to 
> many different ones.
> 
> Tim Chevalier suggested that my calculation could be interesting to the 
> people here.
> 
> So I made the effort of preparing the various versions of my code and 
> redoing the analysis more carefully.
> Unfortunately I had already begun renaming things without doing a darcs 
> record, so in the split version some function names are different.
> 
> I have a 21KB tar.bz archive. I did not know whether it is considered 
> rude to send attachments, but if anyone is interested I can send them 
> the file.
> 
> Basically it boils down to the non-inlining of some important 
> functions on a newtype (
>    type LatLocI = Word32
>    newtype LatLoc = LatLoc LatLocI deriving (Eq,Ord)
> ), because specialization should not be an issue as I had already given 
> specific signatures to my functions.
> 
> Also worth noting is that profiling with -O2 compilation makes one 
> think that inlining (or using a single module) makes the program 
> slower, whereas the opposite is true. I think that the profiling 
> overheads are incorrectly evaluated.
> I know that with -O2 one cannot expect profiling to be good, but it 
> would be nice if it weren't so misleading.
> 
> Here is some data (obtained with a script that is also in the tar.bz archive)
> 
> ******** allInOne:
> original program, monolithic main computational module
> * timings of -O2 executable
> 7.67user 0.00system 0:07.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+894minor)pagefaults 0swaps
> * timings of the executable with profiling
>        total time  =       15.25 secs   (305 ticks @ 50 ms)
>        total alloc = 5,888,786,120 bytes  (excludes profiling overheads)
> ******** splitModule NoReexport NoInline directives:
> split computational module, no export list for split modules
> * timings of -O2 executable
> 10.14user 0.01system 0:10.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+901minor)pagefaults 0swaps
> * timings of the executable with profiling
>        total time  =       11.85 secs   (237 ticks @ 50 ms)
>        total alloc = 5,888,780,912 bytes  (excludes profiling overheads)
> ******** splitModule Reexport NoInline directives:
> split computational module, no export list for split modules, old module 
> reexports using an export list
> * timings of -O2 executable
> 8.88user 0.00system 0:08.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+901minor)pagefaults 0swaps
> * timings of the executable with profiling
>        total time  =       12.20 secs   (244 ticks @ 50 ms)
>        total alloc = 5,888,780,912 bytes  (excludes profiling overheads)
> ******** splitModule NoReexport Inline directives:
> split computational module, no export list for split modules, explicit 
> inline directives
> * timings of -O2 executable
> 6.44user 0.01system 0:06.46elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+895minor)pagefaults 0swaps
> * timings of the executable with profiling
>        total time  =       18.80 secs   (376 ticks @ 50 ms)
>        total alloc = 5,374,883,312 bytes  (excludes profiling overheads)
> *************
> 
> Fawzi

To really understand what is going on, I suggest looking at the
-ddump-simpl output as you change the inlining settings. Then you'll see
how GHC is moving code about.
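
As a rough illustration of what to look for, here is a minimal sketch. It is
not Fawzi's actual code: only the LatLoc declarations come from the post
above, and unLatLoc, nextLoc and sumLocs are hypothetical names made up for
this example. The idea is to put INLINE pragmas on the small newtype helpers
and compare the Core output with and without them.

```haskell
module Main (main) where

import Data.Word (Word32)

-- Declarations taken from the original post.
type LatLocI = Word32
newtype LatLoc = LatLoc LatLocI deriving (Eq, Ord)

-- Hypothetical helper: unwrap the newtype.
-- Remove the pragma and re-run with -ddump-simpl to compare the Core.
{-# INLINE unLatLoc #-}
unLatLoc :: LatLoc -> LatLocI
unLatLoc (LatLoc w) = w

-- Hypothetical helper: step to the next location.
{-# INLINE nextLoc #-}
nextLoc :: LatLoc -> LatLoc
nextLoc (LatLoc w) = LatLoc (w + 1)

-- Hypothetical hot loop standing in for the computational code.
-- It sums the raw values of the first n locations (wrapping at 2^32).
sumLocs :: Int -> LatLocI
sumLocs n = go 0 (LatLoc 0) 0
  where
    go :: Int -> LatLoc -> LatLocI -> LatLocI
    go i loc acc
      | i >= n    = acc
      | otherwise =
          let acc' = acc + unLatLoc loc
          in acc' `seq` go (i + 1) (nextLoc loc) acc'

main :: IO ()
main = print (sumLocs 1000000)
```

Compile it with `ghc -O2 -ddump-simpl Main.hs` and read the Core for the loop:
if the helpers were inlined, the worker should operate on the unboxed word
directly, with no remaining calls to unLatLoc or nextLoc. To reproduce the
split-module case, the helpers would have to move into a module of their own;
the INLINE pragma then asks GHC to keep their unfoldings in the interface file
so that importing modules can still inline them. Later GHC releases also
accept -dsuppress-all, which makes the dump much easier to read.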

-- Don (who's spent the last 2 weeks playing the simplifier/inliner game)

