inter module optimizations
Fawzi Mohamed
fmohamed at mac.com
Wed Mar 28 08:15:18 EDT 2007
I had posted some data on inter-module optimizations that I had
collected when splitting my program from one computational module into
several different ones.
Tim Chevalier suggested that my calculation could be interesting to the
people here.
So I made the effort of preparing the various versions of my code and
redoing the analysis more carefully.
Unfortunately I had already begun renaming things without doing a darcs
record, so in the split version some function names are different.
I have a tar.bz archive of 21KB. I did not know whether it is considered
rude to send attachments, but if anyone is interested I can send them
the file.
Basically, the difference mainly boils down to some important functions
on a newtype not being inlined across module boundaries:

    type LatLocI = Word32
    newtype LatLoc = LatLoc LatLocI deriving (Eq,Ord)

Specialization should not be an issue, as I had already given
specific signatures to my functions.
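To make the point concrete, here is a minimal sketch of what explicit inline directives on such a newtype can look like. Only LatLocI and LatLoc come from my code; the functions mkLatLoc, latX, latY and stepX and the 16/16 bit packing are hypothetical, invented for illustration. The idea is that an INLINE pragma asks GHC to keep the function's unfolding in the interface file, so that call sites in other modules have a chance to inline it.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word32)

type LatLocI = Word32
newtype LatLoc = LatLoc LatLocI deriving (Eq, Ord)

-- Hypothetical layout (an assumption, not the real program):
-- x lives in the high 16 bits, y in the low 16 bits.
mkLatLoc :: Word32 -> Word32 -> LatLoc
mkLatLoc x y = LatLoc ((x `shiftL` 16) .|. (y .&. 0xFFFF))
{-# INLINE mkLatLoc #-}

-- Small accessors like these are cheap only if the newtype wrapper
-- and the call itself disappear at the use site.
latX :: LatLoc -> Word32
latX (LatLoc w) = w `shiftR` 16
{-# INLINE latX #-}

latY :: LatLoc -> Word32
latY (LatLoc w) = w .&. 0xFFFF
{-# INLINE latY #-}

-- Hypothetical step function built from the accessors above.
stepX :: LatLoc -> LatLoc
stepX l = mkLatLoc (latX l + 1) (latY l)
{-# INLINE stepX #-}
```

With everything in one module GHC can see these definitions directly; once they live in a separate module, whether they get inlined depends on what ends up in the interface file, which is what the pragmas try to influence.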
Also worth noting is that profiling with -O2 compilation makes
one think that inlining (or using a single module) makes the program
slower, whereas the opposite is true. I think that the profiling
overheads are incorrectly estimated.
I know that with -O2 one cannot expect profiling to be accurate, but it
would be nice if it were not so misleading.
Here is some data (obtained with a script that is also in the tar.bz archive):
******** allInOne:
original program, monolithic main computational module
* timings of -O2 executable
7.67user 0.00system 0:07.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+894minor)pagefaults 0swaps
* timings of the executable with profiling
total time = 15.25 secs (305 ticks @ 50 ms)
total alloc = 5,888,786,120 bytes (excludes profiling overheads)
******** splitModule NoReexport NoInline directives:
split computational module, no export list for split modules
* timings of -O2 executable
10.14user 0.01system 0:10.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+901minor)pagefaults 0swaps
* timings of the executable with profiling
total time = 11.85 secs (237 ticks @ 50 ms)
total alloc = 5,888,780,912 bytes (excludes profiling overheads)
******** splitModule Reexport NoInline directives:
split computational module, no export list for split modules, old module
reexports using an export list
* timings of -O2 executable
8.88user 0.00system 0:08.90elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+901minor)pagefaults 0swaps
* timings of the executable with profiling
total time = 12.20 secs (244 ticks @ 50 ms)
total alloc = 5,888,780,912 bytes (excludes profiling overheads)
******** splitModule NoReexport Inline directives:
split computational module, no export list for split modules, explicit
inline directives
* timings of -O2 executable
6.44user 0.01system 0:06.46elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+895minor)pagefaults 0swaps
* timings of the executable with profiling
total time = 18.80 secs (376 ticks @ 50 ms)
total alloc = 5,374,883,312 bytes (excludes profiling overheads)
*************
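To put the numbers side by side, here is a small sketch that expresses each run relative to the allInOne baseline. The timings are copied from the data above (user seconds of the -O2 executable and "total time" of the profiled one); the names runs, baseline and relative are just illustrative.

```haskell
-- (label, -O2 user seconds, profiled total seconds), copied from the data above.
runs :: [(String, Double, Double)]
runs =
  [ ("allInOne",                  7.67, 15.25)
  , ("split NoReexport NoInline", 10.14, 11.85)
  , ("split Reexport NoInline",   8.88, 12.20)
  , ("split NoReexport Inline",   6.44, 18.80)
  ]

-- The monolithic version is the reference point for both measurements.
baseline :: (Double, Double)
baseline = (7.67, 15.25)

-- Each run divided by the baseline: values above 1 mean slower than allInOne.
relative :: [(String, Double, Double)]
relative = [ (name, t / t0, p / p0) | (name, t, p) <- runs ]
  where (t0, p0) = baseline

-- Print one line per run.
report :: IO ()
report = mapM_ line relative
  where
    line (name, rt, rp) =
      putStrLn (name ++ ": -O2 x" ++ show rt ++ ", profiled x" ++ show rp)
```

The split version without inline directives comes out at roughly 1.32 times the baseline with -O2 but only about 0.78 times when profiled, while the version with inline directives is about 0.84 times with -O2 and about 1.23 times when profiled. The two measurements rank the variants in opposite order.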
Fawzi