Test performance impact (was: The dreaded M-R)

Simon Marlow simonmar at microsoft.com
Thu Feb 2 07:34:30 EST 2006

On 02 February 2006 09:52, John Hughes wrote:

> 	Summary: 2 programs failed to compile due to type errors (anna,
> 	One program did 19% more allocation, a few other programs
> 	increased allocation very slightly (<2%).
> 	            pic         +0.28%   +19.27%      0.02
> Thanks, that was interesting. A follow-up question: pic has a space
> bug. How long will it take you to find and fix it?

I just tried this, and it took me just a few minutes.  Compiling both
versions with profiling, for the original:

	total time  =        0.00 secs   (0 ticks @ 20 ms)
	total alloc =  11,200,656 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

chargeDensity                  ChargeDensity          0.0    2.5
accumCharge                    ChargeDensity          0.0   13.5
relax                          Potential              0.0   31.4
correct                        Potential              0.0    5.0
genRand                        Utils                  0.0    1.0
fineMesh                       Utils                  0.0    2.4
applyOpToMesh                  Utils                  0.0   12.7
=:                             Utils                  0.0    2.3
pushParticle                   PushParticle           0.0   16.1
timeStep                       Pic                    0.0   11.0

and with the monomorphism restriction turned off:

        total time  =        0.02 secs   (1 ticks @ 20 ms)
        total alloc =  12,893,544 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

pushParticle                   PushParticle         100.0   20.8
chargeDensity                  ChargeDensity          0.0    2.2
accumCharge                    ChargeDensity          0.0   18.0
relax                          Potential              0.0   27.3
correct                        Potential              0.0    4.4
fineMesh                       Utils                  0.0    2.1
applyOpToMesh                  Utils                  0.0   11.1
=:                             Utils                  0.0    2.0
timeStep                       Pic                    0.0    9.5

So, ignoring the %time column (the program didn't run long enough for
the profiler to get enough time samples), we can see the following
functions increased their allocation as a % of the total:

  pushParticle, accumCharge

Looking at the code for accumCharge:

accumCharge :: [Position] -> [MeshAssoc]
accumCharge [] = []
accumCharge ((x,y):xys) =
	[((i ,j ) , charge * (1-dx) * (1-dy))] ++
	[((i',j ) , charge * dx * (1-dy))] ++
	[((i ,j') , charge * (1-dx) * dy)] ++
	[((i',j') , charge * dx * dy)] ++
	accumCharge xys
	where
	    i = truncate x
	    i' = (i+1) `rem` nCell
	    j = truncate y
	    j' = (j+1) `rem` nCell
	    dx = x - fromIntegral i
	    dy = y - fromIntegral j

Now, because I know what I'm looking for, I can pretty quickly spot the
problem.  I had to look at the definition of MeshAssoc to figure out
that the result type of this function forces i to have type Int, yet it
is also used as the argument to fromIntegral, where, if i is overloaded,
it will be defaulted to Integer.  When I give type signatures to
i and j (:: Int), the allocation reduces.
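
For readers who want to see the effect in isolation, here is a minimal
sketch. It is not taken from pic: the functions cellGeneralised and
cellFixed and the value chosen for nCell are hypothetical, made up for
illustration. With the monomorphism restriction off, the unannotated i is
generalised over its Integral type, so it is used at Int in the tuple and
at the defaulted type Integer under fromIntegral.

```haskell
{-# LANGUAGE NoMonomorphismRestriction #-}

-- Hypothetical mesh size, standing in for pic's nCell.
nCell :: Int
nCell = 16

-- No signature on i: with the monomorphism restriction off it is
-- generalised to (Integral a => a). The tuple component uses it at Int;
-- the argument of fromIntegral is ambiguous and defaults to Integer.
cellGeneralised :: Double -> (Int, Double)
cellGeneralised x = (i `rem` nCell, dx)
  where
    i  = truncate x
    dx = x - fromIntegral i

-- Same code with i pinned to Int, so both uses share one type.
cellFixed :: Double -> (Int, Double)
cellFixed x = (i `rem` nCell, dx)
  where
    i :: Int
    i  = truncate x
    dx = x - fromIntegral i
```

Compiling the first version with -Wtype-defaults should make GHC report
the defaulting to Integer, which is one way to spot this pattern without
already knowing where to look.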

The pushParticle function has an identical pattern.  Fixing these two
functions brought the performance back to the original.  But I've also
changed the semantics - the author might have *wanted* i at type Integer
in the definition of dx to avoid overflow, and the monomorphism
restriction had prevented it.
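
A small sketch of that semantic difference, again with hypothetical names
(offsetViaInteger, offsetViaInt) rather than code from pic: the two
functions agree whenever the truncated value fits in an Int, and can
differ only outside that range, where the Int conversion is not
meaningful.

```haskell
-- Fractional offset computed with the truncated index at Integer:
-- correct for any finite Double, at the cost of Integer arithmetic.
offsetViaInteger :: Double -> Double
offsetViaInteger x = x - fromIntegral (truncate x :: Integer)

-- The same computation at Int: cheaper, but only meaningful while
-- truncate x fits in an Int. Outside that range the result should
-- not be relied upon.
offsetViaInt :: Double -> Double
offsetViaInt x = x - fromIntegral (truncate x :: Int)
```

For mesh coordinates, which presumably stay far below the Int range, the
two behave identically, so pinning i to Int is only a change of meaning
in principle.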

I suppose you could ask how you'd find the problem if you didn't know
what to look for.  So I added some more annotations:

	    i = {-# SCC "i" #-} truncate x
	    i' = {-# SCC "i'" #-}  (i+1) `rem` nCell
	    j = {-# SCC "j" #-} truncate y
	    j' = {-# SCC "j'" #-} (j+1) `rem` nCell
	    dx = {-# SCC "dx" #-} x - fromIntegral i
	    dy = {-# SCC "dy" #-} y - fromIntegral j

and the profiling output shows:

i                              ChargeDensity        100.0    6.8
j                              ChargeDensity          0.0    6.8
chargeDensity                  ChargeDensity          0.0    2.2
accumCharge                    ChargeDensity          0.0    3.9
relax                          Potential              0.0   27.2

So this pretty clearly identifies the problem area (although the figures
don't quite add up; I suspect the insertion of the annotations has
affected optimisation in some way).

Still, you could argue that it doesn't actually tell you the cause of
the problem: namely that i and j are being evaluated twice as often as you
might expect by looking at the code.  This is what the compiler warning
would do, and I completely agree that not having this property evident
by looking at the source code is a serious shortcoming.

> And how come speed
> improved slightly in many cases--that seems counter-intuitive.

The runtimes are unreliable, due to the short running time of most of
these benchmarks.  We have a "slow" mode for the benchmark suite that
runs each program with larger test data, but I didn't use it this time -
mostly we find that measuring allocations is useful as a first
approximation, and it's certainly more reliable.

(rest of email snipped, most of which I agree with).

