Test performance impact (was: The dreaded M-R)
Simon Marlow
simonmar at microsoft.com
Thu Feb 2 07:34:30 EST 2006
On 02 February 2006 09:52, John Hughes wrote:
> Summary: 2 programs failed to compile due to type errors (anna, gg).
> One program did 19% more allocation, a few other programs increased
> allocation very slightly (<2%).
>
> pic +0.28% +19.27% 0.02
>
> Thanks, that was interesting. A follow-up question: pic has a space
> bug. How long will it take you to find and fix it?
I just tried this, and it took me just a few minutes. Compiling both
versions with profiling, for the original:

        total time  =        0.00 secs   (0 ticks @ 20 ms)
        total alloc =  11,200,656 bytes  (excludes profiling overheads)

COST CENTRE      MODULE           %time  %alloc
chargeDensity    ChargeDensity      0.0     2.5
accumCharge      ChargeDensity      0.0    13.5
relax            Potential          0.0    31.4
correct          Potential          0.0     5.0
genRand          Utils              0.0     1.0
fineMesh         Utils              0.0     2.4
applyOpToMesh    Utils              0.0    12.7
=:               Utils              0.0     2.3
pushParticle     PushParticle       0.0    16.1
timeStep         Pic                0.0    11.0

and with the monomorphism restriction turned off:

        total time  =        0.02 secs   (1 ticks @ 20 ms)
        total alloc =  12,893,544 bytes  (excludes profiling overheads)

COST CENTRE      MODULE           %time  %alloc
pushParticle     PushParticle     100.0    20.8
chargeDensity    ChargeDensity      0.0     2.2
accumCharge      ChargeDensity      0.0    18.0
relax            Potential          0.0    27.3
correct          Potential          0.0     4.4
fineMesh         Utils              0.0     2.1
applyOpToMesh    Utils              0.0    11.1
=:               Utils              0.0     2.0
timeStep         Pic                0.0     9.5

So, ignoring the %time column (the program didn't run long enough for
the profiler to get enough time samples), we can see the following
functions increased their allocation as a % of the total:
pushParticle, accumCharge
Looking at the code for accumCharge:

    accumCharge :: [Position] -> [MeshAssoc]
    accumCharge [] = []
    accumCharge ((x,y):xys) =
        [((i ,j ) , charge * (1-dx) * (1-dy))] ++
        [((i',j ) , charge * dx * (1-dy))] ++
        [((i ,j') , charge * (1-dx) * dy)] ++
        [((i',j') , charge * dx * dy)] ++
        accumCharge xys
        where
          i  = truncate x
          i' = (i+1) `rem` nCell
          j  = truncate y
          j' = (j+1) `rem` nCell
          dx = x - fromIntegral i
          dy = y - fromIntegral j

Now, because I know what I'm looking for, I can pretty quickly spot the
problem. I had to look at the definition of MeshAssoc to figure out
that the result type of this function forces i to have type Int, yet i
is also used as the argument to fromIntegral, where, if i is overloaded,
its type will be defaulted to Integer. When I give type signatures to
i and j (:: Int), the allocation reduces.
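
A minimal, self-contained sketch of that fix follows. The Position and
MeshAssoc synonyms and the values of nCell and charge are stand-ins
assumed here so that the fragment compiles on its own; the real
definitions live elsewhere in the pic sources. The pragma is only there
to mimic the "monomorphism restriction turned off" configuration.

```haskell
{-# LANGUAGE NoMonomorphismRestriction #-}

-- Assumed stand-ins for the benchmark's own definitions.
type Position  = (Double, Double)
type MeshAssoc = ((Int, Int), Double)

nCell :: Int
nCell = 16        -- hypothetical mesh size

charge :: Double
charge = 1.0      -- hypothetical particle charge

accumCharge :: [Position] -> [MeshAssoc]
accumCharge [] = []
accumCharge ((x,y):xys) =
    [((i ,j ) , charge * (1-dx) * (1-dy))] ++
    [((i',j ) , charge * dx * (1-dy))] ++
    [((i ,j') , charge * (1-dx) * dy)] ++
    [((i',j') , charge * dx * dy)] ++
    accumCharge xys
  where
    -- The signature keeps i and j monomorphic, so fromIntegral below
    -- sees an Int and nothing is defaulted to Integer.
    i, j :: Int
    i  = truncate x
    i' = (i+1) `rem` nCell
    j  = truncate y
    j' = (j+1) `rem` nCell
    dx = x - fromIntegral i
    dy = y - fromIntegral j

main :: IO ()
main = mapM_ print (accumCharge [(1.25, 2.5)])
```

Without the signature the same module still compiles, but i and j are
each generalised and then instantiated once at Int and once at the
defaulted Integer.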
The pushParticle function has an identical pattern. Fixing these two
functions brought the performance back to the original. But I've also
changed the semantics - the author might have *wanted* i at type Integer
in the definition of dx to avoid overflow, and the monomorphism
restriction had prevented it.
I suppose you could ask how you'd find the problem if you didn't know
what to look for. So I added some more annotations:

          i  = {-# SCC "i"  #-} truncate x
          i' = {-# SCC "i'" #-} (i+1) `rem` nCell
          j  = {-# SCC "j"  #-} truncate y
          j' = {-# SCC "j'" #-} (j+1) `rem` nCell
          dx = {-# SCC "dx" #-} x - fromIntegral i
          dy = {-# SCC "dy" #-} y - fromIntegral j

and the profiling output shows:

i                ChargeDensity    100.0     6.8
j                ChargeDensity      0.0     6.8
chargeDensity    ChargeDensity      0.0     2.2
accumCharge      ChargeDensity      0.0     3.9
relax            Potential          0.0    27.2
...

So this pretty clearly identifies the problem area (although the figures
don't quite add up; I suspect the insertion of the annotations has
affected optimisation in some way).
Still, you could argue that it doesn't actually tell you the cause of
the problem: namely that i and j are being evaluated twice as often as you
might expect by looking at the code. This is what the compiler warning
would do, and I completely agree that not having this property evident
by looking at the source code is a serious shortcoming.
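
The double evaluation can be made visible with a small experiment. This
is only a sketch: demo is a hypothetical function, not part of pic, and
Debug.Trace is used purely to show when the binding is forced.

```haskell
{-# LANGUAGE NoMonomorphismRestriction #-}

import Debug.Trace (trace)

-- Hypothetical stand-in for the pattern in accumCharge: i is needed at
-- Int for the result, and again as the argument of fromIntegral.
demo :: Double -> (Int, Double)
demo x = (i, x - fromIntegral i)
  where
    -- With the restriction off, i is generalised over its Integral
    -- type, so each use at a different type is a separate computation.
    i = trace "evaluating i" (truncate x)

main :: IO ()
main = print (demo 3.75)
```

With the pragma in place the trace message should appear twice on
stderr, once for the Int use and once for the use defaulted to Integer.
Removing the pragma, or adding "i :: Int", should bring it back to a
single evaluation.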
> And how come speed
> improved slightly in many cases--that seems counter-intuitive.
The runtimes are unreliable, due to the short running time of most of
these benchmarks. We have a "slow" mode for the benchmark suite that
runs each program with larger test data, but I didn't use it this time -
mostly we find that measuring allocations is useful as a first
approximation, and it's certainly more reliable.
(rest of email snipped, most of which I agree with).
Cheers,
Simon