[Haskell-cafe] [11/16] SBM: Graphs for hand-tweaked assembly benchmarks
Peter Firefly Brodersen Lund
firefly at vax64.dk
Sat Dec 22 04:17:19 EST 2007
This report compares the hand-tweaked assembly programs with the original
untweaked programs on two vastly different microarchitectures.
This is the command I ran to generate the report:
    EXCLUDE='(xxxx|-bsl|chunk|count|acc-[23]|fold|lenfil|^c/)' \
    tools/merge.pl \
        ghc-armada-thorough-6.9.tgz \
        ghc-thorough-6.9.tgz \
        > xx
I cut out the memory sections manually since we've already seen them and
inserted a few newlines for grouping purposes.
The first thing one should note is that not all tweaks are better than the originals!
The second is that the sequence of tweaks is not quite monotonically decreasing
in run-time.
The improvements don't really start until -e on the Athlon64 and -f on both.
Not until then has the load pressure on the L1 cache been sufficiently
relieved that the code actually runs faster.
Note also how the two microarchitectures seem to have plateaus in different
places. The Athlon64 seems to have the number 3 built into its silicon (efg,
jkl, mno) which fits very well with what we know about it from AMD's
documentation (the front end splits the instructions up into smaller pieces
which then get distributed to three different "pipelines", each with its own
out-of-order execution engine).
The Pentium III seems to have trouble with the simple MMX code but does very
well with the more advanced MMX code that keeps 8 space counters in a single
MMX register for many iterations. The code I used to add those counters
horizontally is the same in both -q and -r. Perhaps operations on both MMX
and normal registers are slow?
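
To make the "8 counters in one register" idea concrete, here is a rough
sketch in portable C. It is not the assembly from -q or -r: a uint64_t
stands in for the MMX register, a SWAR bit trick stands in for the MMX
byte compare and subtract, and the names count_spaces, horizontal_sum and
LANES are made up for illustration. The point is only that the eight byte
lanes are incremented in parallel and added horizontally once per 255
words, which is the most a byte-sized counter can hold.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte value into all eight lanes of a 64-bit word. */
#define LANES(b) ((uint64_t)(b) * 0x0101010101010101ULL)

/* Add the eight byte-sized counters in acc together ("horizontal add"). */
static uint64_t horizontal_sum(uint64_t acc)
{
    uint64_t sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += acc & 0xFF;
        acc >>= 8;
    }
    return sum;
}

/* Count the ' ' bytes in buf, eight bytes at a time. */
size_t count_spaces(const unsigned char *buf, size_t len)
{
    size_t total = 0;
    size_t i = 0;

    while (len - i >= 8) {
        uint64_t acc = 0;   /* eight packed 8-bit counters */

        /* At most 255 words per batch, so no lane can overflow. */
        for (int n = 0; n < 255 && len - i >= 8; n++, i += 8) {
            uint64_t w;
            memcpy(&w, buf + i, 8);          /* safe unaligned load */

            /* A lane of x is zero exactly where the byte was a space. */
            uint64_t x = w ^ LANES(0x20);
            /* High bit of each lane of t is set iff that lane is nonzero. */
            uint64_t t = ((x & LANES(0x7F)) + LANES(0x7F)) | x;
            /* Add 1 to every lane that held a space. */
            acc += (~t & LANES(0x80)) >> 7;
        }
        total += horizontal_sum(acc);        /* once per batch, not per word */
    }

    /* Leftover bytes that do not fill a whole word. */
    for (; i < len; i++)
        total += (buf[i] == ' ');

    return total;
}
```

The horizontal add is the only step that leaves the packed representation,
so doing it once per batch instead of once per word keeps it out of the
inner loop.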
Loop unrolling (-s) doesn't seem to matter in this case.
-Peter
ls-search
ghc 6.9.20071119
Pentium III (Coppermine)
596.932 MHz
TESTKIND=THOROUGH
SUFFIX=
charybdis
ghc 6.9.20071119
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=
Time (byte counting)               std
--------------------          avg  dev slack
hs/byte-bs----acc:          3.274   1‰   0.1
                        --  0.705   7‰   0.1
hand/byte-bs----acc-a:      3.511   1‰   0.0
                        --  0.639   2‰   0.2
hand/byte-bs----acc-b:      1.998   2‰   0.1
                        --  0.414   2‰   0.5
hand/byte-bs----acc-c:      1.876   2‰   0.1
                        --  0.414   3‰   0.2
hand/byte-bs----acc-d:      1.876   1‰   0.1
                        --  0.415   3‰   0.2
Time (space counting)              std
---------------------         avg  dev slack
hs/space-bs-c8-acc-1:       4.318   1‰   0.0
                        --  1.145   1‰   0.2
hand/space-bs-c8-acc-1-a:   4.318   1‰   0.0
                        --  1.177   2‰   0.3
hand/space-bs-c8-acc-1-b:   4.331   1‰   0.0
                        --  1.104   1‰   0.2
hand/space-bs-c8-acc-1-c:   4.492   1‰   0.1
                        --  1.207   1‰   0.3
hand/space-bs-c8-acc-1-d:   4.354   1‰   0.0
                        --  1.191   1‰   0.2
hand/space-bs-c8-acc-1-e:   4.424   0‰   0.1
                        --  0.937   1‰   0.2
hand/space-bs-c8-acc-1-f:   4.164   1‰   0.0
                        --  0.921   1‰   0.2
hand/space-bs-c8-acc-1-g:   4.309   1‰   0.1
                        --  0.927   2‰   0.4
hand/space-bs-c8-acc-1-h:   4.202   1‰   0.1
                        --  0.886   2‰   0.2
hand/space-bs-c8-acc-1-i:   3.820   1‰   0.1
                        --  0.803   3‰   0.4
hand/space-bs-c8-acc-1-j:   3.472   1‰   0.0
                        --  0.706   2‰   0.1
hand/space-bs-c8-acc-1-k:   3.474   1‰   0.0
                        --  0.705   1‰   0.0
hand/space-bs-c8-acc-1-l:   3.498   1‰   0.1
                        --  0.710   2‰   0.1
hand/space-bs-c8-acc-1-m:   3.397   1‰   0.1
                        --  0.642   6‰   0.3
hand/space-bs-c8-acc-1-n:   3.373   1‰   0.0
                        --  0.636   4‰   0.5
hand/space-bs-c8-acc-1-o:   3.118   1‰   0.1
                        --  0.626   2‰   0.0
hand/space-bs-c8-acc-1-p:   2.935   2‰   0.0
                        --  0.565   3‰   0.4
hand/space-bs-c8-acc-1-q:   3.477   1‰   0.1
                        --  0.418   6‰   0.7
hand/space-bs-c8-acc-1-r:   1.674   1‰   0.1
                        --  0.334   5‰   0.6
hand/space-bs-c8-acc-1-s:   1.627   1‰   0.2
                        --  0.335   4‰   0.9