[Haskell-cafe] [11/16] SBM: Graphs for hand-tweaked assembly benchmarks

Peter Firefly Brodersen Lund firefly at vax64.dk
Sat Dec 22 04:17:19 EST 2007


This report compares the hand-tweaked assembly programs with the original
untweaked programs on two vastly different microarchitectures.

This is the command I ran to generate the report:
 EXCLUDE='(xxxx|-bsl|chunk|count|acc-[23]|fold|lenfil|^c/)' \
 tools/merge.pl \
   ghc-armada-thorough-6.9.tgz \
   ghc-thorough-6.9.tgz        \
  > xx

I cut out the memory sections manually, since we've already seen them, and
inserted a few newlines for grouping purposes.

The first thing one should note is that not all tweaks are better than the
originals!  The second is that the sequence of tweaks is not quite
monotonically decreasing in run-time.

The improvements don't really start until -e on the Athlon64 and -f on both.
Not until then has the load pressure on the L1 cache been sufficiently
relieved that the code actually runs faster.

Note also how the two microarchitectures seem to have plateaus in different
places.  The Athlon64 seems to have the number 3 built into its silicon (efg,
jkl, mno) which fits very well with what we know about it from AMD's
documentation (the front end splits the instructions up into smaller pieces
which then get distributed to three different "pipelines", each with its own
out-of-order execution engine).

The Pentium III seems to have trouble with the simple MMX code but does very
well with the more advanced MMX code that keeps 8 space counters in a single
MMX register for many iterations.  The code I used to add those counters
horizontally is the same in both -q and -r.  Perhaps operations on both MMX
and normal registers are slow?
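
For readers unfamiliar with the "8 counters in one register" idea, here is a
minimal sketch in portable C.  It is not the assembly used in the benchmark:
it uses a plain 64-bit integer in place of an MMX register, and the function
names (space_lanes, horizontal_sum, count_spaces) are made up for this
illustration.  Each of the 8 byte lanes accumulates its own count, and the
lanes are only added together horizontally once per batch of at most 255
words, so that no lane can overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 0x01 in every byte lane of x that holds a
 * space (0x20) and 0x00 in every other lane. */
static uint64_t space_lanes(uint64_t x)
{
    const uint64_t lo7 = 0x7f7f7f7f7f7f7f7fULL;
    uint64_t y = x ^ 0x2020202020202020ULL;  /* space lanes become 0x00 */
    uint64_t t = ((y & lo7) + lo7) | y;      /* high bit set iff lane != 0 */
    return (~t >> 7) & 0x0101010101010101ULL;
}

/* Add the 8 byte-wide counters together into one ordinary integer. */
static uint64_t horizontal_sum(uint64_t lanes)
{
    uint64_t sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += lanes & 0xff;
        lanes >>= 8;
    }
    return sum;
}

/* Count the spaces in buf[0..len). */
uint64_t count_spaces(const unsigned char *buf, size_t len)
{
    uint64_t total = 0;
    size_t i = 0;

    while (len - i >= 8) {
        uint64_t lanes = 0;
        /* Each word adds at most 1 per lane, so 255 words cannot
         * overflow an 8-bit lane. */
        for (int n = 0; n < 255 && len - i >= 8; n++, i += 8) {
            uint64_t w;
            memcpy(&w, buf + i, 8);          /* safe unaligned load */
            lanes += space_lanes(w);
        }
        total += horizontal_sum(lanes);      /* once per batch only */
    }

    for (; i < len; i++)                     /* leftover tail bytes */
        total += (buf[i] == ' ');
    return total;
}
```

The point of the batching is that the relatively expensive horizontal add
runs once per 255 words instead of once per word.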

Loop unrolling (-s) doesn't seem to matter in this case.
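
As a reminder of what loop unrolling means here, the following is a small C
sketch of a byte-wise space counter unrolled by a factor of four.  It only
illustrates the general technique; the unroll factor, the use of four
separate accumulators, and the name count_spaces_unrolled4 are assumptions
made for this example and are not taken from the -s variant itself.

```c
#include <stddef.h>

/* Count the spaces in buf[0..len), handling four bytes per iteration. */
size_t count_spaces_unrolled4(const unsigned char *buf, size_t len)
{
    /* Four independent accumulators avoid a single serial dependency
     * chain through one counter. */
    size_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    size_t i = 0;

    for (; len - i >= 4; i += 4) {
        c0 += (buf[i]     == ' ');
        c1 += (buf[i + 1] == ' ');
        c2 += (buf[i + 2] == ' ');
        c3 += (buf[i + 3] == ' ');
    }

    for (; i < len; i++)          /* zero to three leftover bytes */
        c0 += (buf[i] == ' ');

    return c0 + c1 + c2 + c3;
}
```

Unrolling mainly saves loop-control overhead, so it is plausible that it
buys little once the loop body itself is the bottleneck.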

-Peter

ls-search
ghc 6.9.20071119
Pentium III (Coppermine)
596.932 MHz
TESTKIND=THOROUGH
SUFFIX=


charybdis
ghc 6.9.20071119
AMD Athlon(tm) 64 Processor 3000+
2009.160 MHz
TESTKIND=THOROUGH
SUFFIX=


Time (byte counting)            std
--------------------        avg dev slack
hs/byte-bs----acc:        3.274  1‰ 0.1  ███████████████████████████          |
 --                       0.705  7‰ 0.1  █████████████████████▋               |
hand/byte-bs----acc-a:    3.511  1‰ 0.0  ████████████████████████████▉        |
 --                       0.639  2‰ 0.2  ███████████████████▋                 |
hand/byte-bs----acc-b:    1.998  2‰ 0.1  ████████████████▌                    |
 --                       0.414  2‰ 0.5  ████████████▊                        |
hand/byte-bs----acc-c:    1.876  2‰ 0.1  ███████████████▌                     |
 --                       0.414  3‰ 0.2  ████████████▊                        |
hand/byte-bs----acc-d:    1.876  1‰ 0.1  ███████████████▌                     |
 --                       0.415  3‰ 0.2  ████████████▊                        |


Time (space counting)           std
---------------------       avg dev slack
hs/space-bs-c8-acc-1:     4.318  1‰ 0.0  ███████████████████████████████████▋ |
 --                       1.145  1‰ 0.2  ███████████████████████████████████▏ |
hand/space-bs-c8-acc-1-a: 4.318  1‰ 0.0  ███████████████████████████████████▋ |
 --                       1.177  2‰ 0.3  ████████████████████████████████████▏|
hand/space-bs-c8-acc-1-b: 4.331  1‰ 0.0  ███████████████████████████████████▋ |
 --                       1.104  1‰ 0.2  █████████████████████████████████▉   |
hand/space-bs-c8-acc-1-c: 4.492  1‰ 0.1  █████████████████████████████████████|
 --                       1.207  1‰ 0.3  █████████████████████████████████████|
hand/space-bs-c8-acc-1-d: 4.354  1‰ 0.0  ███████████████████████████████████▉ |
 --                       1.191  1‰ 0.2  ████████████████████████████████████▌|

hand/space-bs-c8-acc-1-e: 4.424  0‰ 0.1  ████████████████████████████████████▌|
 --                       0.937  1‰ 0.2  ████████████████████████████▊        |
hand/space-bs-c8-acc-1-f: 4.164  1‰ 0.0  ██████████████████████████████████▎  |
 --                       0.921  1‰ 0.2  ████████████████████████████▎        |
hand/space-bs-c8-acc-1-g: 4.309  1‰ 0.1  ███████████████████████████████████▌ |
 --                       0.927  2‰ 0.4  ████████████████████████████▍        |

hand/space-bs-c8-acc-1-h: 4.202  1‰ 0.1  ██████████████████████████████████▋  |
 --                       0.886  2‰ 0.2  ███████████████████████████▏         |
hand/space-bs-c8-acc-1-i: 3.820  1‰ 0.1  ███████████████████████████████▌     |
 --                       0.803  3‰ 0.4  ████████████████████████▋            |

hand/space-bs-c8-acc-1-j: 3.472  1‰ 0.0  ████████████████████████████▋        |
 --                       0.706  2‰ 0.1  █████████████████████▋               |
hand/space-bs-c8-acc-1-k: 3.474  1‰ 0.0  ████████████████████████████▋        |
 --                       0.705  1‰ 0.0  █████████████████████▋               |
hand/space-bs-c8-acc-1-l: 3.498  1‰ 0.1  ████████████████████████████▉        |
 --                       0.710  2‰ 0.1  █████████████████████▊               |

hand/space-bs-c8-acc-1-m: 3.397  1‰ 0.1  ████████████████████████████         |
 --                       0.642  6‰ 0.3  ███████████████████▋                 |
hand/space-bs-c8-acc-1-n: 3.373  1‰ 0.0  ███████████████████████████▊         |
 --                       0.636  4‰ 0.5  ███████████████████▌                 |
hand/space-bs-c8-acc-1-o: 3.118  1‰ 0.1  █████████████████████████▋           |
 --                       0.626  2‰ 0.0  ███████████████████▎                 |

hand/space-bs-c8-acc-1-p: 2.935  2‰ 0.0  ████████████████████████▏            |
 --                       0.565  3‰ 0.4  █████████████████▍                   |
hand/space-bs-c8-acc-1-q: 3.477  1‰ 0.1  ████████████████████████████▋        |
 --                       0.418  6‰ 0.7  ████████████▉                        |
hand/space-bs-c8-acc-1-r: 1.674  1‰ 0.1  █████████████▊                       |
 --                       0.334  5‰ 0.6  ██████████▎                          |
hand/space-bs-c8-acc-1-s: 1.627  1‰ 0.2  █████████████▍                       |
 --                       0.335  4‰ 0.9  ██████████▎                          |


