Simon Peyton Jones simonpj at microsoft.com
Fri Feb 17 16:41:33 UTC 2017

Ben, David, Reid
I have been working for months (on and off, mostly off, but very ON for the last week or two) on a very simple idea: the simplifier should inline things even in the "gentle" phase.
It seems so simple.  And it is: the key patch is tiny.
But it stressed corners of the optimiser that were not stressed before; and digging into it showed opportunities I did not know about before.
So I have ended up with a whole series of patches, which are on the wip/spj-early-inline branch:

7f14d15c0e5fc2c9a81db3d0f0b01d85857b1d87 Error message wibbles accumulated from the preceding patches

0499c65d9fa45e7879e1e1264fdaa15274adcba6 Improve SetLevels for join points

3b2fc0827ff6cafa34836c2d9dc710b628c990b6 Change -ddump-tc-trace output in TcErrors, slightly

9ffdf62b0ca72c4f35579f9d6f31a9beebf23025 Improve pretty-printing of types

3f346eac06399a79adf48425018ee949cee245bf Add VarSet.anyDVarSet, allDVarSet

912e71eb3b4ec91e805ecf2236d1033e55e2933a The Early Inline Patch

7188cd13f8e54efa764d52ca016b87b3669b29f5 Small changes to expression sizing in CoreUnfold

bfc6fa3f377d11bdfcdbf82b65bf2f39cb00b90c Fix SetLevels for makeStaticPtr

8b1cfea089faacb5b95ffcc3511e05faeabb8076 Extend CSE to handle recursive bindings

50411995641802568bb27c867afe804f91e0524c Combine identical case alternatives in CSE

2e077ccc736a0b2a622b7f42b7929966bddb4ded Inline data constructor wrappers in phase 2 only

b868de53dd19f639c1070089ecff21948ff33e0d Make Specialise work with casts

c767ae5f04a09ef71dcb8f67a17225a52c2cc5d2 Stop uniques ending up in SPEC rule names

b49ed1f0102f93ca7f62632c436b41bd240b501f Occurrence-analyse the result of rule firings

607a735dfb99bb8f0edf466ccb01e732218c42ec Add -fspec-constr-keen

67a0c1872c0515f1f12ea68097a84e02da92f45b Refactor floating of bindings (fiBind)

e90f4d7c6d3003039fa1647a3da3dafcaa75527b More tracing in SpecConstr

Much to my surprise, we get some jolly nice improvements in compiler perf:

3%   perf/compiler/T5837.run            T5837 [stat too good] (normal)

7%   perf/compiler/parsing001.run       parsing001 [stat too good] (normal)

9%   perf/compiler/T12234.run           T12234 [stat too good] (optasm)

35%  perf/compiler/T9020.run            T9020 [stat too good] (optasm)

9%   perf/compiler/T3064.run            T3064 [stat too good] (normal)

13%  perf/compiler/T9961.run            T9961 [stat too good] (normal)

20%  perf/compiler/T13056.run           T13056 [stat too good] (optasm)

5%   perf/compiler/T9872d.run           T9872d [stat too good] (normal)

5%   perf/compiler/T9872c.run           T9872c [stat too good] (normal)

5%   perf/compiler/T9872b.run           T9872b [stat too good] (normal)

7%   perf/compiler/T9872a.run           T9872a [stat too good] (normal)

5%   perf/compiler/T783.run             T783 [stat too good] (normal)

35%  perf/compiler/T12227.run           T12227 [stat too good] (normal)

20%  perf/compiler/T1969.run            T1969 [stat too good] (normal)

5%   perf/should_run/lazy-bs-alloc.run  lazy-bs-alloc [stat too good] (normal)

5%   perf/compiler/T12707.run           T12707 [stat too good] (normal)

4%   perf/compiler/T3294.run            T3294 [stat too good] (normal)

1.5% perf/space_leaks/T4029.run         T4029 [stat too good] (ghci)

So what is left?  I have sunk so much time into this and am still not QUITE out of the woods.  I was left with:

Unexpected failures:

   codeGen/should_compile/debug.run              debug [bad stdout] (normal)

   concurrent/should_run/T4030.run               T4030 [bad exit code] (normal)

I'm re-validating having pulled from HEAD, but I THINK that's all.

* I don't know how to Phab these individually.

* I have not sweated through which patch is responsible for which perf improvements.  Maybe Gipeda can tell?

* I have not put each error message change with the correct patch.  I don't know how much that matters.

So this is to say: anything you guys can do to help get this actually Done would be really helpful.  I'm out of time till Monday at least.
It would be great to collect those performance improvements!

