Simon Peyton Jones
simonpj at microsoft.com
Fri Feb 17 16:41:33 UTC 2017
Ben, David, Reid
I have been working for months (on and off, mostly off, but very ON for the last week or two) on a very simple idea: the simplifier should inline things even in the "gentle" phase.
It seems so simple. And it is: the key patch is tiny.
But it stressed corners of the optimiser that were not stressed before; and digging into it showed opportunities I did not know about before.
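To make the phase business concrete, here is a minimal sketch of how inlining is tied to simplifier phases from the user's side. It is only an illustration: the names `double` and `quadruple` are made up and come from none of the patches. It assumes the usual arrangement in which the numbered phases count down (2, 1, 0) after the initial "gentle" phase.

```haskell
module Main where

-- Hypothetical example functions, not taken from the patch series.

-- A plain INLINE pragma carries no phase restriction, so this is the kind
-- of binding an earlier-inlining simplifier could unfold sooner.
{-# INLINE double #-}
double :: Int -> Int
double x = x + x

-- INLINE [1] asks GHC not to inline this before phase 1, and to be keen
-- to inline it from phase 1 onwards.
{-# INLINE [1] quadruple #-}
quadruple :: Int -> Int
quadruple = double . double

main :: IO ()
main = print (quadruple 10)
```

Compiling this with -O and -ddump-simpl shows the Core left once all phases have run; the phase annotation changes when an unfolding happens, not what the program computes.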
So I have ended up with a whole series of patches, which are on the wip/spj-early-inline branch:
7f14d15c0e5fc2c9a81db3d0f0b01d85857b1d87 Error message wibbles accumulated from the preceding patches
0499c65d9fa45e7879e1e1264fdaa15274adcba6 Improve SetLevels for join points
3b2fc0827ff6cafa34836c2d9dc710b628c990b6 Change -ddump-tc-trace output in TcErrors, slightly
9ffdf62b0ca72c4f35579f9d6f31a9beebf23025 Improve pretty-printing of types
3f346eac06399a79adf48425018ee949cee245bf Add VarSet.anyDVarSet, allDVarSet
912e71eb3b4ec91e805ecf2236d1033e55e2933a The Early Inline Patch
7188cd13f8e54efa764d52ca016b87b3669b29f5 Small changes to expression sizing in CoreUnfold
bfc6fa3f377d11bdfcdbf82b65bf2f39cb00b90c Fix SetLevels for makeStaticPtr
8b1cfea089faacb5b95ffcc3511e05faeabb8076 Extend CSE to handle recursive bindings
50411995641802568bb27c867afe804f91e0524c Combine identical case alternatives in CSE
2e077ccc736a0b2a622b7f42b7929966bddb4ded Inline data constructor wrappers in phase 2 only
b868de53dd19f639c1070089ecff21948ff33e0d Make Specialise work with casts
c767ae5f04a09ef71dcb8f67a17225a52c2cc5d2 Stop uniques ending up in SPEC rule names
b49ed1f0102f93ca7f62632c436b41bd240b501f Occurrence-analyse the result of rule firings
607a735dfb99bb8f0edf466ccb01e732218c42ec Add -fspec-constr-keen
67a0c1872c0515f1f12ea68097a84e02da92f45b Refactor floating of bindings (fiBind)
e90f4d7c6d3003039fa1647a3da3dafcaa75527b More tracing in SpecConstr
Much to my surprise, we get some jolly nice improvements in compiler perf:
3% perf/compiler/T5837.run T5837 [stat too good] (normal)
7% perf/compiler/parsing001.run parsing001 [stat too good] (normal)
9% perf/compiler/T12234.run T12234 [stat too good] (optasm)
35% perf/compiler/T9020.run T9020 [stat too good] (optasm)
9% perf/compiler/T3064.run T3064 [stat too good] (normal)
13% perf/compiler/T9961.run T9961 [stat too good] (normal)
20% perf/compiler/T13056.run T13056 [stat too good] (optasm)
5% perf/compiler/T9872d.run T9872d [stat too good] (normal)
5% perf/compiler/T9872c.run T9872c [stat too good] (normal)
5% perf/compiler/T9872b.run T9872b [stat too good] (normal)
7% perf/compiler/T9872a.run T9872a [stat too good] (normal)
5% perf/compiler/T783.run T783 [stat too good] (normal)
35% perf/compiler/T12227.run T12227 [stat too good] (normal)
20% perf/compiler/T1969.run T1969 [stat too good] (normal)
5% perf/should_run/lazy-bs-alloc.run lazy-bs-alloc [stat too good] (normal)
5% perf/compiler/T12707.run T12707 [stat too good] (normal)
4% perf/compiler/T3294.run T3294 [stat too good] (normal)
1.5% perf/space_leaks/T4029.run T4029 [stat too good] (ghci)
So what is left? I have sunk so much time into this and am still not QUITE out of the woods. I was left with
codeGen/should_compile/debug.run debug [bad stdout] (normal)
concurrent/should_run/T4030.run T4030 [bad exit code] (normal)
I'm re-validating having pulled from HEAD, but I THINK that's all.
* I don't know how to Phab these individually.
* I have not sweated through which patch is responsible for which perf improvements. Maybe Gipeda can tell?
* I have not put each error message change with the correct patch. I don't know how much that matters.
So this is to say: anything you guys can do to help get this actually Done would be really helpful. I'm out of time till Monday at least.
It would be great to collect those performance improvements!