[GHC] #8793: Improve GHC.Event.IntTable performance
GHC
ghc-devs at haskell.org
Wed Jan 6 01:44:47 UTC 2016
#8793: Improve GHC.Event.IntTable performance
-------------------------------------+-------------------------------------
Reporter: cdk | Owner:
Type: task | Status: patch
Priority: normal | Milestone: 8.0.1
Component: Core Libraries | Version: 7.6.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by jscholl):
I did take another look at the {{{lookup}}} function and compared the core
and the generated cmm for both versions. There was nothing special to see
in the core: both versions compile to simple loops without anything
unexpected such as a dictionary floating around. The only difference is
that the IO version packs the result into an unboxed tuple together with
the state token, which the pure version naturally lacks.
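To make the comparison concrete, here is a minimal sketch of the two shapes (the real GHC.Event.IntTable buckets live in a mutable array; the {{{Bucket}}} type and function names below are simplified stand-ins, not the actual code):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Simplified stand-in for an IntTable bucket chain.
data Bucket a = Empty | Bucket !Int a (Bucket a)

-- Pure version: a plain loop that returns the Maybe directly.
lookupPure :: Int -> Bucket a -> Maybe a
lookupPure !k = go
  where
    go Empty = Nothing
    go (Bucket k' v next)
      | k == k'   = Just v
      | otherwise = go next

-- IO version: the same loop, but each step is threaded through the state
-- token, so in core the result shows up inside an unboxed tuple
-- (# State# RealWorld, Maybe a #).
lookupIO :: Int -> Bucket a -> IO (Maybe a)
lookupIO !k = go
  where
    go Empty = return Nothing
    go (Bucket k' v next)
      | k == k'   = return (Just v)
      | otherwise = go next
```

Both loops look the same in the source; the difference only becomes visible in the generated code.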
The cmm of the two versions sheds more light on the reason for the
speedup: GHC compiles the pure version to a nice loop that jumps back not
to the beginning of the function but to a point after the stack check (the
stack is needed to evaluate unevaluated buckets), while the IO version
simply calls itself recursively, i.e. jumps back to before the stack
check. Otherwise the two seemed pretty much identical as far as I could
tell.
Sadly, my measurement of the speedup was wrong, because I had only taken
the original benchmark from cdk and modified it as necessary. The original
version consumes all my memory: criterion runs benchmarks multiple times,
and mutable data structures are sensitive to that, especially when the
insert operation extends already-existing elements. So the first thing I
did was to replace the lists with sets, which do not grow when an existing
element is inserted a second time. Another important point was the order
in which the benchmarks run: if we first insert into all competing
implementations, the second implementation suffers from a heap with more
live data, and therefore from longer GC pauses. I changed the order so
that one implementation runs first and the next one runs only once the
first is no longer reachable and can be GCed.

A third thing I noticed (but only after looking at the core) was that my
change actually increases the laziness of {{{lookup}}}. The original
implementation determines whether the result is a {{{Just}}} or a
{{{Nothing}}} before returning, while my implementation returns a thunk. I
thought I was safe because the benchmark uses {{{nfIO}}} to evaluate the
result, but I had not read that code carefully and at first missed the
place where the result is actually evaluated - or not. Fixing this yields
a more reasonable speedup of around 10-15%, which also fits the observed
differences in the cmm much better.
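The laziness difference can be illustrated with a small sketch (an association list stands in for the table here, and the names are hypothetical, not the actual IntTable code):

```haskell
import Control.Exception (evaluate)

-- Lazy shape: the IO action completes without deciding Just/Nothing;
-- the whole search is suspended in the returned thunk.
lookupLazyIO :: Int -> [(Int, a)] -> IO (Maybe a)
lookupLazyIO k xs = return (lookup k xs)

-- Strict shape (the original IntTable behaviour): forcing the result to
-- WHNF inside IO decides the constructor before the action returns.
lookupStrictIO :: Int -> [(Int, a)] -> IO (Maybe a)
lookupStrictIO k xs = evaluate (lookup k xs)
```

A benchmark that never forces the returned {{{Maybe}}} measures only the cost of building the thunk in the lazy variant, which makes it look much faster than it really is.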
I can (try to) attach the modified benchmark, if this helps.
I also took another look at the unused argument in {{{updateWith.go}}} and
am now quite sure it is a leftover from the development of the function:
it serves no purpose anymore and can safely be removed. I guess it should
never have made it into the original commit, but it seems hard or
impossible to mechanically identify such unused arguments (I mean, it is
used, but only by a function which does not use it...). Especially if such
an argument were used to pass a type to a recursive function that just
passes it along and uses ScopedTypeVariables to access the type, so the
argument itself is never touched directly...
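To illustrate the kind of leftover I mean (hypothetical names, not the actual {{{updateWith}}} code):

```haskell
-- 'go' threads an extra argument through every recursive call but never
-- inspects it, so it is syntactically used yet functionally dead.
mapMatching :: (Int -> Int) -> Int -> [Int] -> [Int]
mapMatching f target = go extra
  where
    extra = ()   -- the leftover argument; dropping it changes nothing
    go _ []     = []
    go u (x:xs)
      | x == target = f x : go u xs
      | otherwise   = x   : go u xs
```

A simple "is this binder referenced?" check says yes for {{{u}}}, because it appears in the recursive calls, even though no call ever looks at it.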
And yes, I hope to submit the changes to Phabricator tomorrow.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8793#comment:13>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list