[GHC] #10678: integer-gmp's runS seems unnecessarily expensive
GHC
ghc-devs at haskell.org
Tue Aug 4 04:46:52 UTC 2015
#10678: integer-gmp's runS seems unnecessarily expensive
-------------------------------------+-------------------------------------
Reporter: rwbarton                   | Owner:
Type: bug                            | Status: new
Priority: normal                     | Milestone:
Component: Compiler (CodeGen)        | Version: 7.10.1
Resolution:                          | Keywords:
Operating System: Unknown/Multiple   | Architecture: Unknown/Multiple
Type of failure: Runtime             | Test Case:
  performance bug                    |
Blocked By:                          | Blocking:
Related Tickets:                     | Differential Revisions:
-------------------------------------+-------------------------------------
Comment (by rwbarton):
With #10694 fixed I am pretty happy with the results so far. Allocations
are down for Integer and ByteString users as expected. Module size,
compile allocations and compile time are also slightly down on average.
The variation in program runtime seems to be due to a combination of noise
and #8279. I suspect real world modern Haskell programs may gain more on
average due to preferring ByteString and Text over String. The only
programs in nofib which use ByteString (none use Text) are three shootout
programs that have been highly optimized by hand. Maybe I should try
fibon?
For a microbenchmark
{{{
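import qualified Data.ByteString as B  -- assumed; the original import is not shown

-- Move the first byte of the string to the end (empty input stays empty).
f :: B.ByteString -> B.ByteString
f s = case B.uncons s of
  Just (c, s') -> B.snoc s' c
  Nothing      -> B.empty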
}}}
allocations are down from 136 bytes to 96 bytes and runtime from 16ns to
13ns (when `s` is a 9-byte string). I got roughly similar results from an
integer-gmp benchmark (repeatedly adding 1 to a large Integer).
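The integer-gmp benchmark is only described in words above, so here is a rough sketch of what "repeatedly adding 1 to a large Integer" might look like. The names `bigStart` and `addOnes` and the starting value are hypothetical; the actual benchmark code is not shown in this comment.

```haskell
import Data.List (foldl')

-- Hypothetical starting point: a value far too large for a machine word,
-- so each addition works on a big-number representation rather than a
-- small one.
bigStart :: Integer
bigStart = 2 ^ (200 :: Int)

-- Add 1 to the accumulator n times. foldl' forces the accumulator at each
-- step, so the loop measures the additions rather than thunk building.
addOnes :: Int -> Integer -> Integer
addOnes n x0 = foldl' (\acc _ -> acc + 1) x0 [1 .. n]
```

Calling something like `addOnes 1000000 bigStart` from `main`, compiling with `-O`, and running with `+RTS -s` should report total allocation, which is the quantity being compared here.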
There is more room for improvement, though. Both these microbenchmarks
allocate a boxed heap value inside the `runRW#`, only to immediately unbox
it outside the `runRW#`. Some kind of CPR analysis + w/w-type
transformation could eliminate these intermediate allocations. I
implemented this transformation manually in bytestring and it shaved off
another 40 bytes of allocation (indeed the size of a `ByteString` heap
object) and ~10% of the runtime. However it would be much nicer for GHC to
do this automatically. I need to think more about the best way to accomplish
this.
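To make the boxing issue concrete, here is a minimal sketch of the pattern and of the manual worker/wrapper split. It is not the bytestring change itself: it uses `Int` in place of `ByteString`, the helper `step` is a hypothetical stand-in for the real state-threaded primitive code, and it assumes a recent GHC in which `GHC.Exts` exports `runRW#` and `runRW#` may return an unboxed result.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Int (I#), Int#, State#, runRW#, (+#))

-- Hypothetical stand-in for a state-threaded primitive computation.
step :: Int# -> State# s -> (# State# s, Int# #)
step x s = (# s, x +# 1# #)

-- Boxed result: as written, the I# constructor is applied inside the
-- function passed to runRW#, and a caller that wants the raw Int# has to
-- take the box apart again afterwards.
incBoxed :: Int -> Int
incBoxed (I# x) = runRW# (\s -> case step x s of (# _, r #) -> I# r)

-- Manual worker/wrapper split. The worker returns the unboxed result
-- directly out of runRW# ...
incWorker :: Int# -> Int#
incWorker x = runRW# (\s -> case step x s of (# _, r #) -> r)

-- ... and the wrapper reboxes it outside runRW#. Once the wrapper is
-- inlined into a caller that unboxes immediately, the box can cancel.
incWrapped :: Int -> Int
incWrapped (I# x) = I# (incWorker x)
{-# INLINE incWrapped #-}
```

With a `step` this trivial the optimizer may well remove the difference on its own; the sketch is only meant to show the shape of the transformation that a CPR-style analysis would have to perform automatically.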
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10678#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler