[GHC] #10678: integer-gmp's runS seems unnecessarily expensive

Tue Aug 4 04:46:52 UTC 2015

#10678: integer-gmp's runS seems unnecessarily expensive
-------------------------------------+-------------------------------------
        Reporter:  rwbarton          |                   Owner:
            Type:  bug               |                  Status:  new
        Priority:  normal            |               Milestone:
       Component:  Compiler          |                 Version:  7.10.1
  (CodeGen)                          |
      Resolution:                    |                Keywords:
Operating System:  Unknown/Multiple  |            Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |               Test Case:
      Blocked By:                    |                Blocking:
 Related Tickets:                    |  Differential Revisions:
-------------------------------------+-------------------------------------

Comment (by rwbarton):

 With #10694 fixed I am pretty happy with the results so far. Allocations
 are down for Integer and ByteString users as expected. Module size,
 compile allocations and compile time are also slightly down on average.
 The variation in program runtime seems to be due to a combination of noise
 and #8279. I suspect real-world modern Haskell programs may gain more on
 average because they prefer ByteString and Text over String. The only
 programs in nofib that use ByteString (none use Text) are three shootout
 programs that have been highly optimized by hand. Maybe I should try
 fibon?

 For a microbenchmark
 {{{
 import qualified Data.ByteString as B

 -- Move the first byte of the string to the end.
 f :: B.ByteString -> B.ByteString
 f s = case B.uncons s of
   Just (c, s') -> B.snoc s' c
   Nothing -> B.empty
 }}}
 allocations are down from 136 bytes to 96 bytes and runtime from 16ns to
 13ns (when `s` is a 9-byte string). I got roughly similar results from an
 integer-gmp benchmark (repeatedly adding 1 to a large Integer).

 There is more room for improvement, though. Both these microbenchmarks
 allocate a boxed heap value inside the `runRW#`, only to immediately unbox
 it outside the `runRW#`. Some kind of CPR analysis combined with a
 worker/wrapper-style transformation could eliminate these intermediate
 allocations. I implemented this transformation manually in bytestring and
 it shaved off another 40 bytes of allocation (which is indeed the size of
 a `ByteString` heap object) and ~10% of the runtime. However, it would be
 much nicer for GHC to do this automatically. I need to think more about
 the best way to accomplish this.
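
 As a rough illustration of the shape of that manual transformation, here
 is a minimal sketch. It is not the actual bytestring change: `Bytes`,
 `fill`, `mkBoxed`, `mkWorker`, `mkWrapped` and `lastByte` are hypothetical
 names invented for this example, and it assumes a recent GHC in which
 `runRW#` (from `GHC.Exts`) is polymorphic in its result type, so that the
 continuation may return an unboxed tuple of any shape.
 {{{
 {-# LANGUAGE MagicHash, UnboxedTuples #-}
 import GHC.Exts

 -- Hypothetical stand-in for a ByteString-like heap object (payload, length).
 data Bytes = Bytes ByteArray# Int#

 -- Allocate n bytes and set each one to v; shared by both variants below.
 fill :: Int# -> Int# -> State# RealWorld -> (# State# RealWorld, ByteArray# #)
 fill n v s0 = case newByteArray# n s0 of
   (# s1, mba #) -> unsafeFreezeByteArray# mba (setByteArray# mba 0# n v s1)

 -- Before: the Bytes box is built inside runRW# and returned through it.
 mkBoxed :: Int -> Int -> Bytes
 mkBoxed (I# n) (I# v) =
   case runRW# (\s0 -> case fill n v s0 of
                         (# s1, ba #) -> (# s1, Bytes ba n #)) of
     (# _, b #) -> b

 -- After, worker: runRW# hands back the fields in an unboxed tuple.
 mkWorker :: Int# -> Int# -> (# ByteArray#, Int# #)
 mkWorker n v =
   case runRW# (\s0 -> case fill n v s0 of
                         (# s1, ba #) -> (# s1, ba, n #)) of
     (# _, ba, len #) -> (# ba, len #)

 -- After, wrapper: reboxes the fields; meant to inline so that a consumer's
 -- pattern match can cancel against the constructor.
 mkWrapped :: Int -> Int -> Bytes
 mkWrapped (I# n) (I# v) = case mkWorker n v of (# ba, len #) -> Bytes ba len
 {-# INLINE mkWrapped #-}

 -- Consumer that immediately unboxes its argument (requires length >= 1).
 lastByte :: Bytes -> Int
 lastByte (Bytes ba len) = I# (ord# (indexCharArray# ba (len -# 1#)))

 main :: IO ()
 main = print (lastByte (mkBoxed 9 65), lastByte (mkWrapped 9 65))
 }}}
 Both variants compute the same value, so `main` prints `(65,65)`. Whether
 the worker/wrapper form actually allocates less than the boxed form
 depends on the GHC version and optimization flags; comparing the two under
 `+RTS -s` would be one way to check.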

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/10678#comment:6>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler