[GHC] #8885: Add inline versions of clone array primops
GHC
ghc-devs at haskell.org
Thu Mar 13 15:31:14 UTC 2014
------------------------------------+-------------------------------------
Reporter: tibbe | Owner: simonmar
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.9
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: None/Unknown
Difficulty: Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: |
------------------------------------+-------------------------------------
I've changed the clone array primops (i.e. `cloneArray#`,
`cloneMutableArray#`, `freezeArray#`, and `thawArray#`) to use the new
inline allocation optimization for statically known array sizes.
Furthermore, I've moved the implementation for the non-statically known
case out-of-line, which should reduce code size.
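To make the distinction concrete, here is a minimal sketch of the two kinds of call site. The `Arr` wrapper and the helper names (`replicateArr`, `cloneFirst16`, `cloneFirstN`, `lengthArr`, `indexArr`) are hypothetical and exist only for illustration; only the primops come from `GHC.Exts`. Whether a particular call actually takes the inline path is decided by the compiler, so treat the comments as a description of intent rather than a guarantee.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Array#, Int (I#), cloneArray#, indexArray#, newArray#,
                 sizeofArray#, unsafeFreezeArray#)
import GHC.ST (ST (ST), runST)

-- Hypothetical boxed wrapper so the unlifted Array# can be passed around.
data Arr a = Arr (Array# a)

-- Build an array holding n copies of x.
replicateArr :: Int -> a -> Arr a
replicateArr (I# n) x = runST (ST (\s0 ->
  case newArray# n x s0 of
    (# s1, marr #) -> case unsafeFreezeArray# marr s1 of
      (# s2, arr #) -> (# s2, Arr arr #)))

-- Length is the literal 16#, so it is statically known at the call site.
-- This is the case the ticket targets with inline allocation.
cloneFirst16 :: Arr a -> Arr a
cloneFirst16 (Arr arr) = Arr (cloneArray# arr 0# 16#)

-- Length is only known at run time, so per the ticket this would go
-- through the out-of-line implementation.
cloneFirstN :: Int -> Arr a -> Arr a
cloneFirstN (I# n) (Arr arr) = Arr (cloneArray# arr 0# n)

-- Number of elements in the array.
lengthArr :: Arr a -> Int
lengthArr (Arr arr) = I# (sizeofArray# arr)

-- Read the element at index i (no bounds check).
indexArr :: Arr a -> Int -> a
indexArr (Arr arr) (I# i) = case indexArray# arr i of (# x #) -> x
```

Loaded in GHCi, `lengthArr (cloneFirst16 (replicateArr 32 'x'))` and `lengthArr (cloneFirstN 16 (replicateArr 32 'x'))` produce arrays of the same length; the two functions differ only in whether the size is visible to the compiler.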
The numbers are very encouraging: the new implementation takes 74% less
time than the old one (i.e. it is almost 4x faster). I measured this by
comparing the total time reported by `+RTS -s` for the attached
`InlineCloneArrayAlloc` benchmark.
Here are the stats from the best out of three runs of the old
implementation:
{{{
1,600,041,120 bytes allocated in the heap
6,504 bytes copied during GC
35,992 bytes maximum residency (1 sample(s))
21,352 bytes maximum slop
1588 MB total memory in use (0 MB lost due to fragmentation)
                                 Tot time (elapsed)  Avg pause  Max pause
Gen 0      1 colls, 0 par   0.01s      0.01s      0.0082s    0.0082s
Gen 1      1 colls, 0 par   0.00s      0.11s      0.1062s    0.1062s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.29s ( 0.57s elapsed)
GC time 0.01s ( 0.11s elapsed)
EXIT time 0.01s ( 0.11s elapsed)
Total time 0.31s ( 0.80s elapsed)
%GC time 2.7% (14.2% elapsed)
Alloc rate 5,497,251,856 bytes per MUT second
Productivity 97.3% of total user, 37.4% of total elapsed
}}}
Here are the same for the new implementation:
{{{
1,600,041,120 bytes allocated in the heap
57,224 bytes copied during GC
35,992 bytes maximum residency (1 sample(s))
21,352 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
                                 Tot time (elapsed)  Avg pause  Max pause
Gen 0   3125 colls, 0 par   0.01s      0.01s      0.0000s    0.0000s
Gen 1      1 colls, 0 par   0.00s      0.00s      0.0003s    0.0003s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.08s ( 0.08s elapsed)
GC time 0.01s ( 0.01s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.08s ( 0.09s elapsed)
%GC time 6.4% (8.8% elapsed)
Alloc rate 21,260,179,643 bytes per MUT second
Productivity 93.5% of total user, 88.8% of total elapsed
}}}
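As a quick check of the headline figure, the ratio can be worked out from the `Total time` lines of the two reports above (0.31s for the old implementation, 0.08s for the new one). The names below are only illustrative.

```haskell
-- Total time (user) copied from the two +RTS -s reports.
oldTotal, newTotal :: Double
oldTotal = 0.31
newTotal = 0.08

-- How many times faster the new run is than the old one.
speedup :: Double
speedup = oldTotal / newTotal

-- Fraction of the old running time that the new run eliminates.
timeSaved :: Double
timeSaved = 1 - newTotal / oldTotal
```

Here `speedup` comes out at about 3.9 and `timeSaved` at about 0.74, which is where "74%" and "almost 4x" come from.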
The gap between the two implementations widens as the iteration count is
increased, with the old implementation falling further behind.
There's also an interesting difference in the Gen 1 collection times
between the two implementations (0.11s elapsed for the old one versus
0.00s elapsed for the new one).
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8885>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler