[GHC] #8793: Improve GHC.Event.IntTable performance
GHC
ghc-devs at haskell.org
Sun Feb 16 03:00:17 UTC 2014
#8793: Improve GHC.Event.IntTable performance
------------------------------------+-------------------------------------
Reporter: cdk | Owner:
Type: task | Status: new
Priority: normal | Milestone: 7.8.1
Component: libraries/base | Version: 7.6.3
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: None/Unknown
Difficulty: Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: |
------------------------------------+-------------------------------------
The performance of `GHC.Event.IntTable` can be improved. I've managed to
get some nice increases across the board. Benchmarking using `criterion`
shows:
function, % faster than current impl.
`insert`: 4%
`lookup`: 26%
`update`: 11%
`delete`: 5%
There is one strange thing I noted. In `updateWith`, there is an inner
loop that looks like this:
{{{
data Bucket a = Empty | Bucket Int a (Bucket a)
go _ Empty = (False, Nothing, Empty)
go cont (Bucket key val next)
| key == k = case f val of
Nothing -> (True, Just val, cont next)
Just v -> (False, Just val, cont (Bucket key v next))
| otherwise = go (\x -> cont (Bucket key val x)) next
}}}
which returns a tuple that is immediately consumed like so:
{{{
(delete_occurred, old_val, new_bkt) <- go id ...
when (isJust old_val) $ do
<updateIntTable>
when delete_occurred <decIntTableSize>
return old_val
}}}
I expected that inlining the `<updateIntTable>` and `<decIntTableSize>`
code blocks directly into `go` would result in better code than creating a
tuple and then pattern matching on it afterwards. ie.
{{{
go _ Empty = return Nothing
go cont (Bucket key val next)
| key == k = do
case f val of
Nothing -> <updateIntTable> (cont next) >> <decIntTableSize>
Just v -> <updateIntTable> (cont (Bucket key v next)
return (Just val)
| otherwise = go (\x -> cont (Bucket key val x)) next
}}}
which has the exact same semantics. To my suprise, this code is almost 2x
slower! The core generated in both cases is exactly what I'd expect; if
anything, the second version seems tighter. I'm not sure why the first
version is faster, but perhaps the original author, Bryan O'Sullivan, can
shed some light as he used the tupled method in the first version.
I'll attach my patch, `criterion`'s html output for the benchmarks as well
as the benchmarking code, and the core for the oddity I discussed above.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8793>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list