[Haskell-cafe] GHC Optimization seriously hurting performance by unsharing.
mkarcher at mkarcher.dialup.fu-berlin.de
Tue Apr 8 05:55:38 UTC 2014
Hello Haskell folks,
Can you give me a hint whether the GHC optimizer behaviour demonstrated
below should be considered a bug? If it is no bug, what should I keep in
mind to make sure not to hit that trap again? This posting is (obviously)
formatted as literal Haskell file, so feel free to try.
This is a case where -O2 completely kills performance, because it unshares
an object that should be shared. I stumbled across this behaviour of
GHC 7.6.3 when trying to solve RMQ from hackerrank.
Lets start with a typical header, the reason for the IORef module is
> module Main where
> import Data.Set (member, fromAscList)
> import Control.Monad (replicateM_)
> import Data.IORef
Now lets get to the code.
> main = do
The original problem statement requested to read a set of numbers from
the standard input. To have this testcase self-contained, we use a "virtual
file" instead, which is an (IORef String). The key point is that we need to
get the string with the space-sperated test element from an IO operation,
whereas it does not matter whether it is "getLine" or "readIORef"
> ior <- newIORef $ unwords $ map show $ [1..20000]
Now read the space-separated test input from an IO action.
> testdata <- fmap (map read.words) $ readIORef ior
and build a set from the number array we obtained to speed up membership
queries. You don't want to search 20.000 numbers on each iteration.
> let bigset = fromAscList testdata
finally query the membership of the number 5 several times. This happens
to be reasonably fast if compiled without optimization, but goes down to
dog slow if compiled with optimization, as the set building process is
hoisted into the loop and thus performed 10 times.
> replicateM_ 10 . print . member 5 $ bigset
this is the solution (or the workaround) to the problem. If you use this
line, things get fast, because the loop gets created after evaluting the
set to WHNF, thus forcing sharing
replicateM_ 10 . print . member 5 $! bigset
Enabling both of the replicateM_ lines also speeds up the first one, as
the use of bigset in both lines prevents hoisting set construction into
the first loop.
More information about the Haskell-Cafe