[Haskell-cafe] Puzzled about inlining and specialise inline under ghc -O2
Rafal Kolanski
xs at xaph.net
Thu Mar 22 15:35:08 CET 2012
Dear Haskell-Cafe,
I'm computing a histogram of a bunch of symbols with up to 8 bits of
information each, stored in a unboxed vector of Word8. The histogram is
represented as an unboxed vector of Int with size 2^bits. I compute the
histogram by folding an increment function.
The problem: depending on what types and what annotations I give to the
increment and histogram function (see below), the GC gets through a
different amount of memory. I'm using GHC 7.0.3 and -O2. I'd like to
better understand how and why the optimisation does/doesn't kick in.
Here are the functions with the most generic types I can think of:
import qualified Data.Vector.Unboxed as UV
import qualified Data.Vector.Unboxed.Mutable as UMV
import Control.Monad.Primitive (PrimMonad, PrimState)
increment :: (PrimMonad m, UMV.Unbox a, Num a, Integral b) =>
UMV.MVector (PrimState m) a -> b -> m (UMV.MVector (PrimState m) a)
increment v x = do
n <- UMV.read v (fromIntegral x)
UMV.write v (fromIntegral x) (n+1)
return v
histogram :: (Integral a, UMV.Unbox a) => Int -> UV.Vector a ->
UV.Vector Int
histogram bitsPerSym v = runST $ do
a <- UMV.replicate (2^bitsPerSym) (0::Int)
a' <- UV.foldM' increment a v
UV.unsafeFreeze a'
Running my test load, I get: total alloc = 33,206,568 bytes
Looking at the core, ghc is not specialising the functions, even if I
tell it to inline them. So let's brutally change the types to be as
specific as I need for my application:
increment :: UMV.MVector s Int -> Word8 -> ST s (UMV.MVector s Int)
histogram :: Int -> UV.Vector Word8 -> UV.Vector Int
result: total alloc = 19,581,152 bytes
and if I put INLINE pragmas for both functions: 16,952,512 bytes
I should be able to achieve the same effect with SPECIALISE INLINE
pragmas, right? Let's try that:
{-# SPECIALISE INLINE increment :: UMV.MVector s Int -> Word8 -> ST s
(UMV.MVector s Int) #-}
{-# SPECIALISE INLINE histogram :: Int -> UV.Vector Word8 -> UV.Vector
Int #-}
result: 33,139,856 bytes
(GHC can't figure out application of the first rule, giving:
Warning: RULE left-hand side too complicated to desugar)
So unfortunately my most generic form won't work here, I need to
specialise increment to be in ST (which sucks, because I want it to work
for both IO and ST):
increment :: (UMV.Unbox a, Num a, Integral b) =>
UMV.MVector s a -> b -> ST s (UMV.MVector s a)
{-# SPECIALISE INLINE increment :: UMV.MVector s Int -> Word8 -> ST s
(UMV.MVector s Int) #-}
result: 17,016,192 bytes
This is very close to the most specific function instantiations and
INLINE, but:
- I've lost being generic between ST and IO
- it's still a little bigger than the specific instances + INLINE
So my questions are: what is going on? Can I have genericity between ST
and IO while keeping the low GC usage? How come SPECIALISE INLINE does
not give the same result as specific instances + INLINE?
Obviously, for this example, I don't really *need* increment to work
inside IO, since I'm using runST... but I want to understand what
is going on. Profiling Haskell performance and memory usage has always
been difficult for me.
Much thanks in advance,
Rafal Kolanski.
More information about the Haskell-Cafe
mailing list