Inliner behaviour - tiny changes lead to huge performance
differences
Simon Marlow
marlowsd at gmail.com
Fri Nov 13 08:43:34 EST 2009
On 13/11/2009 07:04, Bryan O'Sullivan wrote:
> I'm working on measuring and improving the performance of the text
> library at the moment, and the very first test I tried demonstrated a
> piece of behaviour that I'm not completely able to understand. Actually,
> I'm not able to understand what's going on at all, beyond a very shallow
> level. All the comments below pertain to GHC 6.10.4.
>
> The text library uses stream fusion, and I want to measure the
> performance of UTF-8 decoding.
>
> The code I'm measuring is very simple:
>
> import qualified Data.ByteString as B
> import Data.Text.Encoding as T
> import qualified Data.Text as T
> import System.Environment (getArgs)
> import Control.Monad (forM_)
>
> main = do
> args <- getArgs
> forM_ args $ \a -> do
> s <- B.readFile a
> let t = T.decodeUtf8 s
> print (T.length t)
>
>
> The streamUtf8 function looks roughly like this:
>
> streamUtf8 :: OnDecodeError -> ByteString -> Stream Char
> streamUtf8 onErr bs = Stream next 0 (maxSize l)
> where
> l = B.length bs
> next i
> | i >= l = Done
> | U8.validate1 x1 = Yield (unsafeChr8 x1) (i+1)
> | {- etc. -}
> {-# INLINE [0] streamUtf8 #-}
>
>
> The values being Yielded from the inner function are, as you can see,
> themselves constructed by functions.
>
> Originally, with the inner next function manually marked as INLINE, I
> found that functions like unsafeChr8 were not being inlined by GHC, and
> performance was terrible due to the amount of boxing and unboxing
> happening in the inner loop.
>
> I somehow stumbled on the idea of removing the INLINE annotation from
> next, and performance suddenly improved by a significant integer
> multiple. This caused the body of streamUtf8 to be inlined into my test
> program, as I hoped.
I think I can explain this one, at least partially. When you mark a
function INLINE, GHC does not optimise the body of the function itself,
on the grounds that it will be inlined at the call site anyway and get
optimised there.
Simon just changed this behaviour in GHC 6.12, so that GHC now keeps the
original definition for inlining, but also optimises the original
function as normal, which is useful if we can't or don't want to inline
at a call site for some reason (or indeed if the calling module is being
compiled without -O!).
What we still don't understand, however, is why streamUtf8 was not being
inlined at the call site. That is the root of the problem. We'll need
to look more closely at the call site to understand what's going on.
Cheers,
Simon
(PS if any of what I said contradicts what Simon and/or Roman said, then
please ignore me and not them :-)
More information about the Glasgow-haskell-users
mailing list