Inlining question

Ian Lynagh igloo at
Sun Apr 3 09:38:30 EDT 2005

Hi all,

With foo.hs below, if I compile normally then it takes about 70 seconds
to run:

$ rm -f *.o *.hi foo
$ ghc -cpp -Wall -O2 foo.hs -o foo
$ time ./foo
real    1m10.266s
user    1m9.698s
sys     0m0.521s

If I turn up the inlining threshold then it takes only about 13 seconds:

$ rm -f *.o *.hi foo
$ ghc -cpp -Wall -O2 foo.hs -o foo -funfolding-use-threshold=20
$ time ./foo
real    0m13.313s
user    0m12.838s
sys     0m0.450s

However, if I copy the definition of shift from base/GHC/Word.hs then it
also takes around 13 seconds:

$ rm -f *.o *.hi foo
$ ghc -cpp -Wall -O2 foo.hs -o foo -DCOPY -fglasgow-exts
$ time ./foo
real    0m13.394s
user    0m12.843s
sys     0m0.454s

Why does it matter whether the definition is in the current file or is
imported from the standard libraries?


module Main (main) where

import Foreign.Ptr (Ptr)
import Foreign.Marshal.Array (mallocArray, advancePtr)
import Foreign.Storable (peek, poke)
import Data.Bits ((.|.))
#ifdef COPY
import Prelude hiding (Int)
import GHC.Exts (shiftRL#, shiftL#)
import GHC.Word (Word32(W32#))
import GHC.Base (Int(I#), narrow32Word#, negateInt#, (>=#))
import Data.Word (Word32)
import Data.Bits (shift)

main :: IO ()
main = do p <- mallocArray 104857600
          mapM_ (\_ -> foo p 104857600) [1..10 :: Int]

foo :: Ptr Word32 -> Int -> IO ()
foo p i | p `seq` i `seq` False = undefined
foo _ 0 = return ()
foo p n
 = do x <- peek p
      poke p (shift x (-1) .|. shift x (-2) .|. shift x (-3) .|. shift x (-4))
      foo (p `advancePtr` 1) (n - 1)

#ifdef COPY
-- Defn from libraries/base/GHC/Word.hs
shift :: Word32 -> Int -> Word32
shift (W32# x#) (I# i#)
 | i# >=# 0# = W32# (narrow32Word# (x# `shiftL#` i#))
 | otherwise = W32# (x# `shiftRL#` negateInt# i#)

More information about the Glasgow-haskell-users mailing list