Inlining question
Ian Lynagh
igloo at earth.li
Sun Apr 3 09:38:30 EDT 2005
Hi all,
With foo.hs below, if I compile normally then it takes about 70 seconds
to run:
$ rm -f *.o *.hi foo
$ ghc -cpp -Wall -O2 foo.hs -o foo
$ time ./foo
real 1m10.266s
user 1m9.698s
sys 0m0.521s
If I turn up the inlining threshold then it takes only about 13 seconds:
$ rm -f *.o *.hi foo
$ ghc -cpp -Wall -O2 foo.hs -o foo -funfolding-use-threshold=20
$ time ./foo
real 0m13.313s
user 0m12.838s
sys 0m0.450s
However, if I copy the definition of shift from base/GHC/Word.hs then it
also takes around 13 seconds:
$ rm -f *.o *.hi foo
$ ghc -cpp -Wall -O2 foo.hs -o foo -DCOPY -fglasgow-exts
$ time ./foo
real 0m13.394s
user 0m12.843s
sys 0m0.454s
Why does it matter whether the definition is in the current file or is
imported from the standard libraries?
Thanks
Ian
module Main (main) where
import Foreign.Ptr (Ptr)
import Foreign.Marshal.Array (mallocArray, advancePtr)
import Foreign.Storable (peek, poke)
import Data.Bits ((.|.))
#ifdef COPY
import Prelude hiding (Int)
import GHC.Exts (shiftRL#, shiftL#)
import GHC.Word (Word32(W32#))
import GHC.Base (Int(I#), narrow32Word#, negateInt#, (>=#))
#else
import Data.Word (Word32)
import Data.Bits (shift)
#endif
main :: IO ()
main = do p <- mallocArray 104857600
mapM_ (\_ -> foo p 104857600) [1..10 :: Int]
foo :: Ptr Word32 -> Int -> IO ()
foo p i | p `seq` i `seq` False = undefined
foo _ 0 = return ()
foo p n
= do x <- peek p
poke p (shift x (-1) .|. shift x (-2) .|. shift x (-3) .|. shift x (-4))
foo (p `advancePtr` 1) (n - 1)
#ifdef COPY
-- Defn from libraries/base/GHC/Word.hs
shift :: Word32 -> Int -> Word32
shift (W32# x#) (I# i#)
| i# >=# 0# = W32# (narrow32Word# (x# `shiftL#` i#))
| otherwise = W32# (x# `shiftRL#` negateInt# i#)
#endif
More information about the Glasgow-haskell-users
mailing list