Speed of simple operations with Ptr Word32s
Donald Bruce Stewart
dons at cse.unsw.edu.au
Sat Dec 4 20:16:39 EST 2004
wolfgang.thaller:
> Ian Lynagh wrote:
> >Hi all,
> >
> >I was under the impression that simple code like the below, which swaps
> >the endianness of a block of data, ought to be near C speed:
> >
> >[...]
> > poke p (shiftL x 24 .|. shiftL (x .&. 0xff00) 8
> > .|. (shiftR x 8 .&. 0xff00)
> > .|. shiftR x 24)
> >[...]
>
> The problem here is that the shiftL and shiftR operations don't get
> inlined properly. They get replaced by a call to shift, but that
> doesn't get inlined.
> The shift function also wastes some more time by checking the sign of
> the shift amount.
> A few well-placed INLINE pragmas in the libraries might help.
>
> >Is there anything I can do to get better performance in this sort of
> >code without resorting to calling out to C?
>
> You could import some private GHC modules and use the primop directly:
>
> import GHC.Prim
> import GHC.Word
>
> main :: IO ()
> main = do p <- mallocArray 104857600
> foo p 104857600
>
> shiftL (W32# a) (I# b) = W32# (shiftL# a b)
> shiftR (W32# a) (I# b) = W32# (shiftRL# a b)
>
> Using those instead of the standard ones speeds up the program a lot;
> be aware however that you shouldn't use negative shift amounts with
> those (undefined result, no checking).
Rewriting the bit twiddling code to use the unboxed primops will
generate almost identical C code (in GHC, at least) to that which you
would write yourself. You'll actually get the C ops generated if you do
all the operations on Word#, instead of Int#:
Here's a bit of ridiculously optimised code in yi that I wrote to look
at how to get straight C code out of the backend:
(I# (word2Int# ((int2Word# i `and#` int2Word# 0xffff#)
`remWord#` int2Word# 52#)))
This code is literally compiled to (-ddump-realC):
...
_s6hk_ = (StgWord)((I_)(R1.p[1]));
_s6hn_ = _s6hk_ & 0xffff;
_s6hq_ = _s6hn_ % 0x34;
_s6ht_ = (StgInt)(_s6hq_);
...
-- Don
More information about the Glasgow-haskell-users
mailing list