[Haskell-cafe] Performance: MD5
Andrew Coppin
andrewcoppin at btinternet.com
Sun May 18 07:42:23 EDT 2008
Andrew Coppin wrote:
> If anybody has any interesting insights as to why my version is so
> damned slow, I'd like to hear them.
OK, so I am now officially talking to myself.
Profiling indicates that 60% of my program's time is spent inside one
function:
make_block_bs :: ByteString -> Block
In other words, the program spends 60% of its running time converting an
array of Word8 into an array of Word32. [Note that the MD5 algorithm
incorrectly specifies that little-endian ordering should be used, so I
do the byte-swapping in here too.] The code for breading the file into
correctly-sized pieces and adding padding is a ful 3% of the runtime, so
it's not even the copying overhead. It's *all* in the array conversions.
How much do you want to bet that the C version just reads a chunk of
data into RAM and then does a simple type-cast? (If I'm not very much
mistaken, in C a type-cast is an O(0) operation.)
Anyway, wading through the 3-page Core output for this function, I'm
seeing a lot of stuff of the form
case GHC.Prim.<=# 0 whuhrjhsd of wild3784_hguhse {
GHC.Base.False ->
(GHC.Err.error @ GHC.Base.Int MD5.Types.lvl
`cast` (CoUnsafe ...
I'd be lying if I had I had any real idea what that does. But it occurs
to me that the code is probably performing runtime checks that every
single array access is within bounds, and that every single bit shift
operation is similarly within bounds. This is obviously very *safe*, but
possibly not very performant.
Isn't there some function kicking around somewhere for acessing arrays
without the bounds check? Similarly, I'm sure I remember the Data.Bits
bounds check being mentioned before, and somebody saying they had [or
were going to?] make an unsafe variant available. I've looked around the
library documentation and I'm not seeing anything... any hints?
(Obviously I could attempt to dig through GHC.Prim and find what I'm
after there, but... it looks pretty scary in there! I'm hoping for
something slightly higher-level if possible.)
The alternative, obviously, is to just perform an evil type cast.
Trouble is, that'll work just fine on any architecture with a
little-endian data order [conspicuously IA32 and AMD64], and malfunction
horribly on any architecture that works correctly. (At which point I
start wondering how C manages to overcome this problem...)
More information about the Haskell-Cafe
mailing list