[Haskell-cafe] Performance: MD5

Thomas M. DuBuisson thomas.dubuisson at gmail.com
Tue May 20 19:40:15 EDT 2008


Andrew,
What is "fast enough"?  My informal understanding of the implementations
are:

1) Unoptimized [Word8] implementations (constant space, two or three
orders of magnatude slower than C)
2) Bindings to 'C' routines (typically O(n) space due to strict
ByteStrings and no split out of initialContext/md5Update/md5Finialize)
3) Native Lazy.ByteString implementation (perhaps more than one?) ~3-6
times slower than 'C'.

Tom

On Tue, 2008-05-20 at 19:44 +0100, Andrew Coppin wrote:
> Salvatore Insalaco wrote:
> > Hi Andrew,
> > just a profiling suggestion: did you try to use the SCC cost-centre
> > annotations for profiling?
> > If you want to know precisely what takes 60% of time, you can try:
> >         bn = {-# SCC "IntegerConversion" #-} 4 * fromIntegral wn
> >         b0 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+0)
> >         b1 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+1)
> >         b2 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+2)
> >         b3 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+3)
> >         w  = foldl' (\w b -> shiftL w 8 .|. fromIntegral b) 0 [b3,b2,b1,b0]
> >       in {-# SCC "ArrayWriting" #-} unsafeWrite temp wn w
> >
> > In profiling the time of all expressions with the same SCC "name" will
> > be summed.
> > You can get more information about SCC here:
> > http://www.haskell.org/ghc/docs/latest/html/users_guide/profiling.html#cost-centres
> >   
> 
> OK, I'll give that a try...
> 
> > One advice: I've seen various md5sum implementations in C, all using
> > about the same algorithms and data structures, and they performed even
> > with 10x time differences between them; md5summing fast is not a
> > really simple problem. If I were you I would drop the comparison with
> > ultra-optimized C and concentrate on "does my
> > high-level-good-looking-super-readable implementation perform
> > acceptably?".
> >
> > What "acceptably" means is left as an exercise to the reader.
> >   
> 
> Well, so long as it can hash 500 MB of data in a few minutes without 
> using absurd amounts of RAM, that'll do for me. ;-)
> 
> [I actually wanted to do this for a project at work. When I discovered 
> that none of the available Haskell implementations are fast enough, I 
> tried to write my own. But that wasn't fast enough either. So I ended up 
> being forced to call md5sum.exe and attempt to parse its output. 
> Obviously having a real Haskell function would be infinitely more 
> portable and convinient...]
> 
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list