[Haskell-cafe] Performance: MD5

Don Stewart dons at galois.com
Tue May 20 14:57:34 EDT 2008


andrewcoppin:
> Salvatore Insalaco wrote:
> >Hi Andrew,
> >just a profiling suggestion: did you try to use the SCC cost-centre
> >annotations for profiling?
> >If you want to know precisely what takes 60% of time, you can try:
> >        bn = {-# SCC "IntegerConversion" #-} 4 * fromIntegral wn
> >        b0 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+0)
> >        b1 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+1)
> >        b2 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+2)
> >        b3 = {-# SCC "ByteStringIndexing" #-} B.index bi (bn+3)
> >        w  = foldl' (\w b -> shiftL w 8 .|. fromIntegral b) 0
> >               [b3,b2,b1,b0]
> >      in {-# SCC "ArrayWriting" #-} unsafeWrite temp wn w
> >
> >In profiling the time of all expressions with the same SCC "name"
> >will be summed.
> >You can get more information about SCC here:
> >http://www.haskell.org/ghc/docs/latest/html/users_guide/profiling.html#cost-centres
> >  
> 
> OK, I'll give that a try...
> 
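For anyone who wants to try the SCC suggestion quoted above in isolation, here is a minimal sketch using only base. The function name packLE and the cost-centre label "WordPacking" are made up for illustration and are not from the original code; the fold itself is the same one as in the quoted snippet.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.List (foldl')
import Data.Word (Word8, Word32)

-- Hypothetical helper: assemble four bytes, least significant first,
-- into one 32-bit word, as the quoted MD5 loop does.
packLE :: Word8 -> Word8 -> Word8 -> Word8 -> Word32
packLE b0 b1 b2 b3 =
  -- The SCC pragma attributes the cost of the expression to its right
  -- to the named cost centre; it is ignored in non-profiling builds.
  {-# SCC "WordPacking" #-}
  foldl' (\w b -> shiftL w 8 .|. fromIntegral b) 0 [b3, b2, b1, b0]

main :: IO ()
main = print (packLE 0x78 0x56 0x34 0x12)  -- prints 305419896 (0x12345678)
```

Built with ghc -prof and run with +RTS -p, this should write a .prof report in which "WordPacking" appears as its own cost centre.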
> >One advice: I've seen various md5sum implementations in C, all using
> >about the same algorithms and data structures, and they performed
> >even with 10x time differences between them; md5summing fast is not a
> >really simple problem. If I were you I would drop the comparison with
> >ultra-optimized C and concentrate on "does my
> >high-level-good-looking-super-readable implementation perform
> >acceptably?".
> >
> >What "acceptably" means is left as an exercise to the reader.
> >  
> 
> Well, so long as it can hash 500 MB of data in a few minutes without 
> using absurd amounts of RAM, that'll do for me. ;-)
> 
> [I actually wanted to do this for a project at work. When I 
> discovered that none of the available Haskell implementations are 
> fast enough,

How hard did you look?
    
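    import System.Environment (getArgs)
    import Data.Digest.OpenSSL.MD5 (md5sum)   -- from the nano-md5 package
    import System.IO.Posix.MMap (mmapFile)    -- from the bytestring-mmap package

    main :: IO ()
    main = do
        [f] <- getArgs
        -- mmap the file as a strict ByteString and hash it via OpenSSL
        putStrLn . md5sum =<< mmapFile f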

Take the md5 of a 600M file:

    $ time ./A /home/dons/tmp/600M  
    24a04fdf3f629a42b5baed52ed793a51
    ./A /home/dons/tmp/600M  3.61s user 1.65s system 20% cpu 25.140 total

Easy.


-- Don

