[Haskell-cafe] haskell crypto is reaaaaaaaaaally slow

Donald Bruce Stewart dons at cse.unsw.edu.au
Wed Jun 20 00:54:09 EDT 2007


dons:
> aeyakovenko:
> > $ time ./md5sum ./md5sum
> > [105,252,52,138,187,192,216,17,225,123,185,3,124,101,86,132]
> > 
> > real    0m4.790s
> > user    0m3.688s
> > sys     0m0.492s
> > 
> > $ time md5sum ./md5sum
> > 69fc348abbc0d811e17bb9037c655684  ./md5sum
> > 
> > real    0m0.023s
> > user    0m0.000s
> > sys     0m0.008s
> > 
> > this is my implementation using crypto
> > (http://www.haskell.org/crypto/).  Am I doing something wrong?
> > 
> > module Main where
> > 
> > import System
> > import qualified Data.Digest.MD5 as MD5
> > import qualified Data.ByteString as BS
> > 
> > main = do
> >   args <- getArgs
> >   dt <- BS.readFile $ head args
> >   putStrLn $ show $ MD5.hash . BS.unpack $ dt
>                ^^^^              ^^^^^^^^^
> 
>                                  not a good idea.
> 
> You need an MD5 over bytestrings, not [Word8].
> 
> -- Don

I note a couple of other issues:

    crypto is compiled with:
    
        Ghc-options:     -fglasgow-exts

    that is, no optimisations, although it would certainly benefit from

        Ghc-options: -O2 -fexcess-precision -funbox-strict-fields

So I'd recompile the crypto package with that first.
Then be sure to compile your code with ghc -O2  -- bytestrings love -O2.

Finally, to actually get C speed, use a C md5. Here's an example Haskell
binding to the OpenSSL libraries 'md5' function, which you can compile
and run like so:

    $ ghc MD5.hs -lcrypto -o hsmd5 

    $ time ./hsmd5 /usr/share/dict/words
    MD5 (/usr/share/dict/words) = e5c152147e93b81424c13772330e74b3 

    ./hsmd5 /usr/share/dict/words  0.01s user 0.02s system 80% cpu 0.029 total

versus my system's 'md5' program:

    $ time md5 /usr/share/dict/words 
    MD5 (/usr/share/dict/words) = e5c152147e93b81424c13772330e74b3
    md5 /usr/share/dict/words  0.02s user 0.00s system 29% cpu 0.052 total

Oh huh, that's interesting...

And the code:

    {-# OPTIONS -O2 -fffi -#include "openssl/md5.h" #-}

    --
    -- A few imports, should tidy these up one day.
    --
    import System.Environment
    import qualified Data.ByteString.Base as B (unsafeUseAsCStringLen)
    import qualified Data.ByteString      as B
    import Foreign
    import Foreign.C.Types
    import Numeric
    import Text.Printf


    main = do
        (f:_)  <- getArgs
        src    <- B.readFile f
        printf "MD5 (%s) = %s \n" f (md5sum src)


    -- ---------------------------------------------------------------------
    --
    -- Fast md5 using OpenSSL and zero-copying bytestrings
    --

    --
    -- The md5 hash should be referentially transparent..
    --
    md5sum :: B.ByteString -> String
    md5sum p = unsafePerformIO $ B.unsafeUseAsCStringLen p $ \(ptr,n) -> do
        digest  <- c_md5 ptr (fromIntegral n) nullPtr
        go digest 0 []
      where

        -- print it in 0-padded hex format
        go :: Ptr Word8 -> Int -> [String] -> IO String
        go p n acc
            | n >= 16   = return $ concat (reverse acc)
            | otherwise = do w <- peekElemOff p n
                             go p (n+1) (draw w : acc)

        draw w = case showHex w [] of
                    [x] -> ['0', x]
                    x   -> x

    -- unsigned char *MD5(const unsigned char *d, unsigned long n, unsigned char *md);
    --
    foreign import ccall "openssl/md5.h MD5" c_md5
        :: Ptr CChar -> CULong -> Ptr CChar -> IO (Ptr Word8)


Happy hacking.

-- Don


More information about the Haskell-Cafe mailing list