[Haskell-cafe] MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Salvatore Insalaco kirby81 at gmail.com
Tue May 20 05:41:32 EDT 2008


2008/5/17 Andrew Coppin <andrewcoppin at btinternet.com>:
> Hi folks.
>
> OK, try this:
>
>  darcs get http://darcs.orphi.me.uk/MyMD5
>  cd MyMD5
>  ghc -O2 --make md5sum
>  md5sum <some large filename>

I've got some time to take a look at the code. It's very nice,
readable and declarative, but obviously not optimized for "raw speed".
There're a few things to do to have a speed-up of 2x, without going "low-level".

1) There's a lazyness leak in Pad.hs. The sum in:
          else make_block_bs bs0 : work (n + block_size_bytes) bs1
is not strict. With a very large file (e.g. 100 mb) it means stack overflow.

[roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs

[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize' to increase it.

To solve it just add a bang (!) before the n parameter:

    work !n bs =

You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file
too. There're solutions that don't imply BangPatterns, and use only
seq, but I like bang patterns!

Now it works:
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
E6542028D538BAEC180A96A5D1E6EC3A
11.958u 0.210s 0:12.19 99.7%	0+0k 0+2io 0pf+0w

2) make_block_bs is sub-optimal, and very critical to performance. I
decided to use Data.Binary for it (it's also more readable, and you
get rid of the unsafeWrite):
import Data.Binary
import Data.Binary.Get
import Control.Monad

// ...

instance Binary Block where
    put _ = undefined
    get = do
        xs <- replicateM 16 getWord32le
        return $ Block $ listArray (0, 15) xs

make_block_bs :: B.ByteString -> Block
make_block_bs = decode

Now we are getting faster:
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
E6542028D538BAEC180A96A5D1E6EC3A
8.063u 0.174s 0:08.23 100.0%	0+0k 0+0io 0pf+0w

3) You are doing a lot of access to fields of a strict data type
(State). You can at least ask the compiler to help you a bit with
-funbox-strict-fields.

Now we have:
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
E6542028D538BAEC180A96A5D1E6EC3A
6.780u 0.133s 0:06.91 100.0%	0+0k 0+1io 0pf+0w

We have got a good, nearly 2x, speed-up with very few optimizations,
and we run in a very small constant amount of memory.

Probably compiling with -fvia-C could help even more, but strangely:

[roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields
-O2 --make md5sum.hs
[roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
Segmentation fault

Is this supposed to happen? There're no "unsafe" functions or imports
used in the program. Maybe a bug in GHC?

Salvatore


More information about the Haskell-Cafe mailing list