Use of H98 FFI

Peter Thiemann thiemann@informatik.uni-freiburg.de
01 Aug 2003 09:44:14 +0200


--=-=-=

I recently had my first exposure to Haskell's FFI when I was trying to
compute MD5 and SHA1 hashes using the existing C implementations. In
each case, the idea is to make the hash function available as function

> md5 :: String -> String

However, the naive implementation

>     md5_init md5_state
>     n <- newCString str
>     md5_append md5_state n (fromIntegral (length str))
>     md5_finish md5_state md5_digest

does not scale to computing hashes of really long strings (50 MB, say,
as arising from reading a moderately big file), since it tries to
create a CString of that size, first! 

Trying to avoid the allocation of this giant CString requires to split
up the original string into smaller parts and convert each part to a
CString separately. Clearly, this task involves a lot of allocation,
essentially the input string needs to be copied part by part.

Hence, I was wondering why the FFI only provides functionality to
convert an *entire* list of Char into a CString. For applications like
this hash computation, it would be advantageous to be able to specify
*how much* of the input string to marshall to the CString and have the
conversion function return the rest of the input string and the
CString. That is, in addition to 

> newCString :: String -> IO CString

there should be

> newCStringPart :: String -> Int -> IO (CStringLen, String)

or even

> toCStringPart :: String -> CStringLen -> IO (Int, String)

where CStringLen describes a target buffer into which the String
argument is to be marshalled.  (and similarly for other list types)

Clearly, I can program this functionality by hand. But I have to
revert to byte-wise processing using pokeByteOff, castCharToCChar, and
so on. In addition, the optimizer does not seem to be very effective on
such code, so it seems advantageous to provide it in the library
already.

But perhaps I'm overlooking something, so I'm appending the code I was
using below.

-Peter


--=-=-=
Content-Type: application/octet-stream
Content-Disposition: attachment; filename=MD5.hs
Content-Description: three interfaces to MD5

module MD5 where

import Int
import Ptr
import MarshalAlloc
import CString
import CTypes
import IOExts
import Storable (pokeByteOff)

type MD5_STATE = ()
type MD5_DIGEST = Ptr CChar

foreign import ccall "md5.h md5_init"
  md5_init   :: Ptr MD5_STATE -> IO ()
foreign import ccall "md5.h md5_append" 
  md5_append :: Ptr MD5_STATE -> Ptr CChar -> Int32 -> IO ()
foreign import ccall "md5.h md5_finish"
  md5_finish :: Ptr MD5_STATE -> MD5_DIGEST -> IO ()

-- |compute MD5 hash in one go
md5small :: String -> String
md5small str =
  unsafePerformIO $
  do md5_state <- mallocBytes (2*4 + 4*4 + 64)
     md5_digest <- mallocBytes 16
     md5_init md5_state
     n <- newCString str
     md5_append md5_state n (fromIntegral (length str))
     md5_finish md5_state md5_digest
     peekCStringLen (md5_digest, 16)

-- |compute MD5 hash by peeling off parts of 512 characters
md5sum :: String -> String
md5sum str =
  unsafePerformIO $
  do md5_state <- mallocBytes (2*4 + 4*4 + 64)
     md5_digest <- mallocBytes 16
     md5_init md5_state
     let loop s = 
	   case splitAt 512 s of
	     (xs, ys) ->
	       do n <- newCString xs
		  md5_append md5_state n (fromIntegral (length xs))
		  case ys of
		    [] -> return ()
		    _  -> loop ys
     loop str
     md5_finish md5_state md5_digest
     peekCStringLen (md5_digest, 16)

-- |compute MD5 hash byte-wise
md5mus :: String -> String
md5mus str =
  unsafePerformIO $
  do md5_state <- mallocBytes (2*4 + 4*4 + 64)
     md5_digest <- mallocBytes 16
     stringbuffer <- mallocBytes 512
     md5_init md5_state
     let loop [] i = 
	   if i==0 then
	     return ()
	   else
	     md5_append md5_state stringbuffer (fromIntegral i)
	 loop (x:xs) i =
	   let cx = castCharToCChar x in
	   do pokeByteOff stringbuffer i cx
	      next xs (i+1)
	 next xs i =
	   if i==512 then
	     do md5_append md5_state stringbuffer 512
		loop xs 0
	   else loop xs i
     loop str 0
     md5_finish md5_state md5_digest
     peekCStringLen (md5_digest, 16)

--=-=-=--