Use of H98 FFI

Derek Elkins ddarius@hotpop.com
Fri, 1 Aug 2003 04:23:16 -0400


On 01 Aug 2003 09:44:14 +0200
Peter Thiemann <thiemann@informatik.uni-freiburg.de> wrote:

> I recently had my first exposure to Haskell's FFI when I was trying to
> compute MD5 and SHA1 hashes using the existing C implementations. In
> each case, the idea is to make the hash function available as function
> 
> > md5 :: String -> String
> 
> However, the naive implementation
> 
> >     md5_init md5_state
> >     n <- newCString str
> >     md5_append md5_state n (fromIntegral (length str))
> >     md5_finish md5_state md5_digest
> 
> does not scale to computing hashes of really long strings (50 MB, say,
> as arising from reading a moderately big file), since it tries to
> create a CString of that size, first! 
> 
> Avoiding the allocation of this giant CString requires splitting
> up the original string into smaller parts and converting each part to a
> CString separately. Clearly, this task involves a lot of allocation:
> essentially, the input string needs to be copied part by part.
> 
> Hence, I was wondering why the FFI only provides functionality to
> convert an *entire* list of Char into a CString. For applications like
> this hash computation, it would be advantageous to be able to specify
> *how much* of the input string to marshall to the CString and have the
> conversion function return the rest of the input string and the
> CString. That is, in addition to 
> 
> > newCString :: String -> IO CString
> 
> there should be
> 
> > newCStringPart :: String -> Int -> IO (CStringLen, String)
> 
> or even
> 
> > toCStringPart :: String -> CStringLen -> IO (Int, String)
> 
> where CStringLen describes a target buffer into which the String
> argument is to be marshalled.  (and similarly for other list types)
> 
> Clearly, I can program this functionality by hand. But I have to
> revert to byte-wise processing using pokeByteOff, castCharToCChar, and
> so on. In addition, the optimizer does not seem to be very effective
> on such code, so it seems advantageous to provide it in the library
> already.
> 
> But perhaps I'm overlooking something, so I'm appending the code I was
> using below.
> 
> -Peter

Except that I would probably mapM_ over a list of chunks, I don't
see what the problem with your second version of the code is.
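
A minimal sketch of the mapM_-over-chunks idea, not the code from the
original message: chunksOf and feedChunks are hypothetical helper names,
and the callback stands in for whatever foreign import wraps md5_append.
It uses withCStringLen from Foreign.C.String, so each temporary buffer is
released as soon as the callback returns and only one chunk is marshalled
at a time.

```haskell
import Foreign.C.String (CStringLen, withCStringLen)

-- Hypothetical helper: split a list into pieces of at most n elements.
-- Assumes n > 0; the result is produced lazily.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Hypothetical helper: marshal the string one chunk at a time and hand
-- each temporary buffer (pointer plus byte length) to the callback.
-- The buffer is only valid inside the callback.
feedChunks :: Int -> (CStringLen -> IO ()) -> String -> IO ()
feedChunks n append = mapM_ (\chunk -> withCStringLen chunk append) . chunksOf n
```

With a binding shaped like the one in the quoted code, the call might look
like feedChunks 4096 (\(p, len) -> md5_append md5_state p (fromIntegral len)) str,
where the exact argument types are an assumption. The input is still copied
piece by piece, but as long as nothing else holds on to the whole String
(for example a call to length str), the 50 MB CString never has to exist.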