Use of H98 FFI
Derek Elkins
ddarius@hotpop.com
Fri, 1 Aug 2003 04:23:16 -0400
On 01 Aug 2003 09:44:14 +0200
Peter Thiemann <thiemann@informatik.uni-freiburg.de> wrote:
> I recently had my first exposure to Haskell's FFI when I was trying to
> compute MD5 and SHA1 hashes using the existing C implementations. In
> each case, the idea is to make the hash function available as function
>
> > md5 :: String -> String
>
> However, the naive implementation
>
> > md5_init md5_state
> > n <- newCString str
> > md5_append md5_state n (fromIntegral (length str))
> > md5_finish md5_state md5_digest
>
> does not scale to computing hashes of really long strings (50 MB, say,
> as arising from reading a moderately big file), since it tries to
> create a CString of that size, first!
>
> Avoiding the allocation of this giant CString requires splitting
> up the original string into smaller parts and converting each part to a
> CString separately. Clearly, this task involves a lot of allocation;
> essentially, the input string needs to be copied part by part.
>
> Hence, I was wondering why the FFI only provides functionality to
> convert an *entire* list of Char into a CString. For applications like
> this hash computation, it would be advantageous to be able to specify
> *how much* of the input string to marshall to the CString and have the
> conversion function return the rest of the input string and the
> CString. That is, in addition to
>
> > newCString :: String -> IO CString
>
> there should be
>
> > newCStringPart :: String -> Int -> IO (CStringLen, String)
>
> or even
>
> > toCStringPart :: String -> CStringLen -> IO (Int, String)
>
> where CStringLen describes a target buffer into which the String
> argument is to be marshalled. (and similarly for other list types)
>
> Clearly, I can program this functionality by hand. But I have to
> revert to byte-wise processing using pokeByteOff, castCharToCChar, and
> so on. In addition, the optimizer does not seem to be very effective
> on such code, so it seems advantageous to provide it in the library
> already.
>
> But perhaps I'm overlooking something, so I'm appending the code I was
> using below.
>
> -Peter
Except that I would probably mapM_ over a list of chunks, I don't
see what the problem with your second version of the code is.
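
For concreteness, here is a rough sketch of the "mapM_ over a list of chunks" idea, not a tested replacement for the code in question. The names chunksOf, feedChunks and countBytes are made up for illustration, and countBytes is only a stand-in for a real C routine such as md5_append, so that the sketch runs without any C library. It assumes the input is plain ASCII, since withCStringLen reports a length in bytes, not characters.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import Foreign.C.String (CStringLen, withCStringLen)

-- Split a list into pieces of at most n elements (n is assumed positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Marshal one chunk at a time into a temporary buffer and hand it to the
-- consumer. Each buffer is released before the next chunk is marshalled,
-- so at most one chunk-sized CString is live at any moment.
feedChunks :: Int -> (CStringLen -> IO ()) -> String -> IO ()
feedChunks n consume = mapM_ (\chunk -> withCStringLen chunk consume) . chunksOf n

-- Hypothetical stand-in for a foreign import like md5_append: it only adds
-- up the number of bytes it has been given.
countBytes :: IORef Int -> CStringLen -> IO ()
countBytes ref (_, len) = modifyIORef' ref (+ len)
```

With a real binding, the consumer would be something along the lines of \(p, len) -> md5_append md5_state p (fromIntegral len), and a chunk size of a few kilobytes would presumably be reasonable. The string is still copied piece by piece, but because the input list is consumed lazily, neither the whole String nor a giant CString has to be resident at once.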