CStrings

Tue Nov 28 12:35:41 EST 2000

On Tue, 28 Nov 2000 malcolm-ffi at cs.york.ac.uk wrote:

> What do you mean by "absent"?  I thought you had a proposal.  I was
> just asking for it to be recognised as a proposal rather than a
> standard, because we haven't had any experience with using it yet.

Ok.

> Just by looking at the names and type signatures, I'm not even
> sure what each proposed function does.  Could you describe them in
> a little more detail?

Sorry, I thought it was clear from the names and types.

  peekCString    :: Ptr CChar -> IO String
  -- Read a null-terminated string of C chars from memory and convert
  -- it to a Haskell string. The C string is assumed to be in the default
  -- local byte encoding.
  peekCStringLen :: Ptr CChar -> Int -> IO String
  -- Similarly, but read the specified number of chars instead of searching
  -- for '\0'.

  withCString    :: String -> (Ptr CChar -> IO a) -> IO a
  -- Convert a Haskell string to a null-terminated C string in the
  -- default local byte encoding, stored in some temporary memory.
  -- Apply the function to the pointer to that string, execute the
  -- resulting action and then free the string.
  withCStringLen :: String -> (Ptr CChar -> Int -> IO a) -> IO a
  -- Similarly, but also passes the length of the converted string,
  -- not including the final '\0'. This allows converting strings
  -- containing null characters.

  newCString     :: String -> IO (Ptr CChar)
  newCStringLen  :: String -> IO (Ptr CChar, Int)
  -- Like withCString*, but the memory is obtained by malloc. It must be
  -- explicitly freed using free.
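
To illustrate how the functions described above fit together, here is a
minimal sketch of three round trips through C memory. It assumes the
functions are exported from modules named Foreign.C.String and
Foreign.Marshal.Alloc, as in GHC's base library; note that there the *Len
variants take and return a (Ptr CChar, Int) pair rather than the two
separate arguments shown in the signatures above. The names roundTrip,
roundTripLen and roundTripMalloc are made up for this example.

```haskell
import Control.Exception (bracket)
import Foreign.C.String
  (newCString, peekCString, peekCStringLen, withCString, withCStringLen)
import Foreign.Marshal.Alloc (free)

-- Marshal to a temporary null-terminated C string and read it back.
-- The temporary buffer is released when withCString returns.
roundTrip :: String -> IO String
roundTrip s = withCString s peekCString

-- Same, but carry the length explicitly instead of relying on '\0'.
-- (Assumption: the base-library form, where the pointer and length
-- travel together as a pair.)
roundTripLen :: String -> IO String
roundTripLen s = withCStringLen s peekCStringLen

-- Same, but with malloc'ed memory that has to be freed explicitly.
-- bracket makes sure free runs even if peekCString throws.
roundTripMalloc :: String -> IO String
roundTripMalloc s = bracket (newCString s) free peekCString
```

The bracket in the last function shows the obligation that newCString
puts on the caller; withCString discharges the same obligation
automatically.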

> My request for a portable implementation is because, now that the
> basic common FFI is settled, I believe we should be using it to ensure
> that new proposals such as yours are not restricted to GHC-only,
> like so many existing libraries sadly are.

The C string stuff can be written portably in Haskell implementations
which use the same character encoding as C instead of Unicode. That
implementation will not be valid when Strings really use Unicode.
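
A minimal sketch of such a portable implementation, written only against
the generic array marshalling functions. It assumes every Char fits in one
C char (anything above that is silently truncated), and it assumes the
helpers castCharToCChar, castCCharToChar, peekArray0, withArray0, peekArray
and withArrayLen as found in GHC's base library. The names ending in 8 are
hypothetical and chosen only to avoid clashing with the proposed ones.

```haskell
import Foreign.C.String (castCCharToChar, castCharToCChar)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (peekArray, peekArray0, withArray0, withArrayLen)
import Foreign.Ptr (Ptr)

-- Read chars up to (not including) the terminating 0.
peekCString8 :: Ptr CChar -> IO String
peekCString8 p = map castCCharToChar <$> peekArray0 0 p

-- Read exactly n chars, whether or not some of them are 0.
peekCStringLen8 :: Ptr CChar -> Int -> IO String
peekCStringLen8 p n = map castCCharToChar <$> peekArray n p

-- Temporary null-terminated copy, freed when the action finishes.
withCString8 :: String -> (Ptr CChar -> IO a) -> IO a
withCString8 s = withArray0 0 (map castCharToCChar s)

-- Temporary copy without terminator; the length is passed explicitly.
withCStringLen8 :: String -> (Ptr CChar -> Int -> IO a) -> IO a
withCStringLen8 s f = withArrayLen (map castCharToCChar s) (\n p -> f p n)
```

Each function here goes through an intermediate Haskell list of CChars,
so this is a sketch of the portable route, not of an efficient one.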

The generic interface for charset conversions is not established yet. I
have a proposal for it, but there is much room for choice there. Given
such an interface, it is possible to write the C string functions on top
of it, but only inefficiently. My proposal for charsets allows more
efficient C string functions than can be written using its official
interface (avoiding going through Haskell lists of CChars or Bytes). In
practice the implementations of Unicode stuff and C string stuff are
mutually dependent.

-- 
Marcin 'Qrczak' Kowalczyk