Unsafe aspects of ByteString

Donald Bruce Stewart dons at cse.unsw.edu.au
Mon Jan 29 22:08:50 EST 2007


iavor.diatchki:
> Hello,
> The "packCString" function (and other similar functions) in the
> ByteString library break referential transparency, which is one of the
> big selling points of Haskell (and its libraries).

The Data.ByteString functions relating to CString have now been modified
as follows, in the darcs repository. These changes will be propagated
into base in due course.

Public CString functions:

  Data.ByteString:
  
      packCString     :: CString    -> IO ByteString
      packCStringLen  :: CStringLen -> IO ByteString
  
      useAsCString    :: ByteString -> (CString    -> IO a) -> IO a
      useAsCStringLen :: ByteString -> (CStringLen -> IO a) -> IO a

These are safe, copying functions. Modifying the CString can never affect the
Haskell ByteString, or any substrings of it.
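
As a rough sketch of what "copying" means here, the following writes through
the pointer handed out by useAsCString and then checks that the ByteString is
untouched. The helper name scribbleOnCopy is made up for illustration and is
not part of the library.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Foreign.C.Types (CChar)
import Foreign.Storable (poke)

-- | Hypothetical helper: overwrite the first byte of the temporary CString
-- that useAsCString hands to the callback, then return the original value.
scribbleOnCopy :: B.ByteString -> IO B.ByteString
scribbleOnCopy s = do
  -- The pointer refers to a private, NUL-terminated copy, so this write
  -- cannot reach the buffer behind s.
  B.useAsCString s $ \p -> poke p (97 :: CChar)
  return s

main :: IO ()
main = do
  let s = BC.pack "Hello"
  s' <- scribbleOnCopy s
  -- Compare against the expected contents rather than against s itself.
  print (s' == BC.pack "Hello")
```

The price of this isolation is one allocation and one copy per call, which is
why the unsafe variants exist at all.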


Private, unsafe functions, only available by importing Data.ByteString.Base:
      
  A dangerous but efficient API, suitable for constant CStrings only (the CString
  functions may also require null termination):
  
      unsafeUseAsCString      :: ByteString -> (CString -> IO a) -> IO a
      unsafeUseAsCStringLen   :: ByteString -> (CStringLen -> IO a) -> IO a
  
      unsafePackCString       :: CString    -> IO ByteString
      unsafePackCStringLen    :: CStringLen -> IO ByteString
      unsafePackMallocCString :: CString    -> IO ByteString
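
A minimal sketch of the intended use of unsafePackMallocCString, under two
assumptions: that the buffer really was allocated with malloc (newCString does
this), and that you are on a recent bytestring release, where these functions
are exported from Data.ByteString.Unsafe rather than Data.ByteString.Base as
described in this mail. The helper name adoptCString is hypothetical.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Unsafe (unsafePackMallocCString)
import Foreign.C.String (newCString)

-- | Hypothetical helper: marshal a String into a malloc'd C buffer and let a
-- ByteString adopt that buffer without copying it a second time.
adoptCString :: String -> IO B.ByteString
adoptCString str = do
  p <- newCString str          -- malloc'd, NUL-terminated buffer
  -- From here on the ByteString owns p and will free it via a finalizer.
  -- The caller must not free p or write through it again.
  unsafePackMallocCString p

main :: IO ()
main = do
  s <- adoptCString "Hello"
  print (B.length s)
  BC.putStrLn s
```

This only stays referentially transparent as long as nothing else keeps
writing to the buffer after the hand-off.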

The documentation has also been extensively revised. In particular, all unsafe
functions contain text explaining in what way they are unsafe. For example:

    unsafeUseAsCString :: ByteString -> (CString -> IO a) -> IO a
    O(1) construction. Use a ByteString with a function requiring a CString.

    This function does zero copying, and merely unwraps a ByteString to appear as a CString. It is
    unsafe in two ways:

      * After calling this function the CString shares the underlying byte
    buffer with the original ByteString. Thus modifying the CString, either in
    C, or using poke, will cause the contents of the ByteString to change,
    breaking referential transparency. Other ByteStrings created by sharing
    (such as those produced via take or drop) will also reflect these changes.
    To avoid this, use useAsCString, which makes a copy of the original ByteString.

      * CStrings are often passed to functions that require them to be
    null-terminated. If the original ByteString wasn't null terminated, neither
    will the CString be. It is the programmer's responsibility to guarantee that
    the ByteString is indeed null terminated. If in doubt, use useAsCString.
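
To make the two caveats concrete, here is a sketch of a use that respects both
of them: it only reads through the shared pointer, and it uses the Len variant
so that no null terminator is needed. It assumes a recent bytestring release,
where the unsafe functions are exported from Data.ByteString.Unsafe; the helper
name readOnlyBytes is made up for illustration.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)

-- | Hypothetical helper: read the bytes back through the shared pointer.
-- Nothing is written, so the ByteString and its substrings stay intact.
readOnlyBytes :: B.ByteString -> IO [Word8]
readOnlyBytes s =
  -- n is the exact length, so we never scan for a terminating NUL.
  unsafeUseAsCStringLen s $ \(p, n) -> peekArray n (castPtr p)

main :: IO ()
main = do
  -- A substring produced by drop: it shares its parent's buffer and has no
  -- trailing NUL of its own.
  let s = B.drop 1 (B.pack [72, 101, 108, 108, 111])
  bytes <- readOnlyBytes s
  print (bytes == B.unpack s)
```

Anything that needs to write to the buffer, or that expects a C-style
terminated string, should go through useAsCString instead.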


The plain old Data.ByteString CString API should now be safe from FFI
manipulation. Note that Iavor's original demo looks like:

    import qualified Data.ByteString as B
    import Data.ByteString (packCString)
    import Foreign.C.String
    import Foreign

    main :: IO ()
    main = do x <- newCString "Hello"
              s <- packCString x       -- copies the C buffer
              let h1 = B.head s
              print s
              poke x (toEnum 97)       -- overwrite 'H' with 'a' on the C side
              print s
              let h2 = B.head s
              print h1
              print h2

And now produces:

    $ runhaskell iavor.hs
    "Hello"
    "Hello"
    72
    72

No more ghostly telekinesis from the CString side!

Thanks to everyone for feedback and criticism.

-- Don

