Immutable CStrings

George Russell ger@tzi.de
Thu, 24 Apr 2003 11:23:17 +0200


Simon Marlow wrote (snipped)
> Data.PackedString is *almost* what you want, and could be tweaked to do
> the right thing (at least in GHC).

As a matter of fact, my first attempt used Data.PackedString, until my
code fell over because of the hPutPS (or was it hGetPS?) bug I reported
recently.

> There are two problems: (1) the representation is currently as an array
> of 32-bit unicode chars, whereas you probably want 8-bit ISO-8859 or
> something.

Also, it seems that hPutPS insists on constructing a String as an
intermediate stage, which doesn't seem very efficient.  In my particular
application I don't much care if writing very short strings is inefficient,
but I do care very much that writing long strings is efficient.

> (2) Passing to FFI functions: to make this work you can use
> pinned byte-arrays instead of ordinary byte-arrays to store the string,
> and an explicit touch# after the FFI call.

I am grateful for Alastair Reid's solution, but it seems too complicated.
In particular, I really don't want to have to write C code to take the
structure apart and reassemble it again, and I don't think I need the
reference counts.  So instead what I've done is implement

    data ICStringLen = ICStringLen (ForeignPtr CChar) Int

and functions

    mkICStringLen :: Int -> (CString -> IO ()) -> IO ICStringLen
    withICStringLen :: ICStringLen -> (Int -> CString -> IO a) -> IO a

which can be implemented easily enough, and are pretty much all that
is required for my limited application.
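
For illustration, here is a minimal sketch of how these two functions might
be implemented.  It is not the code from my application: it assumes the
buffer is allocated with mallocForeignPtrBytes (so the garbage collector
frees it and no explicit "free" is needed), and the small main, which fills
a buffer from a Haskell String and writes it out with hPutBuf, is there only
to show the intended usage.

```haskell
import Foreign.C.String (CString, withCStringLen)
import Foreign.C.Types (CChar)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Utils (copyBytes)
import System.IO (hPutBuf, stdout)

-- A buffer of C chars together with its length in bytes.
data ICStringLen = ICStringLen (ForeignPtr CChar) Int

-- Allocate a buffer of the given length and let the supplied action fill
-- it.  The buffer should be treated as read-only once this returns.
-- Assumption: mallocForeignPtrBytes is used, so the memory is released
-- when the ForeignPtr becomes unreachable.
mkICStringLen :: Int -> (CString -> IO ()) -> IO ICStringLen
mkICStringLen len fill = do
  fp <- mallocForeignPtrBytes len
  withForeignPtr fp fill
  return (ICStringLen fp len)

-- Give an action temporary access to the length and the raw pointer.
-- withForeignPtr keeps the buffer alive for the duration of the action.
withICStringLen :: ICStringLen -> (Int -> CString -> IO a) -> IO a
withICStringLen (ICStringLen fp len) act = withForeignPtr fp (act len)

-- Hypothetical usage: copy a String into an ICStringLen, then write the
-- bytes directly to stdout without building an intermediate String.
main :: IO ()
main = do
  ics <- withCStringLen "hello\n" $ \(src, n) ->
    mkICStringLen n $ \dst -> copyBytes dst src n
  withICStringLen ics $ \n p -> hPutBuf stdout p n
```

Nothing here enforces immutability; it only holds as long as callers of
withICStringLen do not write through the pointer they are given.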

All the same I think there is a case for having immutable CStrings, and
similar things, more widely available.  For example, it's annoying
having to remember to manually free things (and indeed work out what
variety of "free" to use), and it would not surprise me if this turns
out to be a major source of bugs in the future.  It seems to me
that immutable CStrings ought also to be a useful way of storing large
quantities of (ASCII or UTF-8-encoded) character data.