Let's get this finished

Tue Jan 9 08:41:23 EST 2001

"Marcin 'Qrczak' Kowalczyk" <mk167280 at zodiac.mimuw.edu.pl> wrote,

> On Tue, 9 Jan 2001, Manuel M. T. Chakravarty wrote:
> 
> > > We already have mallocArray0 and pokeArray0. You only have to cast
> > > characters to [CChar].
> > 
> > Sure - but why not have this as predefined functions in
> > CString?  That's all I am proposing.
> 
> I would not encourage people to skip the conversion and produce code which
> works only for ISO-8859-1. Latin1 is just one of many encodings.
> 
> Since string handling in Haskell is already inefficient, I hope that 
> adding conversions would not make a big relative difference. It would
> be a different story if strings could be passed to C functions without
> marshalling.
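The cast-based approach mentioned above (mallocArray0 plus pokeArray0 over characters cast to CChar) can be sketched as follows. The names `newCStringByCast` and `roundTripByCast` are hypothetical and used only for illustration. The sketch shows the problem being raised: castCharToCChar keeps only the low 8 bits of each code point. A Latin1 string therefore survives the round trip, but a character outside Latin1, such as the euro sign U+20AC, comes back as U+00AC.

```haskell
import Control.Exception (bracket)
import Foreign.C.String (CString, castCharToCChar, peekCAString)
import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (mallocArray0, pokeArray0)

-- Hypothetical helper: copy a String into fresh malloc'd memory by casting
-- each Char to a CChar (only the low 8 bits of each code point survive).
newCStringByCast :: String -> IO CString
newCStringByCast s = do
  p <- mallocArray0 (length s)             -- room for the chars plus a NUL
  pokeArray0 0 p (map castCharToCChar s)   -- write the chars, then the NUL
  return p

-- Hypothetical helper: marshal out and read back, freeing the buffer.
roundTripByCast :: String -> IO String
roundTripByCast s = bracket (newCStringByCast s) free peekCAString
```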

Ok, I have thought about it again.  malloc and poke aren't
much used on strings anyway.  So, the interface

    type CString    = Ptr CChar
    type CStringLen = (CString, Int)

    peekCString      :: CString    -> IO String
    peekCStringLen   :: CStringLen -> IO String

    withCString      :: String -> (CString    -> IO a) -> IO a
    withCStringLen   :: String -> (CStringLen -> IO a) -> IO a

    newCString       :: String -> IO CString
    newCStringLen    :: String -> IO CStringLen

which, I think, is what you want, should suffice for the
moment.  There is, however, one constraint that I would like
to impose on the design of the conversion library.  It must
be possible to spot and optimise the application of cheap
conversions like to/fromLatin1.
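Latin1 maps each Char to exactly one byte, so the buffer size is known from the String's length before any conversion happens. A short-lived buffer can therefore come from allocaArray0 instead of malloc plus free. Below is a minimal sketch of that, assuming a hypothetical `withCStringLatin1` (not part of any library). Characters above U+00FF are silently truncated, which is why such a specialisation should only be used when the program has actually asked for Latin1.

```haskell
import Foreign.C.String (CString, castCharToCChar, peekCAString)
import Foreign.Marshal.Array (allocaArray0, pokeArray0)

-- Hypothetical specialisation for Latin1: one byte per Char, so the buffer
-- size (length s plus a NUL) is known before converting anything.
withCStringLatin1 :: String -> (CString -> IO a) -> IO a
withCStringLatin1 s act =
  allocaArray0 (length s) $ \p -> do       -- temporary buffer, no malloc/free
    pokeArray0 0 p (map castCharToCChar s)  -- chars above U+00FF are truncated
    act p                                   -- p is only valid inside act

-- Hypothetical helper: marshal through the temporary buffer and read back.
latin1RoundTrip :: String -> IO String
latin1RoundTrip s = withCStringLatin1 s peekCAString
```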

In other words, the design of the system must be such that
if I use, e.g., toLatin1, it can use alloca instead of malloc,
because it is clear that the length of the String won't
change in a Latin1 conversion.  In GHC, for example, I might
have a rule as follows:

  {-# RULES
    "newCString/Latin1" forall s.
      newCStringConv toLatin1 s = newCStringLatin1 s
   #-}

where the function `newCStringLatin1' makes use of `alloca'
to speed up memory allocation.  If I understood you
correctly, we would have

  toLatin1 :: Conv Int16 CChar

or so.  But it would require toLatin1 to be pre-defined as
a symbol somewhere, maybe with a yucky definition like

  toLatin1 :: Conv Int16 CChar
  toLatin1  = unsafePerformIO $ iconv "GHC dft encoding" "Latin1"

In any case, this requires that the types and the rest of the
design are such that I can use a rewrite rule to spot such
cases.  Is that ok with you?
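One way to make such rules possible is to give each cheap conversion a named top-level value that is kept from inlining, so the rule's left-hand side can still match it. The sketch below is only an assumption about the design. `Conv`, `toLatin1`, `withCStringConv` and `convRoundTrip` are hypothetical, and `Conv` here is a pure String -> [CChar] function rather than the two-parameter `Conv Int16 CChar`. `withCAString`, `withArray0` and `castCharToCChar` are real base functions. The sketch uses the `with` variant rather than `newCString`, because memory obtained via `alloca` is released when the action returns and so cannot back a `newCString` result.

```haskell
import Foreign.C.String (CString, castCharToCChar, peekCAString, withCAString)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Array (withArray0)

-- Hypothetical conversion type: a pure String -> [CChar] function.
-- Real encoders (e.g. via iconv) are stateful; this only models the shape.
newtype Conv = Conv (String -> [CChar])

-- A named, non-inlined conversion, so rule left-hand sides can match it.
toLatin1 :: Conv
toLatin1 = Conv (map castCharToCChar)
{-# NOINLINE toLatin1 #-}

-- Generic path: run the conversion, then marshal the result with a NUL.
withCStringConv :: Conv -> String -> (CString -> IO a) -> IO a
withCStringConv (Conv f) s = withArray0 0 (f s)
{-# NOINLINE withCStringConv #-}

-- With -O, GHC can rewrite a call whose converter is literally toLatin1
-- into base's cast-based withCAString, skipping the Conv indirection.
{-# RULES
"withCStringConv/Latin1" forall s.
    withCStringConv toLatin1 s = withCAString s
  #-}

-- Hypothetical helper: marshal via the generic entry point and read back.
convRoundTrip :: String -> IO String
convRoundTrip s = withCStringConv toLatin1 s peekCAString
```

The rule only has a chance to fire when compiling with optimisation. GHC does not check that a rule preserves meaning, so both sides here are meant to produce the same bytes (each Char cast with castCharToCChar). The program should then behave the same whether or not the rewrite happens.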

Cheers,
Manuel