Let's get this finished
Manuel M. T. Chakravarty
chak at cse.unsw.edu.au
Tue Jan 9 08:41:23 EST 2001
"Marcin 'Qrczak' Kowalczyk" <mk167280 at zodiac.mimuw.edu.pl> wrote,
> On Tue, 9 Jan 2001, Manuel M. T. Chakravarty wrote:
>
> > > We already have mallocArray0 and pokeArray0. You only have to cast
> > > characters to [CChar].
> >
> > Sure - but why not have this as predefined functions in
> > CString? That's all I am proposing.
>
> I would not encourage people to skip the conversion and produce code which
> works only for ISO-8859-1. Latin1 is just one of many encodings.
>
> Since string handling in Haskell is already inefficient, I hope that
> adding conversions would not make a big relative difference. It would
> be a different story if strings could be passed to C functions without
> marshalling.
Ok, I have thought about it again. malloc and poke aren't
much used on strings anyway. So, the interface
  type CString    = Ptr CChar
  type CStringLen = (CString, Int)

  peekCString    :: CString    -> IO String
  peekCStringLen :: CStringLen -> IO String

  withCString    :: String -> (CString    -> IO a) -> IO a
  withCStringLen :: String -> (CStringLen -> IO a) -> IO a

  newCString     :: String -> IO CString
  newCStringLen  :: String -> IO CStringLen
which, I think, is what you want, should suffice for the
moment. There is, however, one constraint that I would like
to impose on the design of the conversion library. It must
be possible to spot and optimise the application of cheap
conversions like to/fromLatin1.
In other words, the design of the system must be such that
if I use, eg, toLatin1, it can use alloca instead of malloc,
because it is clear that the length of the String won't
change in a Latin1 conversion. In GHC, for example, I might
have a rule as follows:
  {-# RULES
  "newCString/Latin1" forall s.
      newCStringConv toLatin1 s = newCStringLatin1 s
    #-}
where the function `newCStringLatin1' makes use of `alloca'
to speed up memory allocation. If I understood you
correctly, we would have
  toLatin1 :: Conv Int16 CChar
or so. But it would require toLatin1 to be pre-defined as
a symbol somewhere. Maybe a yucky
  toLatin1 :: Conv Int16 CChar
  toLatin1 = unsafePerformIO $ iconv "GHC dft encoding" "Latin1"
In any case, this requires that the types and the rest of the
design are such that I can use a rewrite rule to spot such
cases. Is that ok with you?
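
To make the constraint concrete, here is a rough sketch of the kind of
setup I have in mind. Everything named here is an assumption for the
sake of illustration: the `Conv' type is a trivial stand-in (a list
function over Char rather than the Int16 version above), and
`withCStringConv' and `withCStringLatin1' are hypothetical names. I
show the with-style variant, since memory obtained from `allocaBytes'
is only valid for the duration of the callback. The point is only that
`toLatin1' is a named top-level symbol that a rule can match on.

```haskell
import Data.Char (ord)
import Foreign.C.String (CString, peekCString)
import Foreign.C.Types (CChar)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (pokeArray0, withArray0)

-- Hypothetical stand-in for the conversion type under discussion.
newtype Conv a b = Conv ([a] -> [b])

-- Latin-1 conversion as a named symbol; code points above 255 are
-- truncated in this sketch.
toLatin1 :: Conv Char CChar
toLatin1 = Conv (map (fromIntegral . ord))
{-# NOINLINE toLatin1 #-}

-- General path: run the conversion, then marshal the resulting list
-- into a NUL-terminated temporary array.
withCStringConv :: Conv Char CChar -> String -> (CString -> IO a) -> IO a
withCStringConv (Conv f) s act = withArray0 0 (f s) act
{-# NOINLINE withCStringConv #-}

-- Fast path: Latin-1 keeps one byte per character, so the buffer size
-- is known up front (length plus the terminating NUL).
withCStringLatin1 :: String -> (CString -> IO a) -> IO a
withCStringLatin1 s act =
  allocaBytes (length s + 1) $ \p -> do
    pokeArray0 0 p (map (fromIntegral . ord) s)
    act p

-- The rewrite rule that spots the cheap conversion.
{-# RULES
"withCString/Latin1" forall s.
    withCStringConv toLatin1 s = withCStringLatin1 s
  #-}

-- Usage: gives the same result whether or not the rule fires.
demo :: IO String
demo = withCStringConv toLatin1 "hello" peekCString
```

The NOINLINE pragmas are there so that the optimiser does not inline
either function away before the rule gets a chance to match. Whether
the rule fires or not, the observable result should be the same; only
the allocation strategy differs.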
Cheers,
Manuel