Let's get this finished

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Sun Jan 7 03:46:06 EST 2001


Sun, 07 Jan 2001 13:15:21 +1100, Manuel M. T. Chakravarty <chak at cse.unsw.edu.au> writes:

> > When someone really wants to use mallocCString and pokeCString now
> > (knowing that there is little point in doing that in the case of
> > conversions), he can use mallocArray0 and pokeArray0, after casting
> > characters of the string to [CChar].
> 
> To be honest, I don't like this.  It is nice having the interface
> such that we can switch to using conversions at some point, but
> I still want to be able to conveniently deal with 8bit characters
> (because this is what many C libraries use).  So, I want a fast and
> convenient interface to 8bit strings *in addition* to the interface
> that can deal with conversions.  In particular this means that
> I don't want to deal with CChar in the Haskell interface only to
> circumvent conversion.

I understand everything except the last sentence. Why is it bad to
deal with CChar in Haskell?
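For concreteness, the workaround described in the quote can be sketched
with the standard Foreign.Marshal.Array functions (mallocCStringRaw is
a hypothetical name for illustration only; a real mallocCString would
behave differently once conversions are in place):

```haskell
import Foreign.C.String (CString, castCharToCChar)
import Foreign.Marshal.Array (mallocArray0, pokeArray0, peekArray0)

-- Hypothetical helper: allocate a NUL-terminated 8-bit string with no
-- character-set conversion; each Char is truncated to its low 8 bits.
mallocCStringRaw :: String -> IO CString
mallocCStringRaw s = do
    let cs = map castCharToCChar s    -- a cast, not a conversion
    ptr <- mallocArray0 (length cs)   -- room for the chars plus '\0'
    pokeArray0 0 ptr cs               -- write chars and terminating 0
    return ptr
```

Reading back is symmetric: peekArray0 0 ptr recovers the [CChar] up to
the terminator.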

It could be confusing if some String values represented text in
Unicode and others in C's encoding. (Especially if a programmer uses
ISO-8859-1 as the C encoding and does not care about the difference,
and then somebody using ISO-8859-7 tries to run his code!)

IMHO most strings that C functions work on (those ending with '\0')
are either in the default local encoding (if they are texts in a
natural language, or filenames) or, more rarely, ASCII (if they are
e.g. names of mail headers, identifiers in a C program, or
command-line switches of some program). Sometimes the encoding is
specified explicitly by the protocol or is stored in the data itself.

For ASCII the default local encoding can be used too, with a speed
penalty; the encodings used in practice are ASCII-compatible. You can
explicitly specify fromLatin1 or toLatin1 if you really want C
characters to map to Haskell's '\0'..'\255' - that should be faster
(it does not call iconv or the like). You can also use CChar.
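Under that proposal, the Latin-1 direction is nothing more than a
truncating cast; a minimal sketch (fromLatin1/toLatin1 are the
proposed names, not an existing API):

```haskell
import Foreign.C.String (castCharToCChar, castCCharToChar)
import Foreign.C.Types (CChar)

-- Proposed fast path: Latin-1 maps '\0'..'\255' directly to bytes,
-- so no iconv-style conversion machinery is needed.
toLatin1 :: String -> [CChar]
toLatin1 = map castCharToCChar    -- truncates code points above 255

fromLatin1 :: [CChar] -> String
fromLatin1 = map castCCharToChar  -- bytes map back to '\0'..'\255'
```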

So most of the time strings should be converted to Haskell's native
encoding, Unicode, to be compatible with the other parts of the
libraries which expect text to be in Unicode, and to let "words",
"toUpper" and "length" work correctly (if by "length" you understand
the number of characters, not the number of bytes in the physical
encoding). It's hard to do the conversion in one place and not in
others if data flows between those places.

What cases do you have in mind when strings should be passed to C
libraries unconverted?

> How about `advancePtr'?  But I am wondering whether this
> shouldn't go into MarshalArray?  It is used for array
> access, isn't it?

It's indeed only for arrays (whereas plusPtr is for structs, where
the offset is in bytes).
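The difference is visible in a small sketch: advancePtr scales the
offset by the element size, while plusPtr counts raw bytes
(thirdElem/thirdElemBytes are illustrative names):

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Array (advancePtr, newArray)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peek, sizeOf)

-- Index in elements: advancePtr scales by sizeOf the element type.
thirdElem :: Ptr CInt -> IO CInt
thirdElem p = peek (p `advancePtr` 2)

-- The same access spelled with plusPtr needs an explicit byte offset.
thirdElemBytes :: Ptr CInt -> IO CInt
thirdElemBytes p = peek (p `plusPtr` (2 * sizeOf (undefined :: CInt)))
```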

> `mapCont' makes a lot of sense.  I am less sure about
> `sequenceCont'.

mapCont is indeed more common (so far I have used it only with
withCString). Uses of sequenceCont:

    sequenceCont (replicate n $ allocaArray m) $ \rows -> ...
    
    sequenceCont [
        if s == "" then ($ nullPtr) else withCString s
        | s <- listOfStrings] $
        \stringPtrs -> ...

> How about calling `mapCont` simply `withMany'.

I like it. sequenceCont could be allocaMany.
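A possible definition of the two combinators under the discussed names,
in plain continuation-passing style (withMany is the type it later has
in Foreign.Marshal.Utils; sequenceCont falls out of it by applying each
action to its continuation with ($)):

```haskell
-- withMany threads a 'with'-style allocator over a list of values,
-- collecting the results before calling the final continuation.
withMany :: (a -> (b -> res) -> res) -> [a] -> ([b] -> res) -> res
withMany _     []     f = f []
withMany withA (x:xs) f =
    withA x $ \x' ->
        withMany withA xs $ \xs' -> f (x' : xs')

-- sequenceCont runs a list of ready-made continuation actions.
sequenceCont :: [(a -> res) -> res] -> ([a] -> res) -> res
sequenceCont = withMany ($)
```

With these definitions the earlier example reads naturally:
sequenceCont (replicate n $ allocaArray m) $ \rows -> ...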

-- 
 __("<  Marcin Kowalczyk * qrczak at knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK

More information about the FFI mailing list