Marshalling Haskell String <-> UTF-8

Bayley, Alistair Alistair_Bayley at ldn.invesco.com
Wed Sep 1 09:51:47 EDT 2004


> From: George Russell [mailto:ger at informatik.uni-bremen.de]
> 
> http://www.haskell.org//pipermail/glasgow-haskell-users/2004-April/006564.html


Thanks George, this looks useful.

There are some things I want to clarify...

module UTF8(
    toUTF8,
       -- :: String -> String
       -- Converts a String (whose characters must all have codes <2^31) into
       -- its UTF8 representation.
    fromUTF8WE,
       -- :: Monad m => String -> m String
       -- Converts a UTF8 representation of a String back into the String,
       -- catching all possible format errors.

Does toUTF8 return a String whose Chars all have code points < 256, which,
when converted to bytes, represent a UTF-8 string?

Likewise, does fromUTF8WE expect a String whose Chars all have code points
< 256, i.e. the result of applying "chr n" to each byte n of the UTF-8
stream?
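To make the question concrete, here is a minimal sketch of the "one Char per
byte" convention being asked about. It is a hypothetical re-implementation,
not George's module: encodeUTF8 and decodeUTF8 are illustrative names, the
decoder reports errors with Either String rather than the Monad m signature
of fromUTF8WE, and only the 1- to 4-byte forms (up to U+10FFFF) are handled,
not the full < 2^31 range that toUTF8 documents.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)

-- Hypothetical encoder (not George's toUTF8): each output Char stands for
-- one UTF-8 byte, so every Char in the result has a code < 256.
-- Only 1- to 4-byte forms are produced; GHC Chars never exceed U+10FFFF.
encodeUTF8 :: String -> String
encodeUTF8 = concatMap (map chr . encodeCode . ord)
  where
    encodeCode c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation byte 10xxxxxx carrying the low 6 bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- Hypothetical decoder (not George's fromUTF8WE): expects one Char per
-- byte and returns Left on malformed input instead of crashing.
-- Rejects overlong forms and values above U+10FFFF; surrogates are not checked.
decodeUTF8 :: String -> Either String String
decodeUTF8 [] = Right []
decodeUTF8 (b : rest)
  | n < 0x80  = (chr n :) <$> decodeUTF8 rest
  | n < 0xC2  = Left ("invalid lead byte " ++ show n)
  | n < 0xE0  = multi 1 (n .&. 0x1F) 0x80
  | n < 0xF0  = multi 2 (n .&. 0x0F) 0x800
  | n < 0xF5  = multi 3 (n .&. 0x07) 0x10000
  | otherwise = Left ("invalid lead byte " ++ show n)
  where
    n = ord b
    -- A continuation byte must lie in 0x80..0xBF (this also rejects Chars >= 256).
    isCont x = x >= 0x80 && x < 0xC0
    -- Consume k continuation bytes, accumulating 6 bits from each.
    multi k acc0 minCode =
      let (cs, rest') = splitAt k rest
          xs = map ord cs
          code = foldl (\a x -> (a `shiftL` 6) .|. (x .&. 0x3F)) acc0 xs
      in if length xs /= k || not (all isCont xs)
           then Left "truncated sequence or bad continuation byte"
           else if code < minCode || code > 0x10FFFF
             then Left ("overlong or out-of-range code " ++ show code)
             else (chr code :) <$> decodeUTF8 rest'
```

Under this convention, encodeUTF8 "\233" (e-acute, U+00E9) yields the
two-Char String "\195\169", i.e. the bytes 0xC3 0xA9.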



> From: Simon Marlow [mailto:simonmar at microsoft.com]
> 
> In any case, none of this allows you to specify a UTF-8 conversion.

Are there plans to add UTF-8 (and UTF-16) conversion functions to the
libraries? I imagine they would be useful...


> Your best bet is to marshal it yourself.  We're a bit behind in this
> area: 6.2.x doesn't have CAString and CWString, and CString is just 
> char*.  The HEAD has CAString and CWString, and will hopefully follow 
> the FFI spec by the time we release 6.4 (we still have to do the 
> locale encoding/decoding between CString and String, IIRC).

Again, I want to clarify some things...

 - Will any encoding/decoding be performed by the
peekCString/withCString/newCString(Len) functions? That is, if I want to
avoid encoding/decoding (because I know my string is already in UTF-8), do
I simply have to avoid using these functions?

 - Can I still declare the foreign functions with CString types without
worrying that encoding/decoding might be attempted?

 - Am I right that I should use the CAString functions once they are
available, but that for now I will have to write something that uses
castCharToCChar/castCCharToChar together with peekArray0/pokeArray0?
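For the last point, a minimal sketch of byte-for-byte marshalling, assuming
a GHC with the standard FFI libraries, might look like the following.
castCharToCChar, castCCharToChar, peekArray0 and withArray0 come from
Foreign.C.String and Foreign.Marshal.Array; withArray0 stands in for an
explicit allocaArray0 plus pokeArray0. The helper names peekUTF8Bytes and
withUTF8Bytes are hypothetical, and strlen is imported only to show a
foreign declaration that uses the CString type.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, castCCharToChar, castCharToCChar)
import Foreign.C.Types (CChar, CSize (..))
import Foreign.Marshal.Array (peekArray0, withArray0)

-- CString is only a type synonym for Ptr CChar, so declaring a foreign
-- function with CString arguments performs no conversion by itself.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Hypothetical helper: read a NUL-terminated buffer one byte per Char
-- (codes 0..255), with no locale decoding.
peekUTF8Bytes :: CString -> IO String
peekUTF8Bytes p = map castCCharToChar <$> peekArray0 (0 :: CChar) p

-- Hypothetical helper: pass a byte-per-Char String (e.g. UTF-8 encoder
-- output) to C as a temporary NUL-terminated buffer. castCharToCChar keeps
-- only the low 8 bits, so callers must ensure every Char is < 256; an
-- embedded '\0' would end the C string early.
withUTF8Bytes :: String -> (CString -> IO a) -> IO a
withUTF8Bytes s = withArray0 (0 :: CChar) (map castCharToCChar s)
```

With these helpers the C side sees exactly the bytes in the String, so
strlen on the UTF-8 form of "été" reports 5 bytes rather than 3 characters.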


Thanks,
Alistair.




More information about the Glasgow-haskell-users mailing list