[Haskell-cafe] UTF-8 in Haskell.

Max Bolingbroke batterseapower at hotmail.com
Thu Dec 23 12:18:47 CET 2010


On 23 December 2010 05:29, Magicloud Magiclouds
<magicloud.magiclouds at gmail.com> wrote:
>  If so, OK, then I think I could make a packInt which turns an Int
> into 4 Word8 first. Thus under all situation (ascii, UTF-8, or even
> UTF-32), my program always send 4 bytes through the network. Is that
> OK?

I think you are describing the UTF-32 encoding (under the assumption
that fromEnum on Char returns the Unicode code point of that
character, which I think is true). UTF-32 is capable of describing
every Unicode code point so this is indeed non-lossy. UTF-32 is a
reasonable wire transfer format (if a bit inefficient!).
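
For illustration only, here is a minimal sketch of what the packInt idea
amounts to for a single character. The names packChar and unpackChar are
hypothetical, and big-endian byte order is an assumption (the original
message does not specify one); the result matches the UTF-32BE layout.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Split a Char's code point into four bytes, most significant byte
-- first (assumed big-endian, i.e. the UTF-32BE layout).
packChar :: Char -> [Word8]
packChar c = [ fromIntegral ((n `shiftR` s) .&. 0xFF) | s <- [24, 16, 8, 0] ]
  where
    n = ord c

-- | Inverse of packChar. Returns Nothing unless given exactly four bytes
-- that encode a value in the Unicode range 0 .. 0x10FFFF.
unpackChar :: [Word8] -> Maybe Char
unpackChar bs@[_, _, _, _]
  | n >= 0 && n <= 0x10FFFF = Just (chr n)
  | otherwise               = Nothing
  where
    n :: Int
    n = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 bs
unpackChar _ = Nothing
```

Every character costs four bytes this way, including plain ASCII, which
is the inefficiency mentioned above.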

Don't roll your own encoding logic, though: System.IO provides
TextEncodings for UTF-32 (utf32, utf32le and utf32be) that you can use
to do the job more reliably.
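
A rough sketch of that approach, assuming the data goes through a Handle
(a file here; a Handle wrapping a socket should behave the same way). The
helper names writeUtf32 and readUtf32 are made up for this example.
utf32be is chosen instead of utf32 so that the byte order is fixed and,
as far as I know, no byte-order mark is written.

```haskell
import System.IO

-- | Write a String to a file, encoded as UTF-32 big-endian.
writeUtf32 :: FilePath -> String -> IO ()
writeUtf32 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf32be   -- every Char is written as four bytes
  hPutStr h s

-- | Read a whole file back, decoding it as UTF-32 big-endian.
readUtf32 :: FilePath -> IO String
readUtf32 path = do
  h <- openFile path ReadMode
  hSetEncoding h utf32be
  s <- hGetContents h
  -- Force the lazily read contents before closing the handle.
  length s `seq` hClose h
  return s
```

With this, the encoding and decoding is left to the I/O library and the
program only ever deals with ordinary Strings.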

Cheers,
Max


