[Haskell-cafe] Bytestrings and [Char]

John Millikin jmillikin at gmail.com
Tue Mar 23 11:51:16 EDT 2010


On Tue, Mar 23, 2010 at 00:27, Johann Höchtl <johann.hoechtl at gmail.com> wrote:
> How are ByteStrings (Lazy, UTF8) and Data.Text meant to co-exist? When I
> read bytestrings over a socket which happens to be UTF16-LE encoded and
> identify a fitting function in Data.Text, I guess I have to transcode them
> with Data.Text.Encoding to make the type System happy?
>
There's no such thing as a UTF8 or UTF16 bytestring -- a bytestring is
just a more efficient representation of [Word8], just as Text is a more
efficient representation of [Char]. If the protocol or file format
you're parsing specifies that some series of bytes is text encoded as
UTF16-LE, then you can use the decoders in Data.Text.Encoding (such as
decodeUtf16LE) to convert those bytes to Text.
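
A minimal sketch of that conversion, assuming the bytes have already
been read off the socket into a strict ByteString. The names wireBytes
and decodeField are made up for illustration; decodeUtf16LEWith and
lenientDecode come from the text package.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf16LEWith)
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical payload: the two characters U+0048 ('H') and U+00F6 ('ö')
-- as UTF-16LE, each code unit stored low byte first.
wireBytes :: B.ByteString
wireBytes = B.pack [0x48, 0x00, 0xF6, 0x00]

-- Hypothetical helper: decode a field that the protocol declares to be
-- UTF-16LE. lenientDecode substitutes U+FFFD for invalid input instead
-- of throwing an exception.
decodeField :: B.ByteString -> T.Text
decodeField = decodeUtf16LEWith lenientDecode
```

Here B.length wireBytes is 4 (bytes) while T.length (decodeField
wireBytes) is 2 (characters); the types keep those two counts from
being confused.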

Poor separation between bytes and characters has caused problems in
many major languages (C, C++, PHP, Ruby, Python) -- let's not abandon
the advantages of correctness to chase a few percentage points of
performance.

