[Haskell-cafe] Confused about ByteString, UTF8, Data.Text and sockets, still.

John Millikin jmillikin at gmail.com
Fri Sep 3 11:29:53 EDT 2010

On Fri, Sep 3, 2010 at 05:04, JP Moresmau <jpmoresmau at gmail.com> wrote:
> I have replaced JSon by AttoJson (there was also JSONb, which seems quite
> similar), which allows me to work solely with ByteStrings, bypassing the
> calls to utf8-string completely. Performance has improved noticeably. I'm
> worried that I've lost full UTF8 compatibility, though, haven't I? No double
> byte characters will work in that setup?

It should be easy enough to test: generate a file with non-ASCII
characters in it and see whether it's parsed correctly. I assume it
will be, though the strings you get back are still UTF-8 bytes, so you
won't be able to perform String operations on them until you decode
them yourself. Slightly more worrisome is that AttoJson doesn't look
like it handles non-UTF-8 JSON (RFC 4627 also allows UTF-16 and
UTF-32), so you might have compatibility problems unless you implement
that decoding manually.

I've written a binding to YAJL (a C-based JSON parser) which might be
faster for you if the input is very large, though it suffers from the
same "assume UTF-8" problem.


> Is Data.Text an alternative? Can I use that everywhere, including for
> dealing with sockets (the API only mentions Handle).

Use 'Network.Socket.socketToHandle' to convert sockets to handles; the
'Data.Text.IO' functions then work on the resulting Handle like any other.
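A minimal sketch, assuming the network and text packages on a Unix-like
system ('socketPair' with AF_UNIX isn't available everywhere). A socket
pair stands in for a real connection, and 'roundTrip' is a made-up
helper. Text I/O on a Handle goes through the Handle's encoding, which
defaults to the system locale, so it's worth pinning it to UTF-8.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Network.Socket
import System.IO

-- Hypothetical helper: send one line of Text through a connected socket
-- pair and read it back, with both ends wrapped as Handles.
roundTrip :: T.Text -> IO T.Text
roundTrip msg = do
  -- A Unix-domain socket pair stands in for a real TCP connection.
  (s1, s2) <- socketPair AF_UNIX Stream defaultProtocol
  -- After socketToHandle the Handle owns the socket: close the Handle.
  h1 <- socketToHandle s1 ReadWriteMode
  h2 <- socketToHandle s2 ReadWriteMode
  -- Use UTF-8 on the wire regardless of the locale.
  mapM_ (`hSetEncoding` utf8) [h1, h2]
  TIO.hPutStrLn h1 msg
  hFlush h1
  reply <- TIO.hGetLine h2
  mapM_ hClose [h1, h2]
  return reply

main :: IO ()
main = do
  reply <- roundTrip (T.pack "naïve ☃")
  print (T.length reply) -- 7
```

With a real server you'd get the Socket from 'accept' or 'connect'
instead; the Handle and Text parts stay the same.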


