[Haskell-cafe] Confused about ByteString, UTF8, Data.Text and sockets, still.

JP Moresmau jpmoresmau at gmail.com
Fri Sep 3 08:04:26 EDT 2010

Hello all

After reading the module docs and some other discussions, I'm still not
sure which tools are the best choice for my problem. I'm looking at the
scion server code base. At the moment, it's reading and writing on sockets
using Lazy ByteStrings, then converting them to Haskell Strings
using utf8-string. The Haskell Strings are then parsed as JSON using the
JSon package. The response is in JSON, translated back with utf8-string to
a lazy ByteString and written to the socket.
This is efficient for small strings, but as I'm extending the API I have
calls with much more data, and performance degrades significantly. Timings
seem to point to the encoding of the String to UTF8.
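
To make the current round trip concrete, here is a minimal sketch of it. It
assumes the Data.ByteString.Lazy.UTF8 module from utf8-string; handleRequest
is a hypothetical stand-in for the JSON parsing and rendering, not the actual
scion code.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.UTF8 as UTF8  -- from utf8-string

-- Hypothetical stand-in for "parse the JSON request, run the command,
-- render the JSON response". The real code goes through the JSon package.
handleRequest :: String -> String
handleRequest = id

-- Bytes read from the socket are decoded to a String, processed as a
-- String, and the resulting String is encoded back to UTF-8 bytes.
roundTrip :: BL.ByteString -> BL.ByteString
roundTrip = UTF8.fromString . handleRequest . UTF8.toString
```

Every request and response passes through a full String on the way in and on
the way out, which is where the timings seem to point.
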
I have replaced JSon by AttoJson (there was also JSONb, which seems quite
similar), which allows me to work solely with ByteStrings, bypassing the
calls to utf8-string completely. Performance has improved noticeably. I'm
worried that I've lost full UTF8 compatibility, though, haven't I? Will no
multi-byte characters work in that setup?
Is Data.Text an alternative? Can I use that everywhere, including for
dealing with sockets (the API only mentions Handle)? Should I use
Data.ByteString.UTF8 everywhere, rewriting the JSON parser to deal with this
instead of the Word8 ByteStrings?
In short, what's the fastest way to implement receiving/sending UTF8 text
across sockets?
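
For the Data.Text option, this is roughly the shape I have in mind. It is only
a sketch, assuming decodeUtf8 and encodeUtf8 from Data.Text.Lazy.Encoding and
a hypothetical handleRequestText that does the JSON work on Text. I don't know
whether it is actually faster than the alternatives.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- Hypothetical placeholder for a JSON layer that works directly on Text.
handleRequestText :: TL.Text -> TL.Text
handleRequestText = id

-- Decode the UTF-8 bytes from the socket once, work on Text, and encode
-- once on the way out. Note that decodeUtf8 throws on invalid UTF-8 input.
roundTripText :: BL.ByteString -> BL.ByteString
roundTripText = TLE.encodeUtf8 . handleRequestText . TLE.decodeUtf8
```

The socket side would still read and write lazy ByteStrings. Only the layer
between the socket and the JSON code would change.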

Thanks for any pointers,
JP Moresmau
