[Haskell-cafe] Confused about ByteString, UTF8, Data.Text and sockets, still.

Daniel Fischer daniel.is.fischer at web.de
Fri Sep 3 09:17:47 EDT 2010


On Friday 03 September 2010 14:04:26, JP Moresmau wrote:
> Hello all
>
> After reading the modules docs and some other discussions, I'm still not
> sure what's the best choice of tools for my problem. I'm looking at the
> scion server code base. At the moment, it's reading and writing on
> sockets using Lazy ByteStrings, then converting them to Haskell Strings
> using utf8-string. The Haskell Strings are then parsed as JSON using the
> JSon package. The response is in JSON, translated back with utf8-string
> to ByteStrings.
> This is efficient for small strings, but as I'm extending the API I have
> calls with much more data, and performance degrades significantly.
> Timings seem to point to the encoding of the String to UTF8.
> I have replaced JSon by AttoJson (there was also JSONb, which seems
> quite similar), which allows me to work solely with ByteStrings,
> bypassing the calls to utf8-string completely. Performance has improved
> noticeably. I'm worried that I've lost full UTF8 compatibility, though,
> haven't I? No double byte characters will work in that setup?

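That depends. I'm not familiar with JSON, but iirc, all delimiters are 
ASCII characters, so it could just work. In UTF-8, every byte of a 
multi-byte sequence has its high bit set, so a byte inside such a sequence 
can never be mistaken for an ASCII delimiter.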
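
A small check of that property, as a sketch only: it assumes the text and 
bytestring packages, and the sample string and the helper name 
nonAsciiBytes are made up for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- UTF-8 encoding of only the non-ASCII characters of the input.
-- (Hypothetical helper, not part of any library.)
nonAsciiBytes :: T.Text -> B.ByteString
nonAsciiBytes = TE.encodeUtf8 . T.filter (> '\x7f')

main :: IO ()
main = do
  -- A JSON-like sample containing U+00E9 and U+2603.
  let sample  = T.pack "{\"name\": \"caf\233 \9731\"}"
      encoded = TE.encodeUtf8 sample
  -- Every byte produced by a non-ASCII character is >= 0x80.
  print (B.all (>= 0x80) (nonAsciiBytes sample))
  -- So counting the '"' byte (0x22) in the encoded form gives the same
  -- result as counting the '"' character in the decoded text.
  print (B.count 0x22 encoded == T.count (T.pack "\"") sample)
```

Both lines print True. A parser that only looks for ASCII delimiters at 
the byte level therefore cannot split a multi-byte character.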

> Is Data.Text an alternative? Can I use that everywhere, including for
> dealing with sockets (the API only mentions Handle). Should I use
> Data.ByteString.UTF8 everywhere, rewriting the JSON parser to deal with
> this instead of the Word8 ByteStrings?

Data.ByteString.UTF8 uses the ordinary Word8 ByteStrings; it just offers 
some functions to deal with UTF-8 encoding.
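
To illustrate, a minimal sketch assuming the utf8-string and bytestring 
packages; the sample string is arbitrary. The value built by 
Data.ByteString.UTF8 is a plain ByteString, and only the functions used on 
it differ.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as U

main :: IO ()
main = do
  -- fromString encodes a String as UTF-8 into an ordinary ByteString.
  let bs = U.fromString "caf\233" :: B.ByteString
  -- Data.ByteString counts bytes: U+00E9 takes two bytes in UTF-8.
  print (B.length bs)
  -- Data.ByteString.UTF8 counts decoded characters on the same value.
  print (U.length bs)
  -- Decoding gives back the original String.
  print (U.toString bs == "caf\233")
```

This prints 5, 4 and True: the same bytes, viewed once as Word8s and once 
as UTF-8 encoded characters.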

> In short, what's the fastest way to implement receiving/sending UTF8
> text across sockets?

The fastest way of receiving/sending UTF8 text across sockets is, I 
strongly believe, ByteString. After all, UTF8 text is just a sequence of 
bytes (with special properties). It's what you do between receiving and 
sending where other methods might prove better.
If you use Data.Text, you have to decode from UTF-8 to UTF-16 on 
receiving and encode from UTF-16 back to UTF-8 on sending. That won't be 
much faster than de/encoding between UTF-8 and String, but Data.Text 
offers a better API for manipulating text than ByteString, so overall, it 
could be better. It depends on what your needs are; you'll have to try it 
out.
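
A sketch of that split, assuming the text and bytestring packages: bytes 
stay ByteString at the socket boundary, and Data.Text is used only for the 
manipulation in between. The handler name handleRequest and the upper-casing 
step are placeholders. In a real server the input would come from recv and 
the output would go to sendAll in Network.Socket.ByteString (not shown 
here).

```haskell
import Data.Either (isLeft)
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical handler: decode the received bytes, work on Text,
-- and encode the result back to bytes for sending.
handleRequest :: B.ByteString -> Either String B.ByteString
handleRequest raw =
  case TE.decodeUtf8' raw of
    Left err  -> Left (show err)                        -- invalid UTF-8
    Right txt -> Right (TE.encodeUtf8 (T.toUpper txt))  -- placeholder work

main :: IO ()
main = do
  let request = TE.encodeUtf8 (T.pack "caf\233")
  -- Valid input round-trips through Text.
  print (handleRequest request == Right (TE.encodeUtf8 (T.pack "CAF\201")))
  -- 0xff can never occur in UTF-8, so decoding is rejected.
  print (isLeft (handleRequest (B.pack [0xff])))
```

Both lines print True. Note that a chunk read from a socket may end in the 
middle of a multi-byte character, so decode complete messages rather than 
individual chunks.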

>
> Thanks for any pointer,


