[database-devel] ByteString and Text - any conventions?

Jeremy Shaw jeremy at n-heptane.com
Sun Aug 12 20:30:45 CEST 2012


Personally, I use ByteStrings for binary data, and Text for textual data.

There are a bunch of APIs that use ByteString where Text might be a better
choice, because when they were designed the only options were ByteString
and String.

In some cases, it could be sensible to design a low-level API around
ByteString and build a Text API on top of it. For example, with a SQL
database you may need to communicate with the database using UTF-8 encoded
binary data, so the low-level binding would use ByteString. But when you
are actually constructing queries, you probably want to use Text most of
the time. The OverloadedStrings (IsString) instance for ByteString only
works correctly for ASCII: each character is truncated to a single byte
instead of being UTF-8 encoded, so non-ASCII literals silently produce
invalid data. The instance for Text does the right thing.
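A small illustration of the literal problem, assuming the bytestring and text packages. The same string literal is typed once as ByteString and once as Text; the names bsLiteral and textLiteral are just for this example. "caf\233" is "café", and U+00E9 fits in one byte, so the ByteString literal ends with the lone byte 0xE9, which is not valid UTF-8 on its own.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)

-- Typed as ByteString: each Char is narrowed to one byte, so the
-- 'é' (U+00E9) becomes the single byte 0xE9 rather than UTF-8.
bsLiteral :: BS.ByteString
bsLiteral = "caf\233"

-- Typed as Text: the character is kept, and encodeUtf8 produces the
-- correct two-byte sequence 0xC3 0xA9 for it.
textLiteral :: T.Text
textLiteral = "caf\233"

main :: IO ()
main = do
  print (BS.unpack bsLiteral)                 -- [99,97,102,233]
  print (BS.unpack (encodeUtf8 textLiteral))  -- [99,97,102,195,169]
  -- Reading the ByteString literal back as UTF-8 fails.
  print (either (const False) (const True) (decodeUtf8' bsLiteral))  -- False
```

Nothing warns at compile time in the ByteString case; the bad bytes only show up when something downstream tries to decode them.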

In general, relying on developers to remember to encode ByteStrings
correctly is a poor idea. When you use Text, the developer doesn't have to
think about it at all, and the encodings just work. Yay for types!
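A minimal sketch of the layering described above, assuming the bytestring and text packages: a ByteString low-level binding with a Text API on top, so the UTF-8 encoding happens in exactly one place. execRaw, toWire and execText are hypothetical names, not functions from sqlite-simple, direct-sqlite or any other real library, and execRaw only pretends to send the query.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- Hypothetical low-level binding: the database speaks UTF-8 bytes, so
-- this layer takes a ByteString. This stand-in only reports the size.
execRaw :: BS.ByteString -> IO ()
execRaw bytes = putStrLn ("would send " ++ show (BS.length bytes) ++ " bytes")

-- The single place where Text is turned into bytes.
toWire :: T.Text -> BS.ByteString
toWire = encodeUtf8

-- Hypothetical high-level API: callers pass Text and never deal with
-- the encoding themselves.
execText :: T.Text -> IO ()
execText = execRaw . toWire

main :: IO ()
main = execText (T.pack "SELECT 'na\239ve'")  -- prints: would send 15 bytes
```

The query is 14 characters but 15 bytes, since 'ï' needs two bytes in UTF-8; a caller of execText never has to know that.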

The only argument for working with ByteStrings everywhere is that it might
be faster, since you avoid the ByteString <-> Text conversion. But I'm not
aware of any data showing that the conversion time is significant. At the
very least, it would be sensible to use a newtype wrapper such as
newtype UTF8 = UTF8 { toByteString :: ByteString } to get at least some
type checking. I am pretty sure this wrapper exists somewhere already.
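One way such a wrapper might be fleshed out, as a sketch assuming the bytestring and text packages. The newtype is the one suggested above; fromText, fromByteString and toText are hypothetical helper names, not an existing library's API. The idea is that the UTF8 constructor would stay unexported in a real module, so values can only be built by encoding Text or by validating raw bytes.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, decodeUtf8', encodeUtf8)
import Data.Text.Encoding.Error (UnicodeException)

-- Bytes known to be valid UTF-8. In a library, export the type and
-- toByteString but not the UTF8 constructor, to protect the invariant.
newtype UTF8 = UTF8 { toByteString :: BS.ByteString }
  deriving (Eq, Show)

-- Encoding Text always yields valid UTF-8.
fromText :: Text -> UTF8
fromText = UTF8 . encodeUtf8

-- Raw bytes (e.g. from a socket or a C API) are validated first.
fromByteString :: BS.ByteString -> Either UnicodeException UTF8
fromByteString bs = UTF8 bs <$ decodeUtf8' bs

-- Decoding cannot fail here because every UTF8 value was checked.
toText :: UTF8 -> Text
toText = decodeUtf8 . toByteString

main :: IO ()
main = do
  let u = fromText (T.pack "na\239ve")
  print (BS.unpack (toByteString u))
  print (toText u)
  -- A lone 0xE9 byte is not valid UTF-8, so it is rejected.
  putStrLn (either (const "rejected") (const "accepted") (fromByteString (BS.pack [0xE9])))
```

The cost of the check is paid once, at the boundary where untrusted bytes come in; everything after that can pass UTF8 values around without re-validating.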

- jeremy

On Sun, Aug 12, 2012 at 1:18 PM, Janne Hellsten <jjhellst at gmail.com> wrote:

> While working on https://github.com/nurpax/sqlite-simple there have
> been occasions when I've tried to decide whether to use ByteStrings or
> Text strings.
>
> I note that postgresql-simple and mysql-simple use ByteStrings
> exclusively in the API.
>
> Has a convention formed in the Haskell community on which strings
> should be used in APIs that pass strings around?
>
> In my case, blobs will anyway be passed around as ByteStrings.  But
> SQLite's C strings are UTF8 and some might prefer Text over
> ByteStrings.  Or support both in the API?
>
> This matters in the low-level bindings too, where result accessors
> need to return either ByteStrings or Text objects for TEXT fields.
> Currently direct-sqlite uses the String type but Irene is thinking of
> changing the type to a more efficient representation
> (https://github.com/IreneKnapp/direct-sqlite/issues/3).
>
> Sqlite-simple links against both bytestring and text already, so from
> a purely package dependency point of view the choice doesn't really
> matter.  But direct-sqlite will need to choose one or the other for
> its string type in the SQLText s constructor.
>
> Any thoughts?
>
> Cheers,
>
> Janne
>