[Haskell-cafe] bytestring vs. uvector

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Sat Mar 14 12:02:06 EDT 2009

On Mon, 2009-03-09 at 18:29 -0700, Alexander Dunlap wrote:

> Thanks for all of the responses!
> So let me see if my summary is accurate here:
> - ByteString is for just that: strings of bytes, generally read off of
> a disk. The Char8 version just interprets the Word8s as Chars but
> doesn't do anything special with that.

Right. So it's only suitable for binary or ASCII (or mixed) formats.
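
To make the ASCII-only point concrete, here is a small sketch, assuming
the bytestring package's Data.ByteString.Char8 module. The helper names
(char8Bytes, asciiRoundTrips, lambdaBytes) are made up for illustration.
It shows that ASCII round-trips through Char8 while a code point above
255 is silently truncated to its low 8 bits.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Hypothetical helper: pack a String via the Char8 interface and
-- return the raw bytes that end up in the ByteString.
char8Bytes :: String -> [Word8]
char8Bytes = B.unpack . BC.pack

-- ASCII text survives a round trip through Char8 unchanged.
asciiRoundTrips :: Bool
asciiRoundTrips = BC.unpack (BC.pack "hello") == "hello"

-- U+03BB (Greek small lambda) does not fit in one byte.
-- Char8.pack keeps only the low 8 bits, so the stored byte is 0xBB.
lambdaBytes :: [Word8]
lambdaBytes = char8Bytes "\x3BB"
```

Unpacking that ByteString again yields U+00BB rather than the original
lambda, which is why Char8 should not be used for general Unicode text.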

> - Data.Text/text library is a higher-level library that deals with
> "text," abstracting over Unicode details and treating each element as
> a potentially-multibyte "character."

If you're writing about this on the wiki for people, it's best not to
confuse the issue by talking about multibyte anything. We do not
describe [Char] as a multibyte encoding of Unicode. We say it is a
Unicode string. The abstraction is at the level of Unicode code points.
The String type *is* a sequence of Unicode code points.

This is exactly the same for Data.Text. It is a sequence of Unicode code
points. It is not an encoding. It is not UTF-anything. It does not
abstract over Unicode.

The Text type is just like the String type but with different strictness
and performance characteristics. Both are just sequences of Unicode code
points.

There is a reasonably close correspondence between Unicode code points
and what people normally think of as characters.
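
As a rough illustration, assuming the text package's Data.Text module
(codePoints and sameAsString are made-up helper names), the sketch below
treats a Text value as a list of code points, checks that it agrees with
the corresponding String, and shows one case where a code point is not
quite a "character": a base letter followed by a combining accent.

```haskell
import qualified Data.Text as T
import Data.Char (ord)

-- Hypothetical helper: view a Text as its sequence of code points.
codePoints :: T.Text -> [Int]
codePoints = map ord . T.unpack

-- For ordinary (non-surrogate) characters, Text and String agree
-- element for element; only the representation differs.
sameAsString :: String -> Bool
sameAsString s =
  T.length (T.pack s) == length s && T.unpack (T.pack s) == s

-- One code point, U+03BB.
lambda :: T.Text
lambda = T.pack "\x3BB"

-- Two code points (U+0065, then combining acute U+0301) that most
-- readers would perceive as a single accented character.
eAcute :: T.Text
eAcute = T.pack "e\x301"
```

Note that nothing here mentions bytes: T.length counts code points, just
as length does on a String.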

> - utf8-string is a wrapper over ByteString that interprets the bytes
> in the bytestring as potentially-multibyte unicode "characters."

This on the other hand is an encoding. ByteString is a sequence of bytes
and when we interpret that as UTF-8 then we are looking at an encoding
of a sequence of Unicode code points.
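
The distinction can be sketched in code. This example does not use
utf8-string itself; it assumes encodeUtf8 and decodeUtf8 from the text
package's Data.Text.Encoding module, and the helper names utf8Bytes and
decodeBytes are made up. It shows one code point becoming two bytes once
a UTF-8 encoding is chosen, and the bytes decoding back to the same code
point.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Hypothetical helper: encode a sequence of code points as UTF-8 and
-- return the resulting bytes.
utf8Bytes :: String -> [Word8]
utf8Bytes = B.unpack . TE.encodeUtf8 . T.pack

-- Hypothetical helper: interpret bytes as UTF-8 and recover the code
-- points. decodeUtf8 throws an exception on invalid UTF-8 input, so
-- this is only safe for bytes known to be well formed.
decodeBytes :: [Word8] -> String
decodeBytes = T.unpack . TE.decodeUtf8 . B.pack

-- U+03BB is a single code point, but its UTF-8 encoding is the two
-- bytes 0xCE 0xBB.
lambdaUtf8 :: [Word8]
lambdaUtf8 = utf8Bytes "\x3BB"
```

The ByteString has length 2 while the Text it encodes has length 1,
which is the difference between an encoding and the code point sequence
it represents.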

Clear as mud? :-)

