[Haskell-cafe] bytestring vs. uvector

Sun Mar 8 03:33:02 EDT 2009

bos:
> On Sat, Mar 7, 2009 at 10:23 PM, Alexander Dunlap <alexander.dunlap at gmail.com>
> wrote:
> 
>     Hi all,
> 
>     For a while now, we have had Data.ByteString[.Lazy][.Char8] for our
>     fast strings. Now we also have Data.Text, which does the same for
>     Unicode. These seem to be the standard for dealing with lists of bytes
>     and characters.
> 
>     Now we also have the storablevector, uvector, and vector packages.
>     These also seem to be useful for unpacked data, *including* Char and
>     Word8 values.
> 
>     What is the difference between bytestring and these new "fast array"
>     libraries? Are the latter just generalizations of the former?
> 
> 
> There are quite a few overlaps and differences among them.
> 
> bytestring is mature and useful for low-level byte buffer manipulations, and
> also for efficient I/O. This is in part because it uses pinned pointers that
> can interoperate easily with foreign code. It used to have an early fusion
> rewriting framework, but that was abandoned. So it will not fuse multiple
> ByteString traversals into single loops. This library is widely used, and also
> somewhat abused for text I/O.
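
To make the point about pinned buffers and foreign code concrete, here is a minimal sketch, assuming a current bytestring package and a C library that provides memchr. The names firstIndexOf and c_memchr are made up for illustration. The C function reads the ByteString's buffer in place, without copying it first:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.C.Types (CChar, CInt (..), CSize (..))
import Foreign.Ptr (Ptr, minusPtr, nullPtr)

-- C's memchr from <string.h>: first occurrence of a byte in a buffer.
foreign import ccall unsafe "string.h memchr"
  c_memchr :: Ptr CChar -> CInt -> CSize -> IO (Ptr CChar)

-- Hypothetical helper: offset of the first byte equal to the given
-- (single-byte) character, found by C code working directly on the
-- ByteString's memory. The pointer is only valid inside the callback,
-- and the C side must not write through it.
firstIndexOf :: Char -> ByteString -> IO (Maybe Int)
firstIndexOf c bs =
  unsafeUseAsCStringLen bs $ \(ptr, len) -> do
    hit <- c_memchr ptr (fromIntegral (fromEnum c)) (fromIntegral len)
    return $ if hit == nullPtr
               then Nothing
               else Just (hit `minusPtr` ptr)

main :: IO ()
main = do
  r <- firstIndexOf ',' (BC.pack "bytes,and,more")
  print r  -- Just 5
```

If the foreign code needs a NUL-terminated string or might modify the buffer, Data.ByteString.useAsCString is the safer choice, at the cost of a copy.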
> 
> storablevector is not mature (I'm not even sure if it's actually used) and is a
> derivative of an old version of the bytestring library, and so has similar
> characteristics for interacting with foreign code. It contains some old fusion
> code that is sketchy in nature and somewhat likely to be broken. I'm not sure I
> would recommend using this library.
> 
> uvector is, if my memory serves me correctly, a fork of the vector library. It
> uses modern stream fusion, but is under active development and is a little
> scary. I'm a little unclear on the exact difference between uvector and vector.
> Both use arrays that are not pinned, so they can't be readily used with foreign
> code. If you want to use either library, understand that you're embarking on a
> bracing adventure.
> 
> text is not mature, and is based on the same modern fusion framework as uvector
> and vector. It uses unpinned arrays, but provides functions for dealing with
> foreign code. It uses a denser encoding than uvector for text, and provides
> text-oriented functions like splitting on word and line boundaries. Although
> it's intended for use with Unicode text, it does not yet provide proper
> Unicode-aware functions for things like case conversion. It interacts with
> bytestring to perform conversion to and from standard representations like
> UTF-8, and (via the text-icu package) ICU for others (SJIS, KOI-8, etc.). If you
> want to use this library, understand that you're embarking on a bracing
> adventure.
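
As a rough illustration of the bytestring interaction described above, here is a small sketch, assuming the Data.Text.Encoding module from a current release of text (the API at the time of this thread may have differed). The helpers toUtf8 and fromUtf8 are names invented for this example:

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: serialise Text to a UTF-8 encoded ByteString.
toUtf8 :: T.Text -> B.ByteString
toUtf8 = TE.encodeUtf8

-- Hypothetical helper: decode UTF-8 bytes, returning Nothing on
-- malformed input instead of throwing an exception.
fromUtf8 :: B.ByteString -> Maybe T.Text
fromUtf8 = either (const Nothing) Just . TE.decodeUtf8'

main :: IO ()
main = do
  -- "naive cafe lambda" with i-diaeresis, e-acute and a Greek lambda,
  -- written with numeric escapes so the source file stays ASCII.
  let original = T.pack "na\239ve caf\233 \955"
      bytes    = toUtf8 original
  print (T.length original)                  -- 12 characters
  print (B.length bytes)                     -- 15 bytes
  print (fromUtf8 bytes == Just original)    -- True
  print (fromUtf8 (B.pack [0xff]))           -- Nothing: not valid UTF-8
  -- One of the text-oriented functions: splitting on whitespace.
  print (T.words (T.pack "split on  word boundaries"))
```

The character count and the byte count differ because the three non-ASCII characters each take two bytes in UTF-8, which is why the conversion goes through bytestring rather than treating the two types as interchangeable.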

I endorse this message.