[Haskell-cafe] bytestring vs. uvector

Austin Seipp mad.one at gmail.com
Sun Mar 8 03:20:09 EDT 2009


Excerpts from Alexander Dunlap's message of Sun Mar 08 00:23:01 -0600 2009:
> For a while now, we have had Data.ByteString[.Lazy][.Char8] for our
> fast strings. Now we also have Data.Text, which does the same for
> Unicode. These seem to be the standard for dealing with lists of bytes
> and characters.
> 
> Now we also have the storablevector, uvector, and vector packages.
> These seem to be also useful for unpacked data, *including* Char and
> Word8 values.
> 
> What is the difference between bytestring and these new "fast array"
> libraries? Are the latter just generalizations of the former?
> 
> Thanks for any insight anyone can give on this.
> 
> Alex


Data.Text provides functions for Unicode over bytestrings, with several
encoding/decoding methods. So I think that bytestring + text now solves
the general problem with the slow String type: we get various
international encodings along with fast, efficient packed strings.

(It's also worth mentioning utf8-string, which gives you UTF-8 over
bytestrings. text gives you more encodings, however, and is probably
still quite efficient.)
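
For illustration, here's a minimal sketch of the kind of round trip the
text package supports (using Data.Text and Data.Text.Encoding; exact
details may differ between versions, so treat this as a sketch rather
than gospel):

    import qualified Data.ByteString as B
    import qualified Data.Text as T
    import qualified Data.Text.Encoding as TE

    main :: IO ()
    main = do
      let t  = T.pack "héllo, wörld"   -- packed Unicode text
          bs = TE.encodeUtf8 t         -- Text -> strict ByteString (UTF-8)
      B.putStr bs
      print (T.toUpper (TE.decodeUtf8 bs))  -- decode back and manipulate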

But this is a largely separate effort from that of packages like
uvector and vector. To clarify, I think uvector and vector are likely
to be merged in the future - vector is built around the idea of
'recycling arrays' so that array operations remain very efficient,
while uvector only has the well-tested stream fusion technique behind it.
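
As a rough illustration of what fusion buys you, consider a pipeline
over an unboxed vector (this sketch uses the vector package's
Data.Vector.Unboxed interface; uvector's API is similar in spirit but
uses different names):

    import qualified Data.Vector.Unboxed as U

    -- With stream fusion, this pipeline should compile down to a
    -- single loop over the elements, with no intermediate vectors
    -- being allocated.
    sumOfSquaredEvens :: Int -> Int
    sumOfSquaredEvens n =
      U.sum (U.map (^ 2) (U.filter even (U.enumFromTo 1 n)))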

Actually, I think the eventual plan is to merge the technology
behind both vector and uvector into the Data Parallel Haskell
project. Array recycling and stream fusion go into creating
extremely efficient sequential code, while the vectorisation pass
turns that into efficient multicore code at the same time.

In any case, I suppose that if someone hypothetically wanted to use a
package like uvector to build an efficient string type, they could -
but if that's what they want, why not just use bytestring? It's already
optimized, battle-tested and in extremely wide use.
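
For example, the everyday packed-string operations people reach for are
already covered by bytestring; a small sketch using Data.ByteString.Char8
(the file name here is just a placeholder):

    import qualified Data.ByteString.Char8 as B

    main :: IO ()
    main = do
      contents <- B.readFile "input.txt"   -- hypothetical input file
      let ls = B.lines contents
      -- count the lines starting with "foo", all on packed strings
      print (length (filter (B.isPrefixOf (B.pack "foo")) ls))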

I think some library proliferation is good; the libraries mentioned
here really serve different purposes, and that's great, because put
together they all lead to nice, fast code with low conceptual overhead
(hopefully...). But I'm not going to start examining or comparing the
different array interfaces here, because that's been done many times on
this list, so you'd best check the archives if you want the in-depth
story.

Austin

