[Haskell-cafe] Fun with ByteStrings [was: A very edgy language]
Andrew Coppin
andrewcoppin at btinternet.com
Sun Jul 8 07:07:53 EDT 2007
Donald Bruce Stewart wrote:
> andrewcoppin:
>
>> Does anybody have any clue why ByteStrings are actually faster? (And why
>> this information isn't easily findable anywhere - must shorly be a VFAQ.)
>>
>>
>
> It's well documented in the API documentation for bytestrings.
>
> Start here:
> http://www.cse.unsw.edu.au/~dons/fps/Data-ByteString-Lazy.html
>
I've read the API and still left wondering...
> And then read:
> http://www.cse.unsw.edu.au/~dons/papers/CSL06.html
>
Now *that* is far more useful... (And interesting.)
So what you're saying is that whereas "fusion" is usually used to mean
"that amazing technology that will one day supply us with unlimited
amounts of cheap clean energy [shame it doesn't actually work]", in the
context of Haskell it seems "fusion" means "that technology that makes
all your code 98% faster for free"? ;-)
I guess the question that's really burning in my mind is "if ByteString
is so much faster than [x], why can't you just do the same optimisations
to [x]?" In other words, "why should I need to alter my code to get all
this fusion goodness?"
Now, as I understand it, a ByteString is a kind of unboxed array (= big
RAM savings + big CPU time savings for not building it + big GC savings
for not processing millions of list nodes + better cache performance).
Or at least, a *strict* ByteString is; I'm very very fuzzy on exactly
how a *lazy* ByteString is any different to a normal list. From my
reading today, I take it the only real difference is that one is a
linked list, whereas the other is a (boxed?) array, so it's smaller. (?)
At any rate, currently all my toy compression algorithms run at
respectable speeds using [Word8], *except* for the BWT, which is
absurdly slow. I've done everything I can think of to it, and it's still
too slow. It's no use, I'm going to have to use ByteStrings to get any
kind of performance out of it. I'm just wondering whether using
getContents :: [Char] and then packing that into a ByteString is going
to completely negate all the speed advantages. (I'm really not keen to
completely mangle the entire toolbox just to make this one algorithm
hurry up.)
Also, while I'm here... I can see a sorting algorithm implemented in
Data.List, but I don't see any for other structures. For example, there
doesn't appear to be any sorting functions for any kind of array. There
doesn't appear to be anything for ByteString either. I'd like to see
such a thing in the libraries. Yes, you can sort things using (say)
Data.Map. But what if you have some data that's already *in* an array or
a ByteString and you just want to sort it? (I also notice that the
mutable arrays don't seem to provide an in-place map function. Obviously
that's only ever going to work for a function that doesn't change the
value's type, but even so...)
Finally, I don't see anything in the libraries that would efficiently
sort (large) strings. Data.List.sort and Data.Map.Map both use an Ord
context, and we have instance (Ord x) => Ord [x], but one would think
that a [potentially large] list of values could be sorted more
efficiently using a radix sort than a quicksort...? An implementation of
Data.Map especially for the (common?) case of the keys being a *list* of
Ord items would seem useful... (And in my current program, I'm probably
going to implement a radix sort on lists of ByteStrings.)
More information about the Haskell-Cafe
mailing list