[Haskell-cafe] Copying Arrays

Fri May 30 04:02:10 EDT 2008

On Fri, May 30, 2008 at 9:00 AM, Ketil Malde <ketil at malde.org> wrote:
> Duncan Coutts <duncan.coutts at worc.ox.ac.uk> writes:
>> The reason we do not want to re-use ByteString as the underlying
>> representation is because they're not good for short strings and we
>> expect that for Unicode text (more than arbitrary blobs of binary data)
>> people will want efficient short strings.
>
> I guess this is where I don't follow: why would you need more short
> strings for Unicode text than for ASCII or 8-bit latin text?

But ByteStrings are neither ASCII nor 8-bit Latin text! The latter
might be internally represented using an 8-bit encoding, but saying
that they are the same would be to confuse representation with
intended use. The use case for ByteString is representing a sequence
of bytes used in e.g. fast binary socket and file I/O. The intent of
the not-yet-existing Unicode string is to represent text not bytes.
These are different use cases so having to make different trade-offs
shouldn't come as a surprise. To give just one example, short
(Unicode) strings are common as keys in associative data structures
like maps, where fast allocation, a small memory footprint and fast
comparison are important. That being said, it's entirely possible that I
didn't get your point.
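A minimal sketch of the bytes-versus-text distinction, assuming only the `bytestring` and `containers` packages that ship with GHC (the Unicode string type discussed here does not exist yet, so it is not shown). `Data.ByteString.Char8.pack` keeps only the low 8 bits of each `Char`, so any code point above U+00FF is silently mangled. The helper `roundTrips` is hypothetical, written just for this illustration; the word-count map only shows the "short strings as keys" usage pattern, not its performance.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as M

-- Hypothetical helper: does a String survive a trip through the Char8
-- interface? BC.pack truncates each Char to 8 bits, so this is False
-- for any code point above U+00FF.
roundTrips :: String -> Bool
roundTrips s = BC.unpack (BC.pack s) == s

main :: IO ()
main = do
  print (roundTrips "caf\233")       -- U+00E9 fits in one byte
  print (roundTrips "\955")          -- U+03BB (lambda) gets truncated
  print (B.unpack (BC.pack "\955"))  -- only the low byte 0xBB (187) is kept

  -- Short strings as map keys: the usage pattern where a text type
  -- wants cheap allocation and fast comparison.
  let counts = M.fromListWith (+) [(w, 1 :: Int) | w <- words "to be or not to be"]
  print (M.toList counts)
```

Running it should print `True`, `False`, `[187]` and then the word counts; the `False` is the point: treating a byte sequence as text works only until a character does not fit in a byte.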

Can I also insert a plea here for keeping lazy I/O out of the new
Unicode module?

-- Johan