[Haskell-cafe] Copying Arrays

Fri May 30 03:00:10 EDT 2008

Duncan Coutts <duncan.coutts at worc.ox.ac.uk> writes:

>>> Because I'm writing the Unicode-friendly ByteString =p

> He's designing a proper Unicode type along the lines of ByteString.

So - storing 22.5 bit code points instead of 8-bit quantities?  Or
storing whatever representation from the input, and providing a nice
interface on top?

>> Perhaps I'm not understanding.  Why wouldn't you use ByteString for I/O,

Like everybody else, my first reaction is to put a layer (like Char8)
on top of lazy bytestrings.  For variable-length encodings, you lose
direct indexing, but I think the need for it is not very common, and if
you do need it, you should convert to a fixed-length encoding instead.
Since a BS is basically a (pointer to array, offset, length) triple, it
should be relatively easy to ensure that you don't break a wide char
between chunks by adjusting the length (which doesn't have to match the
actual array length).
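
To make the "adjusting the length" idea concrete, here is a rough sketch
for strict ByteStrings, assuming the contents are well-formed UTF-8.  The
names isContinuation and splitAtBoundary are made up for illustration and
are not part of the bytestring package.  The split point is moved back
until it no longer lands on a continuation byte, so the first piece never
ends in the middle of a character:

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))
import Data.Word (Word8)

-- | True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
-- (Hypothetical helper, not a library function.)
isContinuation :: Word8 -> Bool
isContinuation w = w .&. 0xC0 == 0x80

-- | Split at byte offset n, or at the nearest earlier offset that does not
-- fall inside a multi-byte sequence.  Assumes well-formed UTF-8 input.
-- (Hypothetical helper, not a library function.)
splitAtBoundary :: Int -> B.ByteString -> (B.ByteString, B.ByteString)
splitAtBoundary n bs = B.splitAt (back start) bs
  where
    -- Clamp the requested offset into the valid range [0, length].
    start = max 0 (min n (B.length bs))
    -- Step backwards while the byte at the split point is a continuation
    -- byte, i.e. while the split would cut a character in two.
    back i
      | i > 0 && i < B.length bs && isContinuation (B.index bs i) = back (i - 1)
      | otherwise = i
```

Since B.splitAt only produces two new (pointer, offset, length) views of
the same buffer, nothing is copied; only the length of the first piece is
adjusted.  The same trick should carry over to choosing chunk boundaries
in a lazy bytestring, though I haven't tried it.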

> The reason we do not want to re-use ByteString as the underlying
> representation is because they're not good for short strings and we
> expect that for Unicode text (more than arbitrary blobs of binary data)
> people will want efficient short strings.

I guess this is where I don't follow: why would you need more short
strings for Unicode text than for ASCII or 8-bit Latin text?

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants