[Haskell-cafe] Copying Arrays
ketil at malde.org
Fri May 30 03:00:10 EDT 2008
Duncan Coutts <duncan.coutts at worc.ox.ac.uk> writes:
>>> Because I'm writing the Unicode-friendly ByteString =p
> He's designing a proper Unicode type along the lines of ByteString.
So - storing 22.5 bit code points instead of 8-bit quantities? Or
storing whatever representation from the input, and providing a nice
interface on top?
>> Perhaps I'm not understanding. Why wouldn't you use ByteString for I/O,
Like everybody else, my first reaction is to put a layer (like Char8)
on top of lazy bytestrings. For variable-length encodings, you lose
direct indexing, but I think needing it is not very common, and if you
do need it, you should convert to a fixed-length encoding instead. Since
a BS is basically a (pointer to array, offset, length) triple, it should be
relatively easy to ensure that you don't break a wide char between
chunks by adjusting the length (which doesn't have to match the
actual array length).
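
To make the chunk-boundary idea concrete, here is a minimal sketch, assuming the
encoding is UTF-8 and each chunk is a strict ByteString. The names
isContinuation, seqLength and safeSplit are made up for illustration and are
not part of the bytestring library. It shortens a chunk so that it ends on a
complete character and hands back the dangling bytes, which would be prepended
to the next chunk. Since splitAt only adjusts offset and length, nothing is
copied.

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))
import Data.Word (Word8)

-- | True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
isContinuation :: Word8 -> Bool
isContinuation w = w .&. 0xC0 == 0x80

-- | Length in bytes of a UTF-8 sequence, judged by its lead byte.
-- Assumption of this sketch: invalid lead bytes count as length 1.
seqLength :: Word8 -> Int
seqLength w
  | w .&. 0x80 == 0x00 = 1
  | w .&. 0xE0 == 0xC0 = 2
  | w .&. 0xF0 == 0xE0 = 3
  | w .&. 0xF8 == 0xF0 = 4
  | otherwise          = 1

-- | Split a chunk into (complete characters, trailing partial character).
-- Only the last four bytes are inspected, since no UTF-8 sequence is longer.
safeSplit :: B.ByteString -> (B.ByteString, B.ByteString)
safeSplit bs
  | B.null bs = (bs, bs)
  | otherwise =
      case findLead (n - 1) (max 0 (n - 4)) of
        -- The last sequence runs past the end of the chunk: cut before it.
        Just i | i + seqLength (B.index bs i) > n -> B.splitAt i bs
        -- Either the last character is complete or no lead byte was found.
        _ -> (bs, B.empty)
  where
    n = B.length bs
    -- Walk backwards from index i down to lo, looking for a lead byte.
    findLead i lo
      | i < lo                        = Nothing
      | isContinuation (B.index bs i) = findLead (i - 1) lo
      | otherwise                     = Just i
```

A lazy layer could apply safeSplit to every chunk as it arrives. The sketch
does not validate the input; malformed sequences pass through unchanged.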
> The reason we do not want to re-use ByteString as the underlying
> representation is because they're not good for short strings and we
> expect that for Unicode text (more than arbitrary blobs of binary data)
> people will want efficient short strings.
I guess this is where I don't follow: why would you need more short
strings for Unicode text than for ASCII or 8-bit Latin text?
If I haven't seen further, it is by standing in the footprints of giants