Data.ByteString candidate 3
Einar Karttunen
ekarttun at cs.helsinki.fi
Tue Apr 25 19:16:38 EDT 2006
On 25.04 13:46, John Meacham wrote:
> I think all we really need are
>
> Data.ByteString
> Data.PackedString
>
> (Though, I suppose Latin1 could be useful)
Using the Word8 API directly is not very pleasant, because character
constants and the like are Char, not Word8.
As for Latin1 - what semantics do we use for toUpper/toLower and Ord?
Using the Unicode ones or the locale's seems the sensible thing if the
data really is Latin1.
Thus a simple wrapper around the Word8 API is desirable. Make it follow
a few simple rules:
* c2w . w2c = id (conversion is a bijection between bytes and the
  first 256 Chars)
* ASCII characters are translated correctly
* toLower/toUpper affect ASCII letters only
* Ord compares by byte values.
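
The rules above could be sketched roughly as follows. This is only an
illustration of the idea, not the code of any ByteString candidate: the
names toLowerAscii, toUpperAscii and compareBytes are made up for this
sketch, and the choice that c2w truncates Chars above 255 is an
assumption.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Byte to Char: each Word8 maps to the Char with the same code (0..255).
w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- | Char to byte. Assumption of this sketch: code points above 255 are
-- truncated to their low 8 bits, so only c2w . w2c = id holds, not the
-- reverse composition.
c2w :: Char -> Word8
c2w = fromIntegral . ord

-- | ASCII-only lower-casing; every byte outside 'A'..'Z' passes through.
toLowerAscii :: Word8 -> Word8
toLowerAscii w
  | w >= 0x41 && w <= 0x5A = w + 0x20
  | otherwise              = w

-- | ASCII-only upper-casing; every byte outside 'a'..'z' passes through.
toUpperAscii :: Word8 -> Word8
toUpperAscii w
  | w >= 0x61 && w <= 0x7A = w - 0x20
  | otherwise              = w

-- | Ordering by byte values: plain lexicographic comparison of the bytes.
compareBytes :: [Word8] -> [Word8] -> Ordering
compareBytes = compare
```

With these definitions a Latin1 byte such as 0xC4 is left alone by
toLowerAscii, which is the point of restricting case conversion to ASCII.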
This is very useful for many purposes and does not mean that there
should not be a fancy UTF8 module. Rather than arguing about killing
this, wouldn't it be more productive to create the UTF8 module?
> but note, do the people that want latin1 just need ASCII? because it should be
> noted that if we have a UTF8 PackedString, then we can make
> ASCII-specific access routines that are just as fast as the ones in the
> Latin1 variety without giving up the ability to store full unicode
> values in the string.
Case conversion and ordering need to differ between the two. Thus we
need to newtype things to avoid having two conflicting Ord instances on
the same type. The UTF8 layer should provide:
* Unicode toUpper/toLower
* Unicode collation (UCA) for Ord
* Graphemes (see Perl6 for good ways to do this)
* Normalisation
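
A minimal sketch of the newtype idea, assuming the bytestring package's
Data.ByteString: both types share one representation but carry
different Ord instances. The names Bytes8, Utf8 and collationKey are
hypothetical, and collationKey is only a placeholder (ASCII case
folding with a byte-order tie-break), not Unicode collation; a real
UTF8 layer would put a UCA sort key there.

```haskell
import qualified Data.ByteString as B
import Data.Ord (comparing)
import Data.Word (Word8)

-- | Bytes viewed as 8-bit characters: Ord is plain byte order.
newtype Bytes8 = Bytes8 B.ByteString
  deriving (Eq, Ord)

-- | The same representation, understood as UTF-8 encoded text.
newtype Utf8 = Utf8 B.ByteString
  deriving Eq

-- | Placeholder sort key (NOT the Unicode Collation Algorithm):
-- compare with ASCII case folded first, then break ties on the raw
-- bytes so that Ord stays consistent with Eq.
collationKey :: Utf8 -> ([Word8], [Word8])
collationKey (Utf8 bs) = (map foldAscii ws, ws)
  where
    ws = B.unpack bs
    foldAscii w
      | w >= 0x41 && w <= 0x5A = w + 0x20
      | otherwise              = w

-- | Text ordering goes through the sort key instead of the raw bytes.
instance Ord Utf8 where
  compare = comparing collationKey
```

Under byte order "B" sorts before "a", while under the placeholder key
"a" sorts before "B"; the two instances never clash because they live
on different types.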
- Einar Karttunen