duncan.coutts at worc.ox.ac.uk
Tue Apr 25 10:25:35 EDT 2006
On Tue, 2006-04-25 at 22:34 +1000, Donald Bruce Stewart wrote:
> > On Tue, Apr 25, 2006 at 12:08:45PM +0300, Einar Karttunen wrote:
> > > The name Latin1 is particularly bad since there are many other
> > > single byte encodings around.
> > The name is quite appropriate, since that is the particular encoding of
> > Char that is exposed by the interface. What's bad is that there's no
> > choice. Calling it Latin1 is just being honest about that, and leaving
> > room for modules with other encodings or an interface parameterized
> > by encoding.
> Ok. Duncan, Ketil, Ross and Simon make good points here.
> I'll move Data.ByteString.Char -> Data.ByteString.Latin1
If you want to justify that and provide some concrete spec you can add
something like the following to the Data.ByteString.Latin1 docs:

    Manipulate ByteStrings using Char operations. All Chars will be
    truncated to 8 bits.

    More specifically these byte strings are taken to be in the
    subset of Unicode covered by code points 0-255. This covers
    Unicode Basic Latin, Latin-1 Supplement and C0+C1 Controls.

One reason to be so specific is that other definitions of character sets
commonly called "Latin-1" omit the control characters and so do not
cover all bytes 0-255.
I think this allows us to justify reinterpreting Word8s as Chars and
getting valid Unicode code points.
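
To make the reinterpretation concrete, here is a minimal sketch of the two conversions the proposed docs describe. The names byteToChar and charToByte are hypothetical and chosen only for illustration; they are not claimed to be the module's actual API.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper names, for illustration only.

-- | Reinterpret a byte as the Unicode code point with the same number.
-- Every Word8 (0-255) maps to a valid code point in Basic Latin,
-- Latin-1 Supplement or the C0/C1 controls.
byteToChar :: Word8 -> Char
byteToChar = chr . fromIntegral

-- | Truncate a Char to its low 8 bits. This is lossy for any Char
-- above code point 255, since fromIntegral wraps modulo 256.
charToByte :: Char -> Word8
charToByte = fromIntegral . ord
```

Under these definitions charToByte . byteToChar is the identity on all 256 bytes, while the other direction loses information: for example, U+03BB (Greek small lambda, code point 955) truncates to byte 187, which reads back as U+00BB.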