FPS/Data.ByteString candidate

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Tue Apr 25 10:25:35 EDT 2006


On Tue, 2006-04-25 at 22:34 +1000, Donald Bruce Stewart wrote:
> ross:
> > On Tue, Apr 25, 2006 at 12:08:45PM +0300, Einar Karttunen wrote:
> > > The name Latin1 is particularly bad since there are many other
> > > single byte encodings around.
> > 
> > The name is quite appropriate, since that is the particular encoding of
> > Char that is exposed by the interface.  What's bad is that there's no
> > choice.  Calling it Latin1 is just being honest about that, and leaving
> > room for modules with other encodings or an interface parameterized
> > by encoding.
> 
> Ok. Duncan, Ketil, Ross and Simon make good points here.
> I'll move Data.ByteString.Char -> Data.ByteString.Latin1

If you want to justify that and provide a concrete spec, you could add
something like the following to the Data.ByteString.Latin1 docs:

        Manipulate ByteStrings using Char operations. All Chars will be
        truncated to 8 bits.
        
        More specifically these byte strings are taken to be in the
        subset of Unicode covered by code points 0-255. This covers
        Unicode Basic Latin, Latin-1 Supplement and C0+C1 Controls.
        
        See: http://www.unicode.org/charts/
        http://www.unicode.org/charts/PDF/U0000.pdf
        http://www.unicode.org/charts/PDF/U0080.pdf


One reason to be so specific is that other definitions of character sets
commonly called "Latin-1" omit the control characters (ISO/IEC 8859-1
proper, for instance, leaves 0x00-0x1F and 0x7F-0x9F unassigned) and so
do not cover all bytes 0-255.

I think this allows us to justify reinterpreting Word8s as Chars and
getting valid Unicode code points.
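
To make that concrete, here is a minimal sketch of the two conversions,
assuming nothing beyond the standard Data.Char and Data.Word modules. The
names w2c, c2w and roundTrips are chosen here for illustration only and
are not a claim about what the library exports.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Reinterpret a byte as the Unicode code point with the same number.
-- Every Word8 lies in 0-255, so the result is always a valid Char.
w2c :: Word8 -> Char
w2c = chr . fromIntegral

-- Truncate a Char to its low 8 bits. Chars above code point 255
-- lose information here; fromIntegral wraps modulo 256.
c2w :: Char -> Word8
c2w = fromIntegral . ord

-- Every byte survives the trip to Char and back unchanged.
roundTrips :: Bool
roundTrips = all (\w -> c2w (w2c w) == w) [minBound .. maxBound]
```

The reverse trip is only lossless for Chars in 0-255: for example
c2w '\955' (Greek small lambda, 0x3BB) gives 187 (0xBB), which is why
the docs should say up front that Chars are truncated.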

Duncan


