[Haskell-cafe] Has character changed in GHC 6.8?
Duncan Coutts
duncan.coutts at worc.ox.ac.uk
Tue Jan 22 05:36:44 EST 2008
On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote:
> I vaguely remember that in GHC 6.6 code like this
>
> length $ map ord "a string"
>
> being able to generate a different answer than
>
> length "a string"
That seems unlikely.
> At the time I thought that the encoding (in my case UTF-8) was “leaking
> through”. After switching to GHC 6.8 the behaviour seems to have
> changed, and mapping 'ord' on a string results in a list of ints
> representing the Unicode code point rather than the encoding:
Yes. GHC 6.8 treats .hs files as UTF-8 where it previously treated them
as Latin-1.
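To make the difference concrete, here is a minimal sketch of what reading UTF-8 bytes as Latin-1 does to a string. It is not how GHC's lexer is implemented; the names utf8Bytes and decodeLatin1 are made up for this illustration. Note that the length of the string and the length of 'map ord' over it still agree, whichever decoding was used.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- The UTF-8 encoding of "åäö": each of these characters takes two bytes.
utf8Bytes :: [Word8]
utf8Bytes = [0xC3, 0xA5, 0xC3, 0xA4, 0xC3, 0xB6]

-- Hypothetical helper: Latin-1 decoding maps every byte to the code
-- point with the same numeric value.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

main :: IO ()
main = do
  let misread = decodeLatin1 utf8Bytes
  print (length misread)                              -- 6, not 3
  print (map ord misread)                             -- [195,165,195,164,195,182]
  print (length (map ord misread) == length misread)  -- True: map keeps the length
```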
> > map ord "åäö"
> [229,228,246]
>
> Is this the case, or is there something strange going on with character
> encodings?
That's what we'd expect. Note that GHCi still uses Latin-1. This will
change in GHC-6.10.
> I was hoping that this would mean that 'chr . ord' would basically be a
> no-op, but no such luck:
>
> > chr . ord $ 'å'
> '\229'
>
> What would I have to do to get an 'å' from '229'?
Easy!
Prelude> 'å' == '\229'
True
Prelude> 'å' == Char.chr 229
True
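The same check can be written as a small standalone program. This is only a sketch, assuming a GHC that reads the source file as UTF-8; roundTrip is a name invented here, not a library function.

```haskell
import Data.Char (chr, ord)

-- Hypothetical name for the composition asked about above.
roundTrip :: Char -> Char
roundTrip = chr . ord

main :: IO ()
main = do
  print (ord 'å')                             -- 229
  print (roundTrip 'å' == 'å')                -- True
  print (all (\c -> roundTrip c == c) "åäö")  -- True
```

So 'chr . ord' does behave as a no-op on the value; only the way the result is displayed differs.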
Remember, when you type:
Prelude> 'å'
what you really get is:
Prelude> putStrLn (show 'å')
So perhaps what is confusing you is the Show instance for Char, which
converts a Char to a String using a portable ASCII representation.
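A small sketch of the difference between 'show' and printing the string directly. It assumes a newer GHC than the ones discussed here (one where System.IO exports hSetEncoding and utf8) and a terminal that displays UTF-8; the names shownChar and shownString are made up for the example.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- What Show produces: escaped, ASCII-only text.
shownChar :: String
shownChar = show 'å'

shownString :: String
shownString = show "åäö"

main :: IO ()
main = do
  -- Assumption: force UTF-8 output so the result does not depend on the locale.
  hSetEncoding stdout utf8
  putStrLn shownChar    -- prints '\229'
  putStrLn shownString  -- prints "\229\228\246"
  putStrLn "åäö"        -- prints the characters themselves
```

In other words, use putStr or putStrLn on the string itself when you want the characters, and 'show' (or print) when you want the escaped form.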
Duncan