Comment Syntax
Bulat Ziganshin
bulatz at HotPOP.com
Fri Feb 3 04:53:41 EST 2006
Hello John,
Friday, February 03, 2006, 3:39:38 AM, you wrote:
>> Got a unicode-compliant compiler?
JM> sure do :)
JM> but it currently doesn't recognize any unicode characters as possible
JM> operators.
Have you read this? :)
> Log:
> Add support for UTF-8 source files
>
> GHC finally has support for full Unicode in source files. Source
> files are now assumed to be UTF-8 encoded, and the full range of
> Unicode characters can be used, with classifications recognised using
> the implementation from Data.Char. This incidentally means that only
> the stage2 compiler will recognise Unicode in source files, because I
> was too lazy to port the unicode classifier code into libcompat.
>
> Additionally, the following synonyms for keywords are now recognised:
>
>   forall symbol        (U+2200)   forall
>   right arrow          (U+2192)   ->
>   left arrow           (U+2190)   <-
>   horizontal ellipsis  (U+22EF)   ..
>
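
To make the synonyms above concrete, here is a small sketch of a source file that uses three of them. It is only an illustration: the names swapPair and squares are made up for this example, the file has to be saved as UTF-8, and later GHC releases gate these synonyms behind the UnicodeSyntax extension (with ExplicitForAll for the forall symbol), so the pragmas may not match the compiler state described in the log. The ellipsis synonym is left out on purpose.

```haskell
{-# LANGUAGE UnicodeSyntax #-}
{-# LANGUAGE ExplicitForAll #-}
module Main where

-- Hypothetical example: U+2200 stands in for `forall`, U+2192 for `->`.
swapPair :: ∀ a b. (a, b) → (b, a)
swapPair (x, y) = (y, x)

-- U+2190 stands in for `<-`, here inside a list comprehension.
squares :: [Int] → [Int]
squares xs = [x * x | x ← xs]

main :: IO ()
main = do
  -- U+2190 also works for the bind arrow in do-notation.
  line ← return "unicode"
  print (swapPair (line, length line))
  print (squares [1, 2, 3])
```

Ordinary identifiers and operators stay ASCII here; only the reserved symbols are swapped for their Unicode forms.
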
> there are probably more things we could add here.
>
> This will break some source files if Latin-1 characters are being used.
> In most cases this should result in a UTF-8 decoding error. Later on
> if we want to support more encodings (perhaps with a pragma to specify
> the encoding), I plan to do it by recoding into UTF-8 before parsing.
>
> Internally, there were some pretty big changes:
>
> - FastStrings are now stored in UTF-8
>
> - Z-encoding has been moved right to the back end. Previously we
>   used to Z-encode every identifier on the way in for simplicity,
>   and only decode when we needed to show something to the user.
>   Instead, we now keep every string in its UTF-8 encoding, and
>   Z-encode right before printing it out. To avoid Z-encoding the
>   same string multiple times, the Z-encoding is cached inside the
>   FastString the first time it is requested.
>
>   This speeds up the compiler - I've measured some definite
>   improvement in parsing at least, and I expect compilations overall
>   to be faster too. It also cleans up a lot of cruft from the
>   OccName interface. Z-encoding is nicely hidden inside the
>   Outputable instance for Names & OccNames now.
>
> - StringBuffers are UTF-8 too, and are now represented as
>   ForeignPtrs.
>
> - I've put together some test cases, not by any means exhaustive,
>   but there are some interesting UTF-8 decoding error cases that
>   aren't obvious. Also, take a look at unicode001.hs for a demo.
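
The log mentions Latin-1 files failing with a UTF-8 decoding error, and some non-obvious decoding error cases. The sketch below shows what such cases can look like. It is not GHC's decoder: decodeOne and decodeUtf8 are hypothetical names, the code works on a plain list of bytes for simplicity, and the error strings are made up for illustration.

```haskell
module Main where

import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Decode one code point from the front of a byte list.
-- Left carries a short description of why decoding failed.
decodeOne :: [Word8] -> Either String (Char, [Word8])
decodeOne [] = Left "empty input"
decodeOne (b : rest)
  | b < 0x80 = Right (chr (fromIntegral b), rest)
  | b < 0xC0 = Left "unexpected continuation byte"
  | b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b < 0xF8 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise = Left "invalid lead byte"
  where
    -- n continuation bytes expected; minValue is the smallest code
    -- point that legitimately needs a sequence of this length.
    multi :: Int -> Int -> Int -> Either String (Char, [Word8])
    multi n lead minValue =
      let (conts, rest') = splitAt n rest
       in if length conts < n
            then Left "truncated sequence"
            else if any (\c -> c .&. 0xC0 /= 0x80) conts
              then Left "missing continuation byte"
              else check minValue (foldl step lead conts) rest'
    -- Append the low six bits of a continuation byte.
    step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
    check minValue cp rest'
      | cp < minValue = Left "overlong encoding"
      | cp >= 0xD800 && cp <= 0xDFFF = Left "surrogate code point"
      | cp > 0x10FFFF = Left "code point out of range"
      | otherwise = Right (chr cp, rest')

-- Decode a whole byte list, stopping at the first error.
decodeUtf8 :: [Word8] -> Either String String
decodeUtf8 [] = Right []
decodeUtf8 bs = do
  (c, rest) <- decodeOne bs
  cs <- decodeUtf8 rest
  return (c : cs)

main :: IO ()
main = mapM_ (print . decodeUtf8)
  [ [0x63, 0x61, 0x66, 0xC3, 0xA9]  -- "cafe" with e-acute, UTF-8
  , [0x63, 0x61, 0x66, 0xE9]        -- same word in Latin-1
  , [0xC0, 0x80]                    -- overlong encoding of U+0000
  , [0xED, 0xA0, 0x80]              -- encoded surrogate U+D800
  ]
```

The Latin-1 byte 0xE9 looks like the lead byte of a three-byte sequence, so a strict decoder rejects it instead of silently reading the wrong character. The overlong and surrogate inputs are well-formed at the bit level and are only caught by the range checks.
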
--
Best regards,
Bulat mailto:bulatz at HotPOP.com