[Haskell-cafe] Clarification on uniWhite lexical definition
Viktor Dukhovni
ietf-dane at dukhovni.org
Tue Oct 20 20:08:59 UTC 2020
On Tue, Oct 20, 2020 at 12:43:06PM +0200, Immanuel Litzroth wrote:
> The Haskell Report says:
> uniWhite → any Unicode character defined as whitespace
>
> It's not clear to me whether this means that the Unicode character should
> have "Zs" as its general category
> ;; Zs Space_Separator a space character (of various non-zero widths)
> or whether it should be defined as whitespace as in
> https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt
>
> Any clarification appreciated,
FWIW, GHC's lexer uses the general category "Zs" (Space_Separator):
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L124-128
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L2387-2452
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L2428
https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L2451
with the "Space" constructor of GeneralCategory (i.e. Zs) defined at:
https://gitlab.haskell.org/ghc/ghc/-/blob/master/libraries/base/GHC/Unicode.hs#L133
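
To see where the two readings diverge, here is a minimal sketch using only Data.Char from base. It is not the lexer's code; `isZs` and `samples` are hypothetical names chosen for illustration. U+2028 (LINE SEPARATOR) and U+0085 (NEXT LINE) are listed as White_Space in PropList.txt but have general categories Zl and Cc respectively, so a "Zs"-only test rejects them:

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical helper: True when the character's general category is
-- Zs, which base spells as the 'Space' constructor.
isZs :: Char -> Bool
isZs c = generalCategory c == Space

-- Illustrative sample: NO-BREAK SPACE, EM SPACE, LINE SEPARATOR, NEXT LINE.
samples :: [Char]
samples = ['\x00A0', '\x2003', '\x2028', '\x0085']

main :: IO ()
main = mapM_ report samples
  where
    -- Print each character with its general category and the Zs verdict.
    report c =
      putStrLn (show c ++ ": " ++ show (generalCategory c)
                ++ ", Zs = " ++ show (isZs c))
```

Running this should report Space for the first two characters and LineSeparator and Control for the last two.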
--
Viktor.