[Haskell-cafe] Clarification on uniWhite lexical definition

Viktor Dukhovni ietf-dane at dukhovni.org
Tue Oct 20 20:08:59 UTC 2020


On Tue, Oct 20, 2020 at 12:43:06PM +0200, Immanuel Litzroth wrote:

> The Haskell Report says:
> uniWhite → any Unicode character defined as whitespace
> 
> it's not clear to me whether this means that the Unicode character should
> have "Zs" as its general category
> ;; Zs Space_Separator a space character (of various non-zero widths)
> or whether it should be whitespace as defined in
> https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt
> 
> Any clarification appreciated,

FWIW, GHC uses "Zs":

    https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L124-128
        https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L2387-2452
            https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L2428
            https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L2451

with the "Space" constructor of GeneralCategory (i.e. "Zs") defined at:

    https://gitlab.haskell.org/ghc/ghc/-/blob/master/libraries/base/GHC/Unicode.hs#L133
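To make the difference between the two readings concrete, here is a minimal sketch (not GHC code) that compares the Zs test, which is what the lexer linked above relies on, against the White_Space property. The names uniWhiteZs, uniWhiteProp, whiteSpaceRanges and showCodePoint are hypothetical, and the ranges are hand-copied from PropList.txt, so they are an assumption that could lag a newer Unicode version.

```haskell
import Data.Char (GeneralCategory (Space), generalCategory, ord, toUpper)
import Numeric (showHex)

-- Reading 1 (the one GHC's lexer uses): general category Zs.
uniWhiteZs :: Char -> Bool
uniWhiteZs c = generalCategory c == Space

-- Reading 2: the White_Space property from PropList.txt.
-- ASSUMPTION: hand-copied inclusive code point ranges, not a library table.
whiteSpaceRanges :: [(Int, Int)]
whiteSpaceRanges =
  [ (0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085)
  , (0x00A0, 0x00A0), (0x1680, 0x1680), (0x2000, 0x200A)
  , (0x2028, 0x2029), (0x202F, 0x202F), (0x205F, 0x205F)
  , (0x3000, 0x3000) ]

uniWhiteProp :: Char -> Bool
uniWhiteProp c = any inRange whiteSpaceRanges
  where
    n = ord c
    inRange (lo, hi) = lo <= n && n <= hi

-- Every Char on which the two readings disagree.
disagreements :: [Char]
disagreements = [c | c <- [minBound .. maxBound], uniWhiteZs c /= uniWhiteProp c]

-- Format a code point as U+XXXX (at least four hex digits).
showCodePoint :: Char -> String
showCodePoint c = "U+" ++ replicate (4 - length h) '0' ++ h
  where h = map toUpper (showHex (ord c) "")

main :: IO ()
main = mapM_ report disagreements
  where report c = putStrLn (showCodePoint c ++ "  " ++ show (generalCategory c))
```

With a reasonably recent base, the disagreements should be U+0009..U+000D, U+0085, U+2028 and U+2029 (all in White_Space, none in Zs). The first five are already whitespace in the Report through its separate tab, newline, vertab and formfeed productions, so in practice the two readings differ only on U+0085 (NEXT LINE), U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR).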

-- 
    Viktor.

