[Haskell-cafe] Clarification on uniWhite lexical definition
Mario
blamario at rogers.com
Tue Oct 20 11:59:03 UTC 2020
On 2020-10-20 6:43 a.m., Immanuel Litzroth wrote:
> The Haskell Report says:
> uniWhite → any Unicode character defined as whitespace
>
> it's not clear to me whether this means that the Unicode character should
> have "Zs" as its general category
> ;; Zs Space_Separator a space character (of various non-zero widths)
> or whether it should be defined as whitespace as in
> https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt
Recall that this production dates from 1998, the early days of
Unicode. You should be looking at roughly the Unicode 2.1.8 standard,
not the latest one. And once you look there, you'll find it was much
simpler:
> Property dump for: 0x10000004 (White space)
>
> 0009..000D (5 chars)
> 0020
> 00A0
> 2000..200B (12 chars)
> 2028..2029 (2 chars)
> 3000
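To make the 1998 set concrete, here is a minimal Haskell sketch that encodes the ranges from the dump above as a predicate. The name `uniWhite218` is hypothetical; it is not defined by the Haskell Report or by any library.

```haskell
-- | True exactly for the code points in the Unicode 2.1.8 "White space"
-- property dump quoted above: U+0009..U+000D, U+0020, U+00A0,
-- U+2000..U+200B, U+2028..U+2029 and U+3000.
-- (uniWhite218 is a hypothetical, illustrative name.)
uniWhite218 :: Char -> Bool
uniWhite218 c =
     inRange '\x0009' '\x000D'
  || c == '\x0020'
  || c == '\x00A0'
  || inRange '\x2000' '\x200B'
  || inRange '\x2028' '\x2029'
  || c == '\x3000'
  where
    -- Inclusive range test on the character being classified.
    inRange lo hi = lo <= c && c <= hi

main :: IO ()
main = do
  -- The dump lists 5 + 1 + 1 + 12 + 2 + 1 = 22 code points.
  print (length (filter uniWhite218 [minBound .. maxBound]))  -- 22
  -- U+200B ZERO WIDTH SPACE falls inside the 1998 range 2000..200B.
  print (uniWhite218 '\x200B')                                 -- True
```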
So there was no ambiguity at the time. Now if you're trying to
extrapolate the intent to the present standard... well, I have no more
authority than you in the matter, but I'd go with the more inclusive
definition, i.e. the White_Space property from PropList.txt rather than
the Zs category alone.
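For comparison, a sketch of the two readings from the original question. `isZs` uses `Data.Char.generalCategory` from base (so it reflects the Unicode tables bundled with your GHC), while `isWhiteSpaceProp` hard-codes the White_Space entries from a recent PropList.txt; that list is an assumption you should check against the Unicode version you target. Both function names are hypothetical.

```haskell
import Data.Char (GeneralCategory (Space), generalCategory)

-- | Reading 1: general category Zs (Space_Separator).
isZs :: Char -> Bool
isZs c = generalCategory c == Space

-- | Reading 2: the White_Space property, hard-coded from a recent
-- PropList.txt (25 code points; assumed, verify for your version).
isWhiteSpaceProp :: Char -> Bool
isWhiteSpaceProp c = c `elem` singles || any inRange ranges
  where
    singles = [ '\x0020', '\x0085', '\x00A0', '\x1680', '\x2028'
              , '\x2029', '\x202F', '\x205F', '\x3000' ]
    ranges  = [('\x0009', '\x000D'), ('\x2000', '\x200A')]
    inRange (lo, hi) = lo <= c && c <= hi

main :: IO ()
main = do
  -- TAB is White_Space (general category Cc) but not Zs.
  print (isZs '\t', isWhiteSpaceProp '\t')          -- (False,True)
  -- U+2028 LINE SEPARATOR has category Zl, so it is not Zs either.
  print (isZs '\x2028', isWhiteSpaceProp '\x2028')  -- (False,True)
  -- U+200B was in the 1998 list but is not in the hard-coded set.
  print (isWhiteSpaceProp '\x200B')                 -- False
  -- Size of the hard-coded White_Space set.
  print (length (filter isWhiteSpaceProp [minBound .. maxBound]))  -- 25
```

The Zs reading would reject TAB and the line/paragraph separators outright, which is one reason the property-based reading is the more inclusive choice.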