[Haskell-cafe] Clarification on uniWhite lexical definition

Mario blamario at rogers.com
Tue Oct 20 11:59:03 UTC 2020


On 2020-10-20 6:43 a.m., Immanuel Litzroth wrote:
> The haskell report says:
> uniWhite → any Unicode character defined as whitespace
>
> it's not clear to me whether this means that the unicode character should
> have "Zs" as its general category
> ;; Zs Space_Separator a space character (of various non-zero widths)
> or whether it should be defined as whitespace as in
> https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt


Recall that this production dates from 1998, in the early days of
Unicode. You should be looking at roughly the Unicode 2.1.8 standard,
not the latest one. Once you look there, you'll find the definition was
much simpler:


> Property dump for: 0x10000004 (White space)
>
> 0009..000D  (5 chars)
> 0020
> 00A0
> 2000..200B  (12 chars)
> 2028..2029  (2 chars)
> 3000


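So there was no ambiguity at the time. If you're trying to extrapolate
the intent to the present standard, I have no more authority than you in
the matter, but I'd go with the more inclusive definition: the
White_Space property in PropList.txt, which like the list above also
covers control characters such as tab and line feed.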
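
To make the comparison concrete, here is a minimal sketch that encodes
the ranges from the property dump quoted above as a Haskell predicate
and sets it beside the "Zs" reading. The names isUniWhite1998, isZs and
disagreements are made up for this illustration and come from neither
the Report nor any library; only generalCategory and the Space
constructor are from Data.Char. Note that isZs reflects whatever Unicode
version your compiler's base library ships, not 2.1.8.

```haskell
import Data.Char (GeneralCategory (Space), generalCategory)

-- | Whitespace according to the Unicode 2.1.8 property dump quoted
-- above (hypothetical helper; ranges copied from that dump).
isUniWhite1998 :: Char -> Bool
isUniWhite1998 c = any inRange ranges
  where
    inRange (lo, hi) = lo <= c && c <= hi
    ranges =
      [ ('\x0009', '\x000D') -- 5 chars: tab, LF, VT, FF, CR
      , ('\x0020', '\x0020') -- space
      , ('\x00A0', '\x00A0') -- no-break space
      , ('\x2000', '\x200B') -- 12 chars
      , ('\x2028', '\x2029') -- line and paragraph separators
      , ('\x3000', '\x3000') -- ideographic space
      ]

-- | The narrower present-day reading: general category Zs, which
-- Data.Char exposes as the 'Space' constructor.
isZs :: Char -> Bool
isZs c = generalCategory c == Space

-- | All characters on which two candidate definitions disagree.
disagreements :: (Char -> Bool) -> (Char -> Bool) -> [Char]
disagreements p q = [c | c <- [minBound .. maxBound], p c /= q c]
```

Loaded into GHCi, `disagreements isUniWhite1998 isZs` lists the
characters where the 1998 set and the Zs reading part ways. Tab is one
of them, since it is a control character rather than a space separator.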



