[Haskell-cafe] Re: Valid Haskell characters

Deborah Goldsmith dgoldsmith at mac.com
Mon Aug 25 23:30:35 EDT 2008


No, the general category alone is not enough. Please read both references.
For example, DerivedCoreProperties.txt says:

# Derived Property: Uppercase
#  Generated from: Lu + Other_Uppercase

So general category Lu is not the same thing as the "Uppercase" property.
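
To make the distinction concrete, here is a minimal sketch, assuming GHC's
Data.Char, whose generalCategory function reports a character's general
category. The names largeByCategory and otherUppercaseSamples are
hypothetical, chosen only for illustration. U+2160 and U+24B6 are listed
under Other_Uppercase in PropList.txt, so they carry the derived Uppercase
property even though their general categories are Nl and So; a test based
on Lu and Lt alone does not see them.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical helper: approximates "uppercase or titlecase letter"
-- using the general category only (Lu or Lt).
largeByCategory :: Char -> Bool
largeByCategory c =
  generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]

-- Sample characters from the Other_Uppercase ranges in PropList.txt:
-- U+2160 ROMAN NUMERAL ONE and U+24B6 CIRCLED LATIN CAPITAL LETTER A.
otherUppercaseSamples :: [Char]
otherUppercaseSamples = ['\x2160', '\x24B6']

-- Pair each sample with its general category and the category-only verdict.
categoryReport :: [(Char, GeneralCategory, Bool)]
categoryReport =
  [ (c, generalCategory c, largeByCategory c) | c <- otherUppercaseSamples ]
```

Evaluating categoryReport in GHCi should show the categories LetterNumber
and OtherSymbol, with largeByCategory False for both, while 'A' (Lu) and
U+01C5 (Lt) pass the same check. Whether Haskell's uniLarge is meant to
include such characters is exactly the ambiguity in the report's wording.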

Deborah

On Aug 25, 2008, at 7:18 PM, Maurício wrote:

> In chapter 4 I see the following
> nice table on page 139. Do you think
> I can use it together with UnicodeData.txt
> to choose valid characters for Haskell?
> It is the only place I found where the names
> match the Haskell syntax reference
> (uppercase, lowercase, punctuation, symbol).
>
> Thanks,
> Maurício
>
>                       Table 4-7. General Category
>
> Lu = Letter, uppercase
> Ll = Letter, lowercase
> Lt = Letter, titlecase
> Lm = Letter, modifier
> Lo = Letter, other
> Mn = Mark, nonspacing
> Mc = Mark, spacing combining
> Me = Mark, enclosing
> Nd = Number, decimal digit
> Nl = Number, letter
> No = Number, other
> Pc = Punctuation, connector
> Pd = Punctuation, dash
> Ps = Punctuation, open
> Pe = Punctuation, close
> Pi = Punctuation, initial quote (may behave like Ps or Pe depending on usage)
> Pf = Punctuation, final quote (may behave like Ps or Pe depending on usage)
> Po = Punctuation, other
> Sm = Symbol, math
> Sc = Symbol, currency
> Sk = Symbol, modifier
> So = Symbol, other
> Zs = Separator, space
> Zl = Separator, line
> Zp = Separator, paragraph
> Cc = Other, control
> Cf = Other, format
> Cs = Other, surrogate
> Co = Other, private use
> Cn = Other, not assigned (including noncharacters)
>
> Deborah Goldsmith wrote:
>> You can't determine Unicode character properties by analyzing the  
>> names of the characters.
>> Read chapter 4 of the standard:
>> http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf
>> and get the property values here:
>> http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
>> It sounds like the properties you want are "Case" and "General  
>> Category". Maybe the spec should be more explicit on exactly how  
>> the definitions map onto Unicode properties, so there is no  
>> ambiguity.
>> Deborah
>> On Aug 25, 2008, at 6:15 PM, Maurício wrote:
>>> Hi,
>>>
>>> In the Haskell reference, I see the
>>> following definitions:
>>>
>>> uniWhite -> any Unicode character defined
>>> as whitespace;
>>>
>>> uniSmall -> any Unicode lowercase letter;
>>>
>>> uniLarge -> any uppercase or titlecase
>>> Unicode letter;
>>>
>>> uniSymbol -> any Unicode symbol or
>>> punctuation.
>>>
>>> Where do I get lists for those
>>> characters? My first attempt was to
>>> check:
>>>
>>> http://unicode.org/Public/UNIDATA/UnicodeData.txt
>>>
>>> and consider large anything marked as
>>> CAPITAL and small anything marked as SMALL. I
>>> didn't know what to guess about the symbols.
>>> Am I using the right reference? How can I
>>> recognize (or get a list of) valid uppercase and
>>> lowercase Unicode letters, as well as symbols
>>> and punctuation?
>>>
>>> Thanks for your help,
>>> Maurício
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe


