Improving Data.Char.isSpace performance

wren ng thornton wren at freegeek.org
Thu Nov 8 19:18:57 CET 2012


On 10/31/12 11:49 PM, Patrick Palka wrote:
> On Wed, Oct 31, 2012 at 10:39 PM, wren ng thornton <wren at freegeek.org> wrote:
>
>> The one thing I worry about using \x1680 as the threshold[1] is that I'm
>> not sure whether every character below \x1680 has been allocated or whether
>> some are still free. If any of them are free, then this will become
>> incorrect in subsequent versions of Unicode so it's a maintenance timebomb.
>> (Whereas if they're all specified then it should be fine.) Can someone
>> verify that using \x1680 is sound in this manner?
>>
>
> According to GHCi:
>
> Prelude Data.Char> length $ filter ((== NotAssigned) . generalCategory) ['\0'..'\x1680']
> 830

Guess I never looked closely at what Unicode queries Data.Char offers...
Looks like the first unassigned character is '\888' (U+0378).
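
For anyone who wants to repeat the check, here is a minimal sketch of the same query as a standalone program. The helper name unassignedIn is made up for illustration. The numbers it prints depend on the Unicode tables shipped with whichever base you run it against, so they may not match the 830 quoted above.

```haskell
import Data.Char (GeneralCategory (NotAssigned), generalCategory, ord)
import Data.Maybe (listToMaybe)
import Numeric (showHex)

-- | Code points in the inclusive range [lo, hi] that the running base
-- library classifies as NotAssigned (Unicode general category Cn).
-- (Hypothetical helper; not part of Data.Char.)
unassignedIn :: Char -> Char -> [Char]
unassignedIn lo hi = filter ((== NotAssigned) . generalCategory) [lo .. hi]

main :: IO ()
main = do
  -- Same range as the GHCi query quoted above.
  let gaps = unassignedIn '\0' '\x1680'
  putStrLn ("unassigned count: " ++ show (length gaps))
  case listToMaybe gaps of
    Nothing -> putStrLn "no unassigned code points in range"
    Just c  -> putStrLn ("first unassigned: U+" ++ showHex (ord c) "")
```

If the count is nonzero, a later Unicode version could assign one of those code points, which is exactly the maintenance concern raised about the \x1680 threshold.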

-- 
Live well,
~wren


