ANN: unicode-properties, unicode-names

Ashley Yakeley ashley at
Tue Sep 2 00:54:38 EDT 2008

unicode-properties, unicode-names

These two packages are representations in Haskell of various data in the 
Unicode 3.2.0 Character Database. Unicode 3.2.0 was the latest version 
of the Unicode standard at the time I wrote most of the code; later I 
may move the packages to the latest version (currently 5.1.0).

The unicode-properties package contains functions to determine general 
category, case, and a wide range of other properties, as well as to do 
decomposition and case-folding.
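
This announcement does not list the function names exported by unicode-properties, so the following is only a sketch of the kind of query described above, written against Data.Char from base instead. The names describeChar and equalIgnoringCase are hypothetical and are not part of either package. Note also that base's tables follow whatever Unicode version the compiler ships, not necessarily 3.2.0.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isUpper, toLower)

-- | Hypothetical helper: pair a character's general category with
-- whether it is an uppercase letter, using only functions from base.
describeChar :: Char -> (GeneralCategory, Bool)
describeChar c = (generalCategory c, isUpper c)

-- | Hypothetical helper: a crude caseless comparison built on the simple
-- per-character lowercase mapping. Full Unicode case-folding handles
-- cases this does not (for example, one-to-many mappings).
equalIgnoringCase :: String -> String -> Bool
equalIgnoringCase a b = map toLower a == map toLower b
```

Loaded into GHCi, describeChar 'A' pairs the UppercaseLetter category with True, and equalIgnoringCase treats "Haskell" and "HASKELL" as equal.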

The unicode-names package contains just one function, getCharacterName, 
for getting the name of a character. It is a separate package because 
the name data makes up a large proportion of the total data.

Both packages use the type "Char" to represent Unicode characters (more 
pedantically, code points). In GHC, Char has the range 
['\x0'..'\x10FFFF'], matching the Unicode standard. The packages won't 
work with compilers that restrict Char to a smaller range.
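
A quick way to check whether a given compiler meets this requirement is to inspect the bounds of Char. This is a minimal sketch using only base; the names unicodeMax and charCoversUnicode are made up for illustration and do not come from either package.

```haskell
import Data.Char (ord)

-- | Highest code point in the Unicode code space (U+10FFFF).
unicodeMax :: Int
unicodeMax = 0x10FFFF

-- | True when this compiler's Char spans the full range
-- ['\x0'..'\x10FFFF'] that the packages rely on.
charCoversUnicode :: Bool
charCoversUnicode =
  ord (minBound :: Char) == 0 && ord (maxBound :: Char) == unicodeMax
```

Under GHC, charCoversUnicode evaluates to True.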


Source for both packages: <>
Most of the data is auto-generated at build time from files downloadable 
from the Unicode web-site.

I expect Don will have them both in Arch Linux within the hour.

Ashley Yakeley
